Skip to content

Create PR by mistake#37796

Closed
jason810496 wants to merge 1 commit intoapache:masterfrom
iting0321:unboundedsource
Closed

Create PR by mistake#37796
jason810496 wants to merge 1 commit intoapache:masterfrom
iting0321:unboundedsource

Conversation

@jason810496
Copy link
Member

No description provided.

@jason810496 jason810496 marked this pull request as draft March 8, 2026 06:06
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances Apache Beam's capabilities for defining custom streaming data sources by introducing a robust UnboundedSource API in Python, mirroring existing Java functionality. It provides the foundational abstract classes, a Splittable DoFn wrapper for seamless pipeline integration, and practical examples in both Java and Python. Additionally, a new PeriodicImpulseSource is included to demonstrate a real-world application of the new API, ensuring comprehensive support for checkpointing, watermarking, and efficient processing of unbounded data streams.

Highlights

  • Java UnboundedSource Demo: A new Java example, UnboundedSourceDemo.java, has been added to demonstrate how to implement a custom UnboundedSource with support for checkpointing, watermarks, and record IDs. This example generates a finite sequence of integers and includes various pipeline tests.
  • Python UnboundedSource API Implementation: The core UnboundedSource, UnboundedReader, and CheckpointMark abstract base classes have been introduced in sdks/python/apache_beam/io/iobase.py. This includes the necessary Splittable DoFn (SDF) wrapper infrastructure to enable these custom sources to run within Beam pipelines, handling restriction tracking, watermarking, and checkpointing.
  • Python UnboundedSource Demo: A new Python example, unbounded_source_demo.py, has been added. It is the Python counterpart to the Java demo, showcasing a custom UnboundedSource that generates integers with timestamps, watermarks, and checkpointing, along with correctness verification.
  • Python PeriodicImpulseSource: A new native Python streaming IO, PeriodicImpulseSource, has been implemented. This UnboundedSource generates time-based impulses at regular intervals, serving as a useful streaming trigger or heartbeat source, and demonstrates practical usage of the new API.
  • Comprehensive Testing: Extensive unit and end-to-end tests have been added in sdks/python/apache_beam/io/unbounded_source_test.py to validate the functionality of the new Python UnboundedSource API, its SDF wrapper, and the PeriodicImpulseSource.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • examples/java/src/main/java/org/apache/beam/examples/UnboundedSourceDemo.java
    • Added a new Java example demonstrating a custom UnboundedSource with checkpointing and watermarking.
  • examples/python/unbounded_source_demo.py
    • Added a new Python example for a custom UnboundedSource, mirroring the Java demo.
  • sdks/python/apache_beam/io/init.py
    • Updated imports to expose new UnboundedSource, UnboundedReader, and CheckpointMark classes.
  • sdks/python/apache_beam/io/iobase.py
    • Introduced abstract base classes for CheckpointMark, UnboundedSource, and UnboundedReader.
    • Implemented the Splittable DoFn (SDF) wrapper for UnboundedSource to enable its execution within Beam pipelines, including restriction tracking and watermark estimation.
  • sdks/python/apache_beam/io/periodic_impulse_source.py
    • Added a new UnboundedSource implementation, PeriodicImpulseSource, which generates time-based impulses.
  • sdks/python/apache_beam/io/unbounded_source_test.py
    • Added comprehensive unit and end-to-end tests for the new UnboundedSource API and its SDF wrapper, as well as for PeriodicImpulseSource.
Activity
  • No activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@jason810496 jason810496 closed this Mar 8, 2026
@jason810496 jason810496 changed the title [Code Review] Support UnboundSource Create PR by mistake Mar 8, 2026
@codecov
Copy link

codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 80.91168% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.91%. Comparing base (26b18c6) to head (6c62a4b).
⚠️ Report is 68 commits behind head on master.

Files with missing lines Patch % Lines
sdks/python/apache_beam/io/iobase.py 79.29% 59 Missing ⚠️
...s/python/apache_beam/io/periodic_impulse_source.py 87.30% 8 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #37796       +/-   ##
=============================================
+ Coverage     40.08%   56.91%   +16.82%     
  Complexity     3416     3416               
=============================================
  Files          1177     1179        +2     
  Lines        187315   187877      +562     
  Branches       3588     3588               
=============================================
+ Hits          75090   106930    +31840     
+ Misses       108833    77555    -31278     
  Partials       3392     3392               
Flag Coverage Δ
python 80.07% <80.91%> (+40.33%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant