[BEAM-12164]: Add SDF for reading change stream records#16514
Merged
pabloem merged 1 commit intoapache:masterfrom Jan 21, 2022
thiagotnunes:change-stream-read-partition
Merged
[BEAM-12164]: Add SDF for reading change stream records#16514pabloem merged 1 commit intoapache:masterfrom thiagotnunes:change-stream-read-partition
pabloem merged 1 commit intoapache:masterfrom
thiagotnunes:change-stream-read-partition
Conversation
Adds ReadChangeStreamPartitionDoFn, which is an SDF to read partitions from change streams and process them accordingly. This component receives a change stream name, a partition, a start time and an end time to query. It then initiates a change stream query with the received parameters. Within a change stream, 3 types of records can be received: 1. A Data record 2. A Heartbeat record 3. A Child partitions record Upon receiving #1, the function updates the watermark with the record's commit timestamp and emits the record into the output PCollection. Upon receiving #2, the function updates the watermark with the record's timestamp, but it does not emit any record into the PCollection. Upon receiving #3, the function updates the watermark with the record's timestamp and writes the new child partitions into the metadata table. These partitions will be later scheduled by the DetectNewPartitions component. Once the change stream query for the element partition finishes, it marks the partition as finished in the metadata table and terminates.
Contributor
Author
|
retest this please |
Contributor
Author
|
R: @pabloem |
Member
|
ah this PR is surprisingly easy to follow. I think it makes sense to me. LGTM |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds ReadChangeStreamPartitionDoFn, which is an SDF to read partitions from change streams and process them accordingly. This component receives a change stream name, a partition, a start time and an end time to query. It then initiates a change stream query with the received parameters.
Within a change stream, 3 types of records can be received:
Upon receiving (1), the function updates the watermark with the record's commit timestamp and emits the record into the output PCollection.
Upon receiving (2), the function updates the watermark with the record's timestamp, but it does not emit any record into the PCollection.
Upon receiving (3), the function updates the watermark with the record's timestamp and writes the new child partitions into the metadata table. These partitions will be later scheduled by the DetectNewPartitions component.
Once the change stream query for the element partition finishes, it marks the partition as finished in the metadata table and terminates.