Merged
Conversation
more checkstyle improvements
…them explicitly
… 4.12, Spark 1.2; Add source, javadoc plugins and other info; Fix javadoc errors and a few typos
… outside contributions.
The primary change needed to accomodate the new dataflow api is to how we handle side inputs.
FileOutputFormats to be used with Spark Dataflow, as long as they implement the ShardNameTemplateAware interface. This is easily achieved by subclassing the desired FileOutputFormat class, see TemplatedSequenceFileOutputFormat for an example.
Support was added in Spark 1.5.0 for user exception propagation, see https://issues.apache.org/jira/browse/SPARK-8625. Fixes https://github.com/cloudera/spark-dataflow/issues/69
Add support for application name and streaming (default: false) Add pipeline options for streaming Add print output as an unbounded write Add default window strategy to represent Spark streaming micro-batches as fixed windows This translator helps to translate Dataflow transformations into Spark (+streaming) transformations. This will help to support streaming transformations separately Expose through the SparkPipelineTranslator Now Evaluator uses SparkPipelineTranslator to translate Add default application name StreamingEvaluation context to support DStream evaluation. Expose members and methods in EvaluationContext for inheritors Use configured app name in options A TransformTranslator for streaming Add support for spark streaming execution in the runner Fix comment Create input stream from a queue - mainly for testing I guess Add support to create input stream from queued values Override method to expose in package Test WordCount in streaming, just print out for now.. Stream print to console is a transformations of PCollection to PDone rename to CreateStream to differ from Dataflow Create It seems that in 1.3.1 short living streaming jobs fail (like unit tests). Maybe has something to do with SPARK-7930. fixed in 1.4.0 so bumped up. Expose some methods, add a method to check if RDDHolder exists make context final Streaming default should be local[1] to suppport unit tests No need for recurring context. Exposing additional parent methods. Added RUNNING state when stream is running. WordCount test runs 1 (sec) interval and compares to expected like in batch. Void Create triggers a no-input transformation transformations and output operations can be applied on streams/bounded collections in the pipeline foreachRDD is used for PDone transformation Commments SocketIO to consume stream from socket Comment Add support for Kafka input Comments and some patching-up Default is the same as in SparkPipelineOptions Adding licenses To satisfy license Javadoc and codestyle Satisfy license Javadoc and codestyle Check for DataflowAssertFailure because it won't propagate Since DataflowAssert doesn't propagate failures in streaming, use Aggregators to assert Use DataflowAssertStreaming Add kafka translation Embedded Kafka for unit test Kafka unit test import order license WindowingHelpers by Tom White @tomwhite Combine @tomwhite windowing branch into mine - values are windowed values now values are windowed values now Input is UNBOUNDED now Using windowing instead batchInterval to be determined by pipeline runner print the value not the windowed value remove support for for optimizations. for now. batchInterval is determined by the pipeline runner now Add streaming window pipeline visitor to determine windowing Add windowing support in streaming unit tests Combine.Globally is necessary so leave it fix line length renames Add implementation for GroupAlsoByWindow which helps to solve broken grouped/combinePerKey Line indentation unused codestyle Expose runtimeContext Make public Use the smallest window found (fixed/sliding) as the batch duration Make FieldGetter public Add support for windowing codestyle unused Update Spark to 1.5, kafka dependency should be provided Abstract Evaluator for common evaluator code. doVisitTransform per implementation. Added non-streaming windowing test by Tom White @tomwhite Fixed Combine.GroupedValues and Combine.Globally to work with WindowedValues without losing window properties. For now, Combine.PerKey is commented out until fixed to fully support WindowedValues. Support WindowedValues, Global or not, in Combine.PerKey After changes made to Combine.PerKey in 3a46150 it seems that the order has changed. Since ordere didn't seem relevant before the change, I don't see a reason not to change the expected value accordingly. Update Spark version to 1.5.2
Wrong packcage utils
…further untangle some generics issues. Update plugins. Fix some minor code issues from inspection.
Member
|
R: @davorbonaci I'll take a quick peek. |
Member
|
@davorbonaci feel free to merge this. I'll take care of integrating per https://issues.apache.org/jira/browse/BEAM-11 |
Member
|
Nice! I'd probably get rid of LICENCE and CONTRIBUTING.md right away, and prefix the pull request with I can merge this right away -- no issues there. Just to confirm -- both of you should have commit/write access to the project. Is that not the case? |
Member
|
Supposedly - but you're right, it'll be a good idea to test that.. Let me do the honors ;) |
Member
Contributor
|
Glad the snippet could help you out @tomwhite. Nice to see this going in! |
echauchot
added a commit
to echauchot/beam
that referenced
this pull request
May 12, 2017
lukecwik
referenced
this pull request
in lukecwik/incubator-beam
Mar 27, 2018
Use Read -> Impulse override utilities
robertwb
pushed a commit
to robertwb/incubator-beam
that referenced
this pull request
Apr 30, 2020
* Make check-links script more reliable * Fix typos in links
hengfengli
referenced
this pull request
in hengfengli/beam
Mar 21, 2022
* feat: make finish partition action idempotent * feat: make child partition action idempotent * fix: fix insert query in partition metadata dao * chore: spotless apply * refactor: catch error on child partition action Rely on already exists error to skip inserting a partition in the child partition record instead of checking if the key exists. We reduce the number of calls by doing this. * refactor: catch error on finish partition action Rely on catching an exception with a specific code to make the finish partition action idempotent. * refactor: removes unused dao method
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This addresses https://issues.apache.org/jira/browse/BEAM-6.
I've preserved git history (using @mxm's amazing git rewriting trick from #12). This is just an initial import - the Spark runner build is not yet integrated with the main build, packages need changing, etc. That's going to take more work, so it might be a good idea to get this merged first.