Managed Transform protos & translation; Iceberg SchemaTransforms & translation#30910
Merged
ahmedabu98 merged 66 commits intoapache:masterfrom Apr 22, 2024
Merged
Conversation
Contributor
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
chamikaramj
reviewed
Apr 11, 2024
...io/iceberg/src/main/java/org/apache/beam/io/iceberg/IcebergWriteSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...io/iceberg/src/main/java/org/apache/beam/io/iceberg/IcebergWriteSchemaTransformProvider.java
Show resolved
Hide resolved
...io/iceberg/src/main/java/org/apache/beam/io/iceberg/IcebergWriteSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
…t schema; add iceberg to IO expansion service
…erg_write_schematransform
…erg_write_schematransform
…erg_translation Pulling Read connector and making a translation for that too.
… into iceberg_write_schematransform
… Managed and Iceberg urns from proto and use SCHEMA_TRANSFORM URN
chamikaramj
reviewed
Apr 20, 2024
.../iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergSchemaTransformTranslation.java
Outdated
Show resolved
Hide resolved
…se conversion step from Python auto-xlang; spotless
chamikaramj
reviewed
Apr 22, 2024
Contributor
There was a problem hiding this comment.
LGTM. Thanks.
I think this includes what we want in the release. But this won't work end-to-end for upgrading till we update the ExpansionService logic as I mentioned in a comment.
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/FileWriteResult.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java
Show resolved
Hide resolved
...ceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergSchemaTransformCatalogConfig.java
Show resolved
Hide resolved
.../iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergSchemaTransformTranslation.java
Outdated
Show resolved
Hide resolved
.../iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergSchemaTransformTranslation.java
Outdated
Show resolved
Hide resolved
.../iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergSchemaTransformTranslation.java
Outdated
Show resolved
Hide resolved
…A_TRANSFORM urn, fetch underlying identifier
…ersions from python side
robertwb
approved these changes
Apr 22, 2024
Contributor
robertwb
left a comment
There was a problem hiding this comment.
Thanks for your patience with all my comments, both on the CL and out of band.
There are a huge number of separable changes going on in this PR, but given the time constraint it probably isn't worth separating them into separate PRs/commits at this point, so we can get this in as is.
Contributor
Author
|
Thank you all for the valuable feedback. Merging this now |
16 tasks
3 tasks
damccorm
pushed a commit
that referenced
this pull request
Apr 23, 2024
* iceberg write schematransform and test * cleanup * IcebergIO translation and tests * add sanity check for building with Row; add documentation about output schema; add iceberg to IO expansion service * spotless * spotless * permitUnusedDeclared iceberg * Change ManagedSchemaTransformProvider to take a Row config instead of a Yaml string * don't auto generate external wrapper for this just yet * spotless * spotless * Read schematransform and tests * pulling in IcebergIO changes; spotless * icebergio translation; managed translation; protos * spotless * spotless; use underscore instead of camel case field names when translating managed transform config * add grpc dependency * updated proto description; fix gen xlang command * ManagedTransform explicit input/output types; move iceberg package to org.apache.beam.sdk.io.iceberg * externalizable IcebergCatalogConfig * externalizable IcebergCatalogConfig supports all properties; address some comments * unify iceberg urns and identifiers; update some comments * one source for all supported managed transform identifiers * add documentation * custom serialization for OneTableDynamicDestinations * add iceberg via managed API tests; update proto doc * rename config; change test schematransform location * spotless * add missing package-info file * spotless * replace icebergIO translation with iceberg schematransform translation; fix Schema::sorted to do recursive sorting * remove ExternalizableIcebergCatalogConfig (no longer needed) * pull identifiers from generated proto * remove unused hadoop dependency * update generate sequence wrapper after Schema sorting * managed transform translation uses default schema * yaml returns null row; cleanup * spotless * remove SchemaAwareTransformPayload and use SchemaTransformPayload instead; rename StandardSchemaAwareTransforms -> ManagedSchemaAwareTransforms * create a beam-schema-compatible class for Snapshot info * removed new proto file and moved Managed URNs to beam_runner_api.proto; we now use SchemaTransformPayload for all schematransforms, including Managed; adding a version number to FileWriteResult encoding so that we can use it to fork in the future whhen needed * Row and Schema snake_case <-> camelCase conversion logic * Row sorted() util * use Row::sorted to fetch Managed & Iceberg row configs * use snake_case convention when translating transforms to spec; remove Managed and Iceberg urns from proto and use SCHEMA_TRANSFORM URN * spotless * cleanup * DefaultSchemaProvider can now provide the underlying SchemaProvider * perform snake_case <-> camelCase conversions directly in TypedSchemaTransformProvider * update icebergIO and managed translations to reflect field name convention changes * sorted SnapshotInfo * update manual Python wrappers to use snake_case convention; remove case conversion step from Python auto-xlang; spotless * Row utils allow nullable * add FileWriteResult test for version number; fix existing Java and YAML tests * add schema-aware transform urn to transform annotations during translation * add comments why we sort and snake_case configuration schemas * add SchemaTransformTranslation abstraction. when encountering a SCHEMA_TRANSFORM urn, fetch underlying identifier * add documentation * prioritize registered providers; remove snake_case <-> camelCase conversions from python side * cleanup
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
sorted(),toSnakeCase(),toCamelCase())snake_caseas the convention for SchemaTransform configuration field names (Fixes [Task]: TypedSchemaTransformProvider should generate Schema field names withlower_snake_caseconvention #31061)