Migrate Avro reader to arrow-avro and remove internal conversion code #17861
alamb merged 81 commits into apache:main
❤️ amazing! Thank you @getChan
Hi @getChan -- I am preparing to make an arrow release -- have you hit any blockers while integrating the new arrow-avro crate into DataFusion?
No, not yet. Thanks for the release.
Thanks for jumping on this @getChan; let me know if I can help!
FYI I merged the arrow 57 upgrade to DataFusion -- so if you rebase this PR against main you'll have access to the new arrow-avro crate
Co-authored-by: Connor Sanders <170039284+jecsand838@users.noreply.github.com>
alamb left a comment
Thanks @getChan and @jecsand838 -- this is pretty epic work. The fact that all the tests pass is pretty incredible.
I had a few small comments on the upgrade guide (I can help make them too if you want). Otherwise I think this is ready to go.
I also took the liberty of merging up from main to resolve a conflict.
Any other thoughts @jecsand838? FYI @Igosuki -- it has been a long time since you contributed the original Avro reader. It is fun to see how far things have come.
Merged up to resolve a conflict
go go go go!
Epic work @getChan
Which issue does this PR close?

- `arrow-avro` for performance and improved type support #14097

Rationale for this change

DataFusion previously maintained custom Avro-to-Arrow conversion logic. This PR migrates Avro reading to `arrow-avro` to align behavior with upstream Arrow and remove duplicated implementation.

What changes are included in this PR?

- `arrow-avro` (`ReaderBuilder`)
- `arrow-avro` and removed prior `apache-avro` dependency usage in affected paths
- `arrow-avro` projection support

Are these changes tested?

Yes.

- `datafusion/datasource-avro` (including projection and timestamp logical types)
- `datafusion/sqllogictest/test_files/avro.slt`

Are there any user-facing changes?

Yes.

- `DataFusionError::AvroError` is removed.
- `From<apache_avro::Error> for DataFusionError` is removed.
- `datafusion::apache_avro` to `datafusion::arrow_avro`.
- `datafusion` crate `avro` feature no longer enables `datafusion-common/avro`
- `datafusion-proto` crate `avro` feature no longer enables `datafusion-common/avro`
- `arrow-avro` semantics, including:
  - `string` values being read as Arrow `Binary` in this path
  - `timestamp-*` logical types read as UTC timezone-aware timestamps (`Timestamp(..., Some("+00:00"))`)
  - `local-timestamp-*` remaining timezone-naive (`Timestamp(..., None)`)

Upgrade notes are documented in:

- `docs/source/library-user-guide/upgrading/53.0.0.md`
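
The timestamp behavior listed under the user-facing changes can be illustrated with a small, standard-library-only sketch. It is not the `arrow-avro` implementation: `TimeUnit`, `TimestampType`, and `timestamp_type_for` are hypothetical names introduced here for illustration, and only the `millis` and `micros` units are modeled. It simply encodes the rule stated in the PR description: `timestamp-*` logical types carry a `"+00:00"` timezone, while `local-timestamp-*` types carry none.

```rust
/// Hypothetical stand-in for Arrow's time unit (not the real `arrow` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Millisecond,
    Microsecond,
}

/// Hypothetical model of an Arrow timestamp type: a unit plus an optional timezone.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TimestampType {
    unit: TimeUnit,
    timezone: Option<&'static str>,
}

/// Maps an Avro logical type name to the timestamp shape described in the PR.
/// Returns `None` for names this sketch does not model.
fn timestamp_type_for(logical_type: &str) -> Option<TimestampType> {
    // `local-timestamp-*` stays timezone-naive; `timestamp-*` is tagged as UTC.
    let (suffix, timezone) = match logical_type.strip_prefix("local-timestamp-") {
        Some(suffix) => (suffix, None),
        None => (logical_type.strip_prefix("timestamp-")?, Some("+00:00")),
    };
    let unit = match suffix {
        "millis" => TimeUnit::Millisecond,
        "micros" => TimeUnit::Microsecond,
        _ => return None, // other units are outside this sketch
    };
    Some(TimestampType { unit, timezone })
}
```

Under these assumptions, `timestamp_type_for("timestamp-millis")` yields a millisecond timestamp with `Some("+00:00")`, and `timestamp_type_for("local-timestamp-micros")` yields a microsecond timestamp with no timezone. Downstream code that compares schemas or casts Avro timestamp columns may need to account for this difference; the upgrade guide is the authoritative description.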