-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Redesign the try_reverse_output to support more cases #19446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR redesigns the try_reverse_output method to use equivalence properties and the ordering_satisfy API, enabling more sophisticated sort pushdown optimizations. The key improvement is the ability to recognize when reversed file scanning can satisfy requested orderings through constant column elimination, monotonic function reasoning, and prefix matching.
Key Changes:
- Modified
FileSource::try_reverse_outputto acceptEquivalencePropertiesparameter for advanced ordering analysis - Replaced simple
is_reversecheck withordering_satisfyinParquetSourceto handle complex cases like constant columns from filters - Added forward-direction check in
FileScanConfigto avoid unnecessary reverse optimization when ordering is already satisfied - Implemented
try_pushdown_sortinFilterExecandProjectionExecto propagate sort pushdown through the plan
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
datafusion/datasource/src/file.rs |
Updated FileSource trait to add eq_properties parameter to try_reverse_output, with comprehensive documentation of the new approach |
datafusion/datasource-parquet/src/source.rs |
Reimplemented try_reverse_output to use ordering_satisfy with equivalence properties instead of simple is_reverse check, enabling constant column and monotonic function handling |
datafusion/datasource/src/file_scan_config.rs |
Added forward-direction ordering check before attempting reverse optimization and enhanced comments explaining the delegation to FileSource |
datafusion/physical-plan/src/filter.rs |
Implemented try_pushdown_sort to propagate sort pushdown requests through filter operators |
datafusion/physical-plan/src/projection.rs |
Implemented try_pushdown_sort to propagate sort pushdown requests through projection operators |
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt |
Added comprehensive test suite (Test 7.1-7.10) covering constant column filtering, literal constants in sort expressions, and edge cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt
Outdated
Show resolved
Hide resolved
adriangb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very cool! I suggest splitting out the changes to FilterExec for further consideration and moving the tests to a different SLT file (order.slt? start an order_pushdown.slt?)
| Ok(FilterPushdownPropagation::if_all(child_pushdown_result)) | ||
| } | ||
|
|
||
| fn try_pushdown_sort( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good improvement but is orthogonal to the change to use EqProperties (just noting)
| }) | ||
| } | ||
|
|
||
| fn try_pushdown_sort( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A big picture question on this one: is it beneficial to push down sort order past a filter? If it's "free" of course. But if there's runtime cost to re-ordering rows, it may be cheaper to filter then re-order (if the filter is very selective).
I think in practice, for Parquet, everything ends up happening in the scan itself and filters always get applied first, so this is inconsequential. But in theory it could hurt data sources that e.g. don't support filter pushdown and hence rely on a FilterExec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adriangb for review, i agree, but currently i think it's good, i only add the following 3 basic cases, and don't add specific sort pushdown, just bypass to child sort pushdown to make it work:
FilterExec::try_pushdown_sort
ProjectionExec::try_pushdown_sort
CooperativeExec::try_pushdown_sort And they all just bypass self to the child sort pushdown, i was testing and also the new slt testing here, it seems need this, and they are the basic cases i think.
I also created a issue for the following operators to add more try_pushdown_sort implementation, we can investigate more for it:
| SET datafusion.optimizer.enable_sort_pushdown = true; | ||
|
|
||
|
|
||
| # Test 7: Sort pushdown with constant column filtering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why in dynamic_filter_pushdown_config.slt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @adriangb , I added sort pushdown cases here because it included topk pushdown testing here, i can add a new test file.
Thank you @adriangb for review, i addressed the comments to add a new sort_pushdown.slt. But i still keep the bypass sort pushdown for : FilterExec::try_pushdown_sort
ProjectionExec::try_pushdown_sort
CooperativeExec::try_pushdown_sort And they all just bypass self to the child sort pushdown, i was testing and also the new slt testing here, it seems need this, and they are the basic cases i think. I also created a issue for the following operators to add more try_pushdown_sort implementation, we can investigate more for it: What do you think? |
Which issue does this PR close?
Rationale for this change
After #19064 was completed, I found it couldn't meet our internal project requirements:
Redesign try_reverse_output using EquivalenceProperties
is_reversemethod that could only handle basic reverse matchingEquivalencePropertiescan handle more complex scenarios, including constant column elimination and monotonic functionsSwitch to ordering_satisfy method for ordering matching
date_trunc,CAST,CEIL)Extend sort pushdown support to more operators
try_pushdown_sortimplementation forProjectionExec,FilterExec,CooperativeExecWhat changes are included in this PR?
Core Changes:
ParquetSource::try_reverse_output (datasource-parquet/src/source.rs)
eq_propertiesparameterordering_satisfyto check if reversed ordering satisfies the requestfile_orderingfield andwith_file_ordering_infomethodFileSource trait (datasource/src/file.rs)
try_reverse_outputsignature witheq_propertiesparameterFileScanConfig::try_pushdown_sort (datasource/src/file_scan_config.rs)
file_source.try_reverse_outputNew operator support
FilterExec::try_pushdown_sort- Pushes sort below filtersProjectionExec::try_pushdown_sort- Pushes sort below projectionsCooperativeExec::try_pushdown_sort- Supports sort pushdown in cooperative executionRemoved obsolete methods
LexOrdering::is_reverse- replaced byordering_satisfyFeature Enhancements:
Supported optimization scenarios:
Are these changes tested?
Yes, comprehensive tests have been added:
Test 7 (237 lines): Constant column elimination scenarios
Test 8 (355 lines): Monotonic function scenarios
date_trunc(date truncation)CAST(type conversion)CEIL(ceiling)ABS(negative case - not monotonic over mixed positive/negative range)All tests verify:
reverse_row_groups=truein physical plansAre there any user-facing changes?
API Changes:
FileSource::try_reverse_outputsignature changed (addedeq_propertiesparameter)FileSource::with_file_ordering_infomethodLexOrdering::is_reversepublic methodUser-visible improvements:
WHEREclauses that make certain columns constantNote: This PR returns
Inexactresults because only row group order is reversed, not row order within row groups. Future enhancements could include:Exact)