Skip to content

Conversation

@zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented Dec 22, 2025

Which issue does this PR close?

Rationale for this change

After #19064 was completed, I found it couldn't meet our internal project requirements:

  1. Redesign try_reverse_output using EquivalenceProperties

    • The previous implementation used a simple is_reverse method that could only handle basic reverse matching
    • Now leveraging EquivalenceProperties can handle more complex scenarios, including constant column elimination and monotonic functions
  2. Switch to ordering_satisfy method for ordering matching

    • This method internally:
      • Normalizes orderings (removes constant columns)
      • Checks monotonic functions (like date_trunc, CAST, CEIL)
      • Handles prefix matching
  3. Extend sort pushdown support to more operators

    • Added try_pushdown_sort implementation for ProjectionExec, FilterExec, CooperativeExec
    • These operators can now pass sort requests down to their children

What changes are included in this PR?

Core Changes:

  1. ParquetSource::try_reverse_output (datasource-parquet/src/source.rs)

    • Added eq_properties parameter
    • Reverses all orderings in equivalence properties
    • Uses ordering_satisfy to check if reversed ordering satisfies the request
    • Removed file_ordering field and with_file_ordering_info method
  2. FileSource trait (datasource/src/file.rs)

    • Updated try_reverse_output signature with eq_properties parameter
    • Added detailed documentation explaining parameter usage and examples
  3. FileScanConfig::try_pushdown_sort (datasource/src/file_scan_config.rs)

    • Simplified logic to directly call file_source.try_reverse_output
    • No longer needs to pre-check ordering satisfaction or set file ordering info
  4. New operator support

    • FilterExec::try_pushdown_sort - Pushes sort below filters
    • ProjectionExec::try_pushdown_sort - Pushes sort below projections
    • CooperativeExec::try_pushdown_sort - Supports sort pushdown in cooperative execution
  5. Removed obsolete methods

    • Deleted LexOrdering::is_reverse - replaced by ordering_satisfy

Feature Enhancements:

Supported optimization scenarios:

  1. Constant column elimination (Test 7)
   -- File ordering: [timeframe ASC, period_end ASC]
   -- Query: WHERE timeframe = 'quarterly' ORDER BY period_end DESC
   -- Effect: After timeframe becomes constant, reverse scan is enabled
  1. Monotonic function support (Test 8)
   -- File ordering: [ts ASC]
   -- Query: ORDER BY date_trunc('month', ts) DESC
   -- Effect: date_trunc is monotonic, reverse scan satisfies the request

Are these changes tested?

Yes, comprehensive tests have been added:

  • Test 7 (237 lines): Constant column elimination scenarios

    • Single constant column filter
    • Multi-value IN clauses (doesn't trigger optimization)
    • Literal constants in sort expressions
    • Non-leading column filters (edge cases)
  • Test 8 (355 lines): Monotonic function scenarios

    • date_trunc (date truncation)
    • CAST (type conversion)
    • CEIL (ceiling)
    • ABS (negative case - not monotonic over mixed positive/negative range)

All tests verify:

  • Presence of reverse_row_groups=true in physical plans
  • Correctness of query results

Are there any user-facing changes?

API Changes:

  • FileSource::try_reverse_output signature changed (added eq_properties parameter)
  • Removed FileSource::with_file_ordering_info method
  • Removed LexOrdering::is_reverse public method

User-visible improvements:

  • More queries can leverage reverse row group scanning for optimization
  • Especially queries with WHERE clauses that make certain columns constant
  • Queries using monotonic functions (like date functions, type conversions)

Note: This PR returns Inexact results because only row group order is reversed, not row order within row groups. Future enhancements could include:

  • File reordering based on statistics (returning Exact)
  • Partial sort pushdown for prefix matches

Copilot AI review requested due to automatic review settings December 22, 2025 03:00
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Dec 22, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR redesigns the try_reverse_output method to use equivalence properties and the ordering_satisfy API, enabling more sophisticated sort pushdown optimizations. The key improvement is the ability to recognize when reversed file scanning can satisfy requested orderings through constant column elimination, monotonic function reasoning, and prefix matching.

Key Changes:

  • Modified FileSource::try_reverse_output to accept EquivalenceProperties parameter for advanced ordering analysis
  • Replaced simple is_reverse check with ordering_satisfy in ParquetSource to handle complex cases like constant columns from filters
  • Added forward-direction check in FileScanConfig to avoid unnecessary reverse optimization when ordering is already satisfied
  • Implemented try_pushdown_sort in FilterExec and ProjectionExec to propagate sort pushdown through the plan

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
datafusion/datasource/src/file.rs Updated FileSource trait to add eq_properties parameter to try_reverse_output, with comprehensive documentation of the new approach
datafusion/datasource-parquet/src/source.rs Reimplemented try_reverse_output to use ordering_satisfy with equivalence properties instead of simple is_reverse check, enabling constant column and monotonic function handling
datafusion/datasource/src/file_scan_config.rs Added forward-direction ordering check before attempting reverse optimization and enhanced comments explaining the delegation to FileSource
datafusion/physical-plan/src/filter.rs Implemented try_pushdown_sort to propagate sort pushdown requests through filter operators
datafusion/physical-plan/src/projection.rs Implemented try_pushdown_sort to propagate sort pushdown requests through projection operators
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt Added comprehensive test suite (Test 7.1-7.10) covering constant column filtering, literal constants in sort expressions, and edge cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@zhuqi-lucas zhuqi-lucas marked this pull request as draft December 22, 2025 08:43
@zhuqi-lucas zhuqi-lucas marked this pull request as ready for review December 23, 2025 06:47
@zhuqi-lucas
Copy link
Contributor Author

zhuqi-lucas commented Dec 23, 2025

cc @adriangb @alamb , i submitted a PR to enhance the sort pushdown because it failed my internal project after initial try, can you help review? Thanks!

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very cool! I suggest splitting out the changes to FilterExec for further consideration and moving the tests to a different SLT file (order.slt? start an order_pushdown.slt?)

Ok(FilterPushdownPropagation::if_all(child_pushdown_result))
}

fn try_pushdown_sort(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good improvement but is orthogonal to the change to use EqProperties (just noting)

})
}

fn try_pushdown_sort(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big picture question on this one: is it beneficial to push down sort order past a filter? If it's "free" of course. But if there's runtime cost to re-ordering rows, it may be cheaper to filter then re-order (if the filter is very selective).

I think in practice, for Parquet, everything ends up happening in the scan itself and filters always get applied first, so this is inconsequential. But in theory it could hurt data sources that e.g. don't support filter pushdown and hence rely on a FilterExec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adriangb for review, i agree, but currently i think it's good, i only add the following 3 basic cases, and don't add specific sort pushdown, just bypass to child sort pushdown to make it work:

FilterExec::try_pushdown_sort 
ProjectionExec::try_pushdown_sort 
CooperativeExec::try_pushdown_sort 

And they all just bypass self to the child sort pushdown, i was testing and also the new slt testing here, it seems need this, and they are the basic cases i think.

I also created a issue for the following operators to add more try_pushdown_sort implementation, we can investigate more for it:

#19394

SET datafusion.optimizer.enable_sort_pushdown = true;


# Test 7: Sort pushdown with constant column filtering
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why in dynamic_filter_pushdown_config.slt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @adriangb , I added sort pushdown cases here because it included topk pushdown testing here, i can add a new test file.

@zhuqi-lucas
Copy link
Contributor Author

Looks very cool! I suggest splitting out the changes to FilterExec for further consideration and moving the tests to a different SLT file (order.slt? start an order_pushdown.slt?)

Thank you @adriangb for review, i addressed the comments to add a new sort_pushdown.slt.

But i still keep the bypass sort pushdown for :

FilterExec::try_pushdown_sort 
ProjectionExec::try_pushdown_sort 
CooperativeExec::try_pushdown_sort 

And they all just bypass self to the child sort pushdown, i was testing and also the new slt testing here, it seems need this, and they are the basic cases i think.

I also created a issue for the following operators to add more try_pushdown_sort implementation, we can investigate more for it:

#19394

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redesign the try_reverse_output to support more cases

2 participants