Skip to content

Conversation

@kosiew
Copy link
Contributor

@kosiew kosiew commented Dec 24, 2025

Which issue does this PR close?

Rationale for this change

A panic could occur during SanityCheckPlan when optimizing / executing queries that chain multiple aggregations over a multi-partitioned input (often empty tables), e.g. Sort -> Aggregate(ts, region) -> Sort -> Aggregate(ts).

In these cases the first aggregation can produce a hash partitioning that is a superset of the second aggregation’s required distribution keys. The optimizer previously treated some of these situations as not needing a hash repartition (especially when the input was empty / stats suggested repartition wasn’t beneficial), which could leave the plan violating the downstream operator’s distribution requirement and trigger SanityCheckPlan failures.

This PR aligns the optimizer’s distribution enforcement decisions with the same partitioning satisfaction logic used by the sanity checker, ensuring that required repartitions are inserted whenever the downstream requirement is not satisfied.

What changes are included in this PR?

  • Introduced DistributionSatisfactionResult and a shared distribution_satisfaction(...) helper in enforce_distribution to compute and carry:

    • the required Distribution
    • the child’s output_partitioning
    • the computed PartitioningSatisfaction (Exact / Subset / NotSatisfied)
    • a single, reusable requires_repartition(...) decision function.
  • Updated add_hash_on_top and ensure_distribution to use the shared satisfaction result and decision logic, rather than ad-hoc checks.

  • Simplified get_repartition_requirement_status by removing alignment logic based on hash_necessary flags and instead deferring to the satisfaction-based repartition decision.

  • Updated SanityCheckPlan to reuse the same distribution_satisfaction(...) helper (so optimizer and sanity checker evaluate distribution satisfaction consistently).

  • Re-exported PartitioningSatisfaction from datafusion_physical_expr for use in tests and downstream code.

  • Added targeted regression tests:

    • Unit tests verifying PartitioningSatisfaction outcomes (Exact / Subset / NotSatisfied) match sanity checker behavior for hash partitioning.
    • A tokio regression test reproducing the chained-aggregate scenario from Sanity check failed when sort and aggregate on a multi-partitioned table #18989 and asserting the optimized plan either inserts a repartition between aggregates or preserves a single-partition stream via SortPreservingMergeExec, and executes without panic.
  • Updated sqllogictest expected plans where distribution enforcement now inserts RepartitionExec and/or preserves partitioning through SortExec.

Are these changes tested?

Yes.

  • Added new unit tests in datafusion/core/tests/physical_optimizer/enforce_distribution.rs to validate distribution satisfaction behavior and its alignment with SanityCheckPlan.
  • Added an async regression test covering the multi-partition empty table + chained aggregations scenario from Sanity check failed when sort and aggregate on a multi-partitioned table #18989.
  • Updated sqllogictest outputs to reflect the new physical plans.

Performance sanity checks:

  • Ran the following benchmarks and verified no performance regressions:

    • cargo bench --bench topk_aggregate --profile=profiling -- --save-baseline before
    • cargo bench --bench distinct_query_sql --profile=profiling -- --save-baseline before
    • cargo bench --bench sort_preserving_merge --profile=profiling -- --save-baseline before

Are there any user-facing changes?

Potentially.

  • Some queries may now include additional RepartitionExec operators (or preserve partitioning through sorts) to correctly satisfy downstream distribution requirements. This can change the displayed physical plan (EXPLAIN output) and may affect performance characteristics in edge cases, but fixes a correctness/stability issue (prevents optimizer-time / execution-time panics).
  • No SQL syntax or API behavior changes are intended.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

Include a regression test that verifies the optimized plan
maintains a safe distribution path through either
repartitioning or sort-preserving merge. Ensure it
executes correctly without panicking. Register the new
test module in the physical optimizer integration test suite.
Flag unmet hash distributions when existing partitioning keys
differ from the requirements, regardless of partition counts.
Add a superset-hash unit test to ensure EnforceDistribution
inserts the necessary repartition while preserving the
original partitioning layer.
Add a shared DistributionSatisfactionResult helper to
streamline partitioning satisfaction checks. Update the
SanityCheckPlan to utilize this helper and re-export
PartitioningSatisfaction for better cross-crate reuse.

Include unit tests for exact, subset, and superset hash
partitioning scenarios to ensure consistent enforcement and
sanity checking interpretations.
Reinstate logic to skip unnecessary repartitioning for
single-partition joins, window partitions, and grouped
unions, matching previous behavior and maintaining
regression expectations.
Add necessary imports to enforce_distribution.rs and remove unused
BoundedWindowAggExec import. Move the test for superset hash keys
to a more logical position, ensuring consistency in testing for
hash join partitioning.

Update test expectations to reflect current optimizer behavior
by removing unnecessary superset repartitions and replacing
them with exact match requirements. Revise comments and
assertions to align with the improved optimization.
Delete the original enforce_distribution_superset.rs file.
@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Dec 24, 2025
@github-actions github-actions bot added the sql SQL Planner label Dec 24, 2025
@kosiew kosiew marked this pull request as ready for review December 24, 2025 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate optimizer Optimizer rules physical-expr Changes to the physical-expr crates sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sanity check failed when sort and aggregate on a multi-partitioned table

1 participant