Skip to content

Introduce ProjectionExprs::unproject_exprs/project_exprs and improve docs#20193

Merged
adriangb merged 2 commits intoapache:mainfrom
alamb:alamb/consolidate_remapping
Feb 12, 2026
Merged

Introduce ProjectionExprs::unproject_exprs/project_exprs and improve docs#20193
adriangb merged 2 commits intoapache:mainfrom
alamb:alamb/consolidate_remapping

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Feb 6, 2026

Which issue does this PR close?

Rationale for this change

I am going through how the various layers of expression pushdown and schema rewrites work, and I spent a long time confused about what the sync_with_child parameter on update_expr did -- like what was the different betweenupdate_expr(.., true) vs update_expr(..., false) 😕

After some study I concluded it controls which way the rewrite is done (either project the expressions to refer to the projection expressions, or the opposite, 'unproject' an expression and substitute the projection definitions back in)

What changes are included in this PR?

  1. Introduce ProjectionExprs::unproject_exprs/project_exprs
  2. rename sync_with_child to unproject
  3. Improve documentation about what is done

Are these changes tested?

Yes by existing ci

Are there any user-facing changes?

There are new APIs, but no changes to existing APIs

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates datasource Changes to the datasource crate labels Feb 6, 2026
)
})
})
.map(|filter| projection.unproject_expr(&filter))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of the PR is to pull this function call into a method on ProjectionExprs and add documentation

};
// Now remap column indices to match the table schema.
let remapped_filters: Result<Vec<_>> = filters_to_remap
let remapped_filters = filters_to_remap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driveby cleanup (no functional change)

expr: &Arc<dyn PhysicalExpr>,
projected_exprs: &[ProjectionExpr],
sync_with_child: bool,
unproject: bool,
Copy link
Contributor Author

@alamb alamb Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this parameter name to be super confusing as there are no "children" involved. I suspect this is historical and predates the work to push expressions down

unproject as the best I could come up with -- but would love some thoughts on better names

@alamb alamb force-pushed the alamb/consolidate_remapping branch from e7d571b to ee658ae Compare February 6, 2026 16:51
// For example, the filter being pushed down is `c1_c2 > 5` and it was created
// against the output schema of the this `DataSource` which has projection `c1 + c2 as c1_c2`.
// Thus we need to rewrite the filter back to `c1 + c2 > 5` before passing it to the file source.
// This is necessary because filters refer to the output schema of this `DataSource`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified that the "different schema" is the "table schema"

@alamb alamb marked this pull request as ready for review February 6, 2026 17:11
@@ -842,7 +901,7 @@ pub fn combine_projections(
pub fn update_expr(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I contemplated going even farther and deprecating this function (and routing everything through ProjectionExprs) but I think that will result in some non trivial API churn and I figured we could always do it later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we were to do this a first step would be to move the logic from update_expr into the functions on ProjectionExprs and make update_expr call those, maybe adding a note that it's "soft deprecated" and we recommend calling ProjectionExprs instead? I think that could fit in this PR.

@adriangb
Copy link
Contributor

adriangb commented Feb 6, 2026

I have not been able to wrap my head around update_expr either. I will review this later today or tomorrow 😄

@alamb
Copy link
Contributor Author

alamb commented Feb 6, 2026

I have not been able to wrap my head around update_expr either. I will review this later today or tomorrow 😄

Thanks -- no rush.

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me this is a very nice cleanup!!

@adriangb adriangb enabled auto-merge February 12, 2026 00:51
@alamb alamb disabled auto-merge February 12, 2026 20:33
@adriangb adriangb added this pull request to the merge queue Feb 12, 2026
@alamb
Copy link
Contributor Author

alamb commented Feb 12, 2026

Thanks for the review @adriangb

Merged via the queue into apache:main with commit 9fd84e7 Feb 12, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

datasource Changes to the datasource crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants