feat: implement IsNotNull expression in vortex expression library #6969
Open
xiaoxuandev wants to merge 1 commit into vortex-data:develop
Add a first-class IsNotNull scalar function instead of composing Not(IsNull(...)). This simplifies the expression tree, enables direct stat_falsification for zone map pruning, and updates all integration points (DataFusion, DuckDB, Python/Substrait). The stat_falsification uses is_constant && null_count > 0 as an approximation since there is no RowCount stat yet. Closes: vortex-data#6040
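The approximation described above can be sketched in isolation. This is a minimal sketch, not the PR's code: `ZoneStats`, its fields, and `is_not_null_falsified` are hypothetical names, and it assumes that a "constant" zone counts null as a value, so a constant zone with any null holds only nulls.

```rust
/// Hypothetical per-zone statistics. The field names are illustrative and are
/// not the Vortex zone map API.
#[derive(Debug, Clone, Copy)]
pub struct ZoneStats {
    /// Number of null values in the zone, if recorded.
    pub null_count: Option<u64>,
    /// Whether every value in the zone is identical, if recorded.
    pub is_constant: Option<bool>,
}

/// Returns true only when the stats prove that `is_not_null` is false for
/// every row in the zone, so the zone can be skipped.
pub fn is_not_null_falsified(stats: &ZoneStats) -> bool {
    match (stats.is_constant, stats.null_count) {
        // A constant zone containing at least one null holds only nulls.
        (Some(true), Some(nulls)) => nulls > 0,
        // Missing or inconclusive stats: keep the zone.
        _ => false,
    }
}
```

The check is conservative: an all-null zone whose `is_constant` stat was not recorded is kept rather than pruned. With a row count available, the more general condition `null_count == row_count` mentioned in the PR would cover that case as well.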
    catalog: &dyn StatsCatalog,
) -> Option<Expression> {
    // is_not_null is falsified when ALL values are null, i.e. null_count == row_count.
    // Since there is no RowCount stat in the zone map, we approximate using IsConstant:
    // if the zone is constant and has any nulls, then all values must be null.
    //
    // TODO(vortex-6040): Add a RowCount stat to enable the more general falsification:
Comment on lines +86 to +89
    if let Some(scalar) = child.as_constant() {
        return Ok(ConstantArray::new(!scalar.is_null(), args.row_count()).into_array());
    }
Contributor
This is unneeded; the validity will do this.
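One reading of this comment is that the validity mask of a constant input is already constant, so a dedicated constant branch adds nothing. The toy model below illustrates that reading. `Column` and its methods are hypothetical stand-ins, not Vortex's array types, and the sketch assumes validity is computed per encoding.

```rust
/// Toy column: either one scalar repeated `len` times or explicit values.
/// This is an illustrative model, not Vortex's array representation.
pub enum Column {
    Constant { value: Option<i32>, len: usize },
    Values(Vec<Option<i32>>),
}

impl Column {
    /// Validity mask: true where the row holds a non-null value.
    pub fn validity(&self) -> Vec<bool> {
        match self {
            // A constant column yields a uniform mask without inspecting rows.
            Column::Constant { value, len } => vec![value.is_some(); *len],
            Column::Values(values) => values.iter().map(Option::is_some).collect(),
        }
    }
}

/// `is_not_null` is exactly the validity mask, with no constant-specific branch.
pub fn is_not_null(column: &Column) -> Vec<bool> {
    column.validity()
}
```

In this model the constant case is handled inside `validity`, so `is_not_null` needs no short-circuit of its own.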
Summary
Closes: #6040
Add a first-class IsNotNull scalar function, replacing the previous Not(IsNull(...)) composition pattern. This simplifies the expression tree and enables direct stat_falsification for zone map pruning.
Changes:
- New is_not_null.rs with a ScalarFnVTable implementation, including stat_falsification using is_constant && null_count > 0 (with a TODO for a future RowCount stat)
- Updated all integration points (DataFusion, DuckDB, Python/Substrait) to use is_not_null(...) directly
- Replaced the Not(IsNull(...)) fallback in erased.rs validity with IsNotNull
- Registered IsNotNull in ScalarFnSession and ExprBuiltins/ArrayBuiltins
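To make the tree simplification concrete, here is a toy comparison of the two shapes. The `Expr` enum, `simplify`, and `node_count` are hypothetical and exist only for illustration. The PR changes call sites to build IsNotNull directly rather than adding a rewrite pass like this one.

```rust
/// Toy expression tree. The variant names mirror the PR's terminology, but the
/// type itself is hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column,
    IsNull(Box<Expr>),
    Not(Box<Expr>),
    IsNotNull(Box<Expr>),
}

/// Rewrites Not(IsNull(x)) into IsNotNull(x), recursing into children.
pub fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Not(inner) => match *inner {
            Expr::IsNull(child) => Expr::IsNotNull(Box::new(simplify(*child))),
            other => Expr::Not(Box::new(simplify(other))),
        },
        Expr::IsNull(child) => Expr::IsNull(Box::new(simplify(*child))),
        Expr::IsNotNull(child) => Expr::IsNotNull(Box::new(simplify(*child))),
        Expr::Column => Expr::Column,
    }
}

/// Counts the nodes in an expression tree.
pub fn node_count(expr: &Expr) -> usize {
    match expr {
        Expr::Column => 1,
        Expr::IsNull(c) | Expr::Not(c) | Expr::IsNotNull(c) => 1 + node_count(c),
    }
}
```

In this model the composed form over a single column has three nodes and the direct form has two. A pruning rule can also match the direct form on a single node instead of recognizing a two-level pattern.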
AI Assistance Disclosure
This PR was developed with AI assistance (Kiro). AI was used for code review, implementing stat_falsification, writing tests, and drafting the PR description. All output was reviewed and validated by the author.
API Changes
New public APIs:
- vortex_array::expr::is_not_null(child): creates an IsNotNull expression
- Expression::is_not_null() / ArrayRef::is_not_null() via the ExprBuiltins/ArrayBuiltins traits
- Python: vortex._lib.expr.is_not_null(child)
Testing
Nine unit tests cover return dtype, child replacement, mixed/all-valid/all-invalid evaluation, struct field access, display formatting, null sensitivity, and generation of the stat falsification pruning expression.