feat: Reduce allocations for aggregating Statistics #20768

jonathanc-n wants to merge 12 commits into apache:main

Conversation
I verified this has a 5x speed-up for numeric primitive values using a small benchmark. It felt unnecessary to add the benchmark since it is just a regular vectorization optimization.
@Dandandan cleaned up the implementation. The performance hit came from using
Dandandan left a comment:
I think it looks good, it could benefit though from a benchmark somewhere.
Main benchmark | Speed up benchmark

Added bench. Looking at around a 10-12x improvement.
asolimando left a comment:
LGTM, left a couple of minor comments and questions
    ($lhs:expr, $rhs:expr, $VARIANT:ident) => {
        match ($lhs, $rhs) {
            (ScalarValue::$VARIANT(Some(a)), ScalarValue::$VARIANT(Some(b))) => {
                Ok(ScalarValue::$VARIANT(Some(a.wrapping_add(*b))))
Correct me if I am wrong, but it seems that wrapping_add would wrap the sum upon overflow, producing a negative number. I am not sure this is what we want for statistics; I propose we either mark it Absent or at the very least mark it as Inexact (which doesn't seem to be happening), as done in Precision::add.
Not in scope for this PR, but on the subject I also think pub sum_value: Precision<ScalarValue>, unlike min and max, should not match the data type of the column but use a wider type like Int64 to minimize the chance of overflow.
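To make the overflow concern concrete, here is a minimal std-only sketch (not DataFusion code) of how wrapping_add differs from checked_add at the i64 boundary:

```rust
fn main() {
    let a: i64 = i64::MAX;

    // wrapping_add silently wraps around, turning an overflowed sum
    // into a negative number, which would corrupt a sum statistic.
    assert_eq!(a.wrapping_add(1), i64::MIN);

    // checked_add reports the overflow instead, which maps naturally
    // onto downgrading the statistic to Inexact or Absent.
    assert_eq!(a.checked_add(1), None);
}
```
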
> Not in scope for this PR, but on the subject I also think pub sum_value: Precision<ScalarValue>, unlike min and max, should not match the data type of the column but use a wider type like Int64 to minimize the chance of overflow.
Good point, filed in #20826
@asolimando Added the fix! Decimal256 needs to be dealt with differently, so I set up a small function for that.
Checked the new commit, thanks for addressing the comment and filing the issue, marking as resolved!
EDIT: it looks like I lack privileges; I will let you do that if you want.
asolimando left a comment:
LGTM (modulo fixing the cargo doc failure)
    ScalarValue::Float64(_) => add_float!(lhs, rhs, Float64),
    ScalarValue::Decimal32(_, _, _) => add_decimal!(lhs, rhs, Decimal32),
    ScalarValue::Decimal64(_, _, _) => add_decimal!(lhs, rhs, Decimal64),
    ScalarValue::Decimal128(_, _, _) => add_decimal!(lhs, rhs, Decimal128),
For Decimal128(Some(a), p, s), saturating to i128::MAX doesn't respect the precision constraint. A Decimal128 with precision 5 can't hold i128::MAX.
For example:

    let a = Decimal128(Some(99999), 5, 2); // 999.99
    let b = Decimal128(Some(99999), 5, 2); // 999.99
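A small std-only sketch (illustrative, not the PR's code) of why saturating at i128::MAX doesn't help here: the sum blows past the 5-digit precision long before it overflows i128, so saturating_add never triggers:

```rust
fn main() {
    // Max unscaled value a Decimal128 with precision 5 can hold: 99_999.
    let max_for_precision_5: i128 = 10i128.pow(5) - 1;

    let a: i128 = 99_999; // 999.99 at scale 2
    let b: i128 = 99_999; // 999.99 at scale 2

    // No i128 overflow occurs, so saturating_add behaves like a plain add...
    let sum = a.saturating_add(b);
    assert_eq!(sum, 199_998);

    // ...yet the result no longer fits in precision 5.
    assert!(sum > max_for_precision_5);
}
```

This is why a precision-aware bound (rather than the native integer bound) is needed when clamping decimal sums.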
    // Licensed to the Apache Software Foundation (ASF) under one
Not sure if we can put it at a higher level; is it possible that other nodes may use this code in the future, not only aggregation?
run benchmark sql_planner
    /// converts the result back — 3 heap allocations per call.
    ///
    /// For non-primitive types, falls back to `ScalarValue::add`.
    pub(crate) fn scalar_add(lhs: &ScalarValue, rhs: &ScalarValue) -> Result<ScalarValue> {
I bet you can make this even faster by making it mutate lhs rather than make a new one:

    pub(crate) fn scalar_add(lhs: &mut ScalarValue, rhs: &ScalarValue) -> Result<()> {

    pub(crate) fn precision_add(
        lhs: &Precision<ScalarValue>,
        rhs: &Precision<ScalarValue>,
    ) -> Precision<ScalarValue> {
Ditto here -- reusing lhs is probably even faster
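The mutate-in-place idea can be sketched with a tiny stand-in enum (hypothetical `Scalar` and `scalar_add_in_place`, not DataFusion's real ScalarValue API): writing through `&mut` avoids constructing and dropping a fresh value per addition.

```rust
// Hypothetical stand-in for ScalarValue, just to show the pattern.
#[derive(Debug, PartialEq)]
enum Scalar {
    Int64(Option<i64>),
}

// Mutating lhs avoids allocating a new value on every call.
fn scalar_add_in_place(lhs: &mut Scalar, rhs: &Scalar) {
    match (lhs, rhs) {
        (Scalar::Int64(Some(a)), Scalar::Int64(Some(b))) => *a = a.wrapping_add(*b),
        // Any null/unknown operand makes the result unknown.
        (l, _) => *l = Scalar::Int64(None),
    }
}

fn main() {
    let mut acc = Scalar::Int64(Some(40));
    scalar_add_in_place(&mut acc, &Scalar::Int64(Some(2)));
    assert_eq!(acc, Scalar::Int64(Some(42)));
}
```
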
    //!
    //! Provides a cheap pairwise [`ScalarValue`] addition that directly
    //! extracts inner primitive values, avoiding the expensive
    //! `ScalarValue::add` path (which round-trips through Arrow arrays).
Why not just add this special case to ScalarValue::add ?
🤖: Benchmark completed
Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Vectorize aggregations for combining statistics by gathering all values, then calling kernels once.
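The gather-then-reduce idea can be sketched in plain std Rust (hypothetical numbers, no Arrow dependency): instead of folding per-partition statistics with one scalar addition per pair, collect the values and reduce them in a single pass, analogous to one kernel call over a whole array.

```rust
fn main() {
    // Hypothetical per-partition row counts pulled from Statistics.
    let partition_row_counts: Vec<i64> = vec![1_000, 2_500, 42];

    // One reduction over the gathered values, analogous to invoking an
    // Arrow compute kernel once instead of N scalar additions.
    let total: i64 = partition_row_counts.iter().sum();
    assert_eq!(total, 3_542);
}
```
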
Are these changes tested?

Unit tests + existing tests

Are there any user-facing changes?

Removed merge_iter