Fix quadratic runtime in min_max_bytes#18044
Conversation
78333ab to
34e6713
Compare
34e6713 to
4d6e5e6
Compare
Jefffrey
left a comment
There was a problem hiding this comment.
Would you be able to post the results of the benchmark as well, for visibility?
| pub mod hash_map { | ||
| pub use hashbrown::hash_map::Entry; | ||
| } | ||
| pub mod hash_set { | ||
| pub use hashbrown::hash_set::Entry; | ||
| } |
There was a problem hiding this comment.
Is this to avoid adding an explicit hashbrown dependency to functions-aggregate?
There was a problem hiding this comment.
Yes, the datafusion_common::HashMap::entry API was not usable without adding an explicit dependency on the hashbrown crate, at which point there is little benefit over using hashbrown::HashMap directly
There was a problem hiding this comment.
Yeah I guess it makes sense given we already export hashbrown::HashMap here already 👍
Of course! Please keep in mind that this is a very specific workload and not broader regression testing Benchmark results |
Jefffrey
left a comment
There was a problem hiding this comment.
Makes sense to me; we're just allocating for only the groups we need instead of for all groups (even if not being used in the current update_batch) if I understand correctly
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
Very nice! |
|
Thanks again @ctsk |
| // in this batch (either the existing min/max or the new input value) | ||
| // and updating the owned values in `self.min_maxes` at most once | ||
| let mut locations = vec![MinMaxLocation::ExistingMinMax; total_num_groups]; | ||
| let mut locations = HashMap::<usize, &[u8]>::with_capacity(group_indices.len()); |
There was a problem hiding this comment.
i tried a few different thing to avoid having to allocate a HashMap -- like ideally we could at least reuse the allocation from invocation to invocation
However, as it is setup now it has a slice into the input array so any structure ends up with a lifetime tied to the input, which I couldn't figure out how to reuse across calls to different input 🤔
## Which issue does this PR close? - Closes apache#17897 ## What changes are included in this PR? This PR replaces the `locations` vector used to reduce the number of allocations / resizes in the accumulator with. a HashMap instead. ## Are these changes tested? Not in particular. Additional unit-tests and broader regression testing would be useful. A microbenchmark verifies that the runtime is no longer quadratic. ## Are there any user-facing changes? No. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close? - Closes apache#17897 ## What changes are included in this PR? This PR replaces the `locations` vector used to reduce the number of allocations / resizes in the accumulator with. a HashMap instead. ## Are these changes tested? Not in particular. Additional unit-tests and broader regression testing would be useful. A microbenchmark verifies that the runtime is no longer quadratic. ## Are there any user-facing changes? No. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
What changes are included in this PR?
This PR replaces the
locationsvector used to reduce the number of allocations / resizes in the accumulator with. a HashMap instead.Are these changes tested?
Not in particular. Additional unit-tests and broader regression testing would be useful. A microbenchmark verifies that the runtime is no longer quadratic.
Are there any user-facing changes?
No.