Closed
Labels: bug (Something isn't working)
Description
Describe the bug
ScalarValue::to_array_of_size calls list_to_array_of_size for list-like inputs. list_to_array_of_size will be called with a one-element list containing an Arrow array type, and it does:
let arrays = repeat_n(arr, size).collect::<Vec<_>>();
let ret = match !arrays.is_empty() {
true => arrow::compute::concat(arrays.as_slice())?,
false => arr.slice(0, 0),
};

If the input is a StringViewArray, repeat_n will create size copies, each with their own data buffers. concat preserves those data buffers. So if the input is, say, a StringViewArray with 500 buffers and size is 1024, the result will have 500k buffers.
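To make the arithmetic concrete, here is a minimal std-only sketch of the blow-up. It is a toy model, not the arrow-rs implementation: ViewArrayModel, concat_keep_all, distinct_buffers, and repeat_and_concat are hypothetical names introduced only for illustration, and the model assumes that concatenation simply appends every input's buffer list, as described above.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a view array: only the list of data buffers is modeled.
#[derive(Clone)]
struct ViewArrayModel {
    data_buffers: Vec<Arc<Vec<u8>>>,
}

/// Assumed concat behavior: every input's buffer list is appended, with no deduplication.
fn concat_keep_all(arrays: &[ViewArrayModel]) -> ViewArrayModel {
    let data_buffers = arrays
        .iter()
        .flat_map(|a| a.data_buffers.iter().cloned())
        .collect();
    ViewArrayModel { data_buffers }
}

/// Counts distinct underlying allocations by comparing Arc pointers.
fn distinct_buffers(arr: &ViewArrayModel) -> usize {
    arr.data_buffers
        .iter()
        .map(|b| Arc::as_ptr(b))
        .collect::<HashSet<_>>()
        .len()
}

/// Mirrors the repeat-then-concat shape of the snippet in this issue.
fn repeat_and_concat(num_buffers: usize, size: usize) -> ViewArrayModel {
    let arr = ViewArrayModel {
        data_buffers: (0..num_buffers).map(|i| Arc::new(vec![i as u8])).collect(),
    };
    // Same effect as repeat_n(arr, size): `size` clones of the one input array.
    let arrays = std::iter::repeat(arr).take(size).collect::<Vec<_>>();
    concat_keep_all(&arrays)
}
```

In this model, repeat_and_concat(500, 1024) yields a buffer list with 500 × 1024 = 512,000 entries, while distinct_buffers reports that only 500 distinct allocations sit behind them.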
To Reproduce
No response
Expected behavior
No response
Additional context
We probably didn't see this problem before because DF doesn't usually call ScalarValue::to_array_of_size on a list whose underlying StringViewArray has many data buffers. I ran into this when working on #3781. It crops up in a scenario like:
SELECT ... FROM t WHERE array_has_any(f.c, (SELECT array_agg(...) FROM parquet_source_file))
The array_agg creates a StringViewArray with many data buffers. In current DF, this gets rewritten into a join, but after fixing #3781, array_has_any is instead invoked with the array_agg output as a scalar, so we go through this code path and run into some pain.