feat: optimise copying in left for Utf8 and LargeUtf8#19980
feat: optimise copying in left for Utf8 and LargeUtf8#19980Jefffrey merged 9 commits intoapache:mainfrom
left for Utf8 and LargeUtf8#19980Conversation
Improve performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length argument and slice bytes directly. For LargeUtf8, implement a zero-copy slice operation reusing same Arrow buffers. An Arrow view construction helper `shrink_string_view_array_view` is included to this PR (upstream to Arrow crates)
Jefffrey
left a comment
There was a problem hiding this comment.
Thanks for this PR. Would be good to see some benchmark results too, see:
Moreover, I find some of the points in the PR body confusing:
Improve performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length arguments and slice bytes directly.
Could you help me understand which changes here make it O(1)?
For LargeUtf8 (
StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.
LargeUtf8 and Utf8View are different types, so it is confusing to see them used interchangeably.
|
Run benchmarks |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Run benchmarks |
|
🤖 Hi @theirix, thanks for the request (#19980 (comment)). |
|
Thank you for the review!
It's for memory complexity. We avoid an extra copy of the string into I cannot say about time complexity - it is improved, but not for all queries (
My bad, corrected it in the description. |
Jefffrey
left a comment
There was a problem hiding this comment.
I think we'd have to see benchmarks from the microbenchmark of left itself; the benchmark queries from the bot don't use this function I think? (Unless its used internally in joins, etc.)
Agree, here are the results of a more targeted |
|
I think it would be a good idea to enhance the benchmark to add Utf8View case as well; we can simply compare it to Utf8 numbers as the before case. Looks like we got a ~80% improvement on Utf8/LargeUtf8 by simply removing the intermediate |
Good idea, I'll add it shortly.
Indeed! Memory allocation and copying consume time. If we can go with zero copy, we should. |
|
Bench results including string view (performance improvement up to 71%): Details |
|
Thanks @theirix |
|
Thank you for the review! |
## Which issue does this PR close? - Closes #20068. ## Rationale for this change Similar to issue #19749 and the optimisation of `left` in #19980, it's worth doing the same for `right` ## What changes are included in this PR? - Improve efficiency of the function by making fewer memory allocations and going directly to bytes, based on char boundaries - Provide a specialisation for StringView with buffer zero-copy - Use `arrow_array::buffer::make_view` for low-level view manipulation (we still need to know about a magic constant 12 for a buffer layout) - Benchmark - up to 90% performance improvement ``` right size=1024/string_array positive n/1024 time: [24.286 µs 24.658 µs 25.087 µs] change: [−86.881% −86.662% −86.424%] (p = 0.00 < 0.05) Performance has improved. right size=1024/string_array negative n/1024 time: [29.996 µs 30.737 µs 31.511 µs] change: [−89.442% −89.229% −89.003%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild right size=4096/string_array positive n/4096 time: [105.58 µs 109.39 µs 113.51 µs] change: [−86.119% −85.788% −85.497%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe right size=4096/string_array negative n/4096 time: [136.48 µs 138.34 µs 140.36 µs] change: [−88.007% −87.848% −87.692%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild right size=1024/string_view_array positive n/1024 time: [25.054 µs 25.500 µs 26.033 µs] change: [−82.569% −82.285% −81.891%] (p = 0.00 < 0.05) Performance has improved. right size=1024/string_view_array negative n/1024 time: [41.281 µs 42.730 µs 44.432 µs] change: [−73.832% −73.288% −72.716%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe right size=4096/string_view_array positive n/4096 time: [129.38 µs 133.69 µs 137.61 µs] change: [−79.497% −78.998% −78.581%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild right size=4096/string_view_array negative n/4096 time: [218.16 µs 229.41 µs 243.30 µs] change: [−65.405% −63.622% −61.515%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe ``` ## Are these changes tested? - Existing unit tests for `right` - Added more unit tests - Added bench similar to `right.rs` - Existing SLTs pass ## Are there any user-facing changes? No
Which issue does this PR close?
leftfunction #19749.Rationale for this change
A follow-up to an optimisation of the
leftfunction in #19571What changes are included in this PR?
Improve memory performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length arguments and slice bytes directly.
For
Utf8View(StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.An Arrow view construction helper
shrink_string_view_array_viewis included in this PR. Unfortunately, string view builders cannot provide a way to reuse Arrow buffers. I believe it should better reside in the core Arrow crates instead - I can follow up on it.Are these changes tested?
Are there any user-facing changes?
No