perf: optimize left function by eliminating double chars() iteration#19571
perf: optimize left function by eliminating double chars() iteration#19571viirya merged 3 commits intoapache:mainfrom
Conversation
413e195 to
08f3cdb
Compare
|
Can you make a PR with just the benchmarks (which is easier to merge) so we can then compare main vs. changed? |
|
Opened #19600. |
| chars_buf.extend(string.chars()); | ||
| let len = chars_buf.len() as i64; | ||
|
|
||
| Some(if n.abs() < len { |
There was a problem hiding this comment.
minor improvement: calling .abs() on MIN will lead to problems (crash in dev and wrapping in release).
i64::MIN is a corner case, so I guess no one will complain that it is not supported here.
I also tried with if n + len > 0 but then it was 3% slower on my machine.
| Some(if n.abs() < len { | |
| Some(if n != i64::MIN && n.abs() < len { |
There was a problem hiding this comment.
Replaced n.abs() < len with n > -len to avoid panic/wrap on i64::MIN now.
For negative n values, the function was calling string.chars() twice: 1. Once to count total characters 2. Again to take the prefix This optimization collects chars into a reusable buffer once per row for the negative n case, eliminating the redundant iteration. Benchmark results (negative n, which triggers the optimization): - size=1024: 71.323 µs → 52.760 µs (26.0% faster) - size=4096: 289.62 µs → 212.23 µs (26.7% faster) Benchmark results (positive n, minimal overhead): - size=1024: 24.465 µs → 24.691 µs (0.9% slower) - size=4096: 96.129 µs → 97.078 µs (1.0% slower) The dramatic improvement for negative n cases far outweighs the negligible overhead for positive n cases.
Replaced `n.abs() < len` with `n > -len` to avoid panic/wrap on i64::MIN. The original code used .abs() on a negative i64 value, which causes: - Panic in debug builds when n = i64::MIN (overflow check) - Wrapping behavior in release builds (undefined behavior) The new condition `n > -len` is mathematically equivalent: - When n < 0: n.abs() = -n - So: -n < len is equivalent to n > -len This handles the i64::MIN edge case safely without performance impact, as it's just a comparison with negated value instead of abs(). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
| let len = string.chars().count() as i64; | ||
| Some(if n.abs() < len { | ||
| string.chars().take((len + n) as usize).collect::<String>() | ||
| // Collect chars once and reuse for both count and take |
There was a problem hiding this comment.
You can probably use https://doc.rust-lang.org/std/primitive.str.html#method.char_indices here to avoid copying the strings into chars_buf entirely -- you could just find the correct offset and then copy the relevant bytes
If you wanted to get super fancy, you could add specializations for:
- Scalar
n(no need to expand it out to an array) - StringViewArray (no need to copy strings at all, you could just adjust the views)
However, this code seems better than what is on main! So we can merge it too and keep iterating
There was a problem hiding this comment.
Yea, let me merge it first and probably improve it later.
There was a problem hiding this comment.
There was a problem hiding this comment.
@viirya introduced some of these optimisation - I'd appreciate you taking a look at it
|
run benchmark left |
|
🤖 |
|
🤖: Benchmark completed Details
|
## Which issue does this PR close? - Closes #19749. ## Rationale for this change A follow-up to an optimisation of the `left` function in #19571 ## What changes are included in this PR? - Improve memory performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length arguments and slice bytes directly. - For `Utf8View` (`StringViewArray`), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix. - An Arrow view construction helper `shrink_string_view_array_view` is included in this PR. Unfortunately, string view builders cannot provide a way to reuse Arrow buffers. I believe it should better reside in the core Arrow crates instead - I can follow up on it. ## Are these changes tested? - Additional unit tests - SLTs ## Are there any user-facing changes? No
For negative n values, the function was calling string.chars() twice:
This optimization collects chars into a reusable buffer once per row for the negative n case, eliminating the redundant iteration.
Benchmark results (negative n, which triggers the optimization):
Benchmark results (positive n, minimal overhead):
The dramatic improvement for negative n cases far outweighs the negligible overhead for positive n cases.
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?