Skip to content

perf: optimize left function by eliminating double chars() iteration#19571

Merged
viirya merged 3 commits intoapache:mainfrom
viirya:left_optimize
Jan 10, 2026
Merged

perf: optimize left function by eliminating double chars() iteration#19571
viirya merged 3 commits intoapache:mainfrom
viirya:left_optimize

Conversation

@viirya
Copy link
Member

@viirya viirya commented Dec 30, 2025

For negative n values, the function was calling string.chars() twice:

  1. Once to count total characters
  2. Again to take the prefix

This optimization collects chars into a reusable buffer once per row for the negative n case, eliminating the redundant iteration.

Benchmark results (negative n, which triggers the optimization):

  • size=1024: 71.323 µs → 52.760 µs (26.0% faster)
  • size=4096: 289.62 µs → 212.23 µs (26.7% faster)

Benchmark results (positive n, minimal overhead):

  • size=1024: 24.465 µs → 24.691 µs (0.9% slower)
  • size=4096: 96.129 µs → 97.078 µs (1.0% slower)

The dramatic improvement for negative n cases far outweighs the negligible overhead for positive n cases.

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the functions Changes to functions implementation label Dec 30, 2025
@viirya viirya requested a review from andygrove December 31, 2025 00:50
@adriangb
Copy link
Contributor

Can you make a PR with just the benchmarks (which is easier to merge) so we can then compare main vs. changed?

@viirya
Copy link
Member Author

viirya commented Jan 2, 2026

Opened #19600.

chars_buf.extend(string.chars());
let len = chars_buf.len() as i64;

Some(if n.abs() < len {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor improvement: calling .abs() on MIN will lead to problems (crash in dev and wrapping in release).
i64::MIN is a corner case, so I guess no one will complain that it is not supported here.
I also tried with if n + len > 0 but then it was 3% slower on my machine.

Suggested change
Some(if n.abs() < len {
Some(if n != i64::MIN && n.abs() < len {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced n.abs() < len with n > -len to avoid panic/wrap on i64::MIN now.

For negative n values, the function was calling string.chars() twice:
1. Once to count total characters
2. Again to take the prefix

This optimization collects chars into a reusable buffer once per row
for the negative n case, eliminating the redundant iteration.

Benchmark results (negative n, which triggers the optimization):
- size=1024: 71.323 µs → 52.760 µs  (26.0% faster)
- size=4096: 289.62 µs → 212.23 µs  (26.7% faster)

Benchmark results (positive n, minimal overhead):
- size=1024: 24.465 µs → 24.691 µs  (0.9% slower)
- size=4096: 96.129 µs → 97.078 µs  (1.0% slower)

The dramatic improvement for negative n cases far outweighs the
negligible overhead for positive n cases.
Replaced `n.abs() < len` with `n > -len` to avoid panic/wrap on i64::MIN.

The original code used .abs() on a negative i64 value, which causes:
- Panic in debug builds when n = i64::MIN (overflow check)
- Wrapping behavior in release builds (undefined behavior)

The new condition `n > -len` is mathematically equivalent:
- When n < 0: n.abs() = -n
- So: -n < len is equivalent to n > -len

This handles the i64::MIN edge case safely without performance impact,
as it's just a comparison with negated value instead of abs().

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya

let len = string.chars().count() as i64;
Some(if n.abs() < len {
string.chars().take((len + n) as usize).collect::<String>()
// Collect chars once and reuse for both count and take
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can probably use https://doc.rust-lang.org/std/primitive.str.html#method.char_indices here to avoid copying the strings into chars_buf entirely -- you could just find the correct offset and then copy the relevant bytes

If you wanted to get super fancy, you could add specializations for:

  1. Scalar n (no need to expand it out to an array)
  2. StringViewArray (no need to copy strings at all, you could just adjust the views)

However, this code seems better than what is on main! So we can merge it too and keep iterating

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, let me merge it first and probably improve it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya introduced some of these optimisation - I'd appreciate you taking a look at it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further improvements are done via #20103, #20068, #19749

@alamb
Copy link
Contributor

alamb commented Jan 9, 2026

run benchmark left

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing left_optimize (745120e) to 20870da diff
BENCH_NAME=left
BENCH_COMMAND=cargo bench --features=parquet --bench left
BENCH_FILTER=
BENCH_BRANCH_NAME=left_optimize
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

group                             left_optimize                          main
-----                             -------------                          ----
left size=1024/negative n/1024    1.00    103.4±0.37µs        ? ?/sec    1.43    148.4±0.87µs        ? ?/sec
left size=1024/positive n/1024    1.04     47.4±0.70µs        ? ?/sec    1.00     45.8±0.25µs        ? ?/sec
left size=4096/negative n/4096    1.00    410.2±5.67µs        ? ?/sec    1.44   588.9±11.16µs        ? ?/sec
left size=4096/positive n/4096    1.03    187.2±2.27µs        ? ?/sec    1.00    181.5±2.36µs        ? ?/sec

@viirya viirya added this pull request to the merge queue Jan 10, 2026
Merged via the queue into apache:main with commit 458b491 Jan 10, 2026
28 checks passed
@viirya
Copy link
Member Author

viirya commented Jan 10, 2026

Thanks @alamb @martin-g @adriangb

github-merge-queue bot pushed a commit that referenced this pull request Jan 29, 2026
## Which issue does this PR close?

- Closes #19749.

## Rationale for this change

A follow-up to an optimisation of the `left` function in #19571

## What changes are included in this PR?

- Improve memory performance to O(1) by eliminating more string copies.
Discover a byte offset for the last character for both positive and
negative length arguments and slice bytes directly.

- For `Utf8View` (`StringViewArray`), implement a zero-copy slice
operation reusing the same Arrow buffers. It is possible for both views
since the string only shrinks. We only need to tune a German prefix.

- An Arrow view construction helper `shrink_string_view_array_view` is
included in this PR. Unfortunately, string view builders cannot provide
a way to reuse Arrow buffers. I believe it should better reside in the
core Arrow crates instead - I can follow up on it.

## Are these changes tested?

- Additional unit tests
- SLTs

## Are there any user-facing changes?

No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants