[branch-53] perf: Optimize to_char to allocate less, fix NULL handling (#20635)#20885
Merged
alamb merged 1 commit intoapache:branch-53from Mar 12, 2026
Merged
Conversation
…20635) ## Which issue does this PR close? - Closes apache#20634. ## Rationale for this change The current `to_char` implementation (both scalar and array paths) allocates a new string for every input row to hold the result of the `ArrayFormatter`. To produce the results, it uses `StringArray::from`, which copies. We can do better by using a reusable buffer to store the result of `ArrayFormatter`, and then append to `StringBuilder`. We can then construct the final result values from the `StringBuilder` very efficiently. This yields a 10-22% improvement on the `to_char` microbenchmarks. In the scalar path, we can do a bit better by having the `ArrayFormatter` write directly into the `StringBuilder`'s buffer, which saves a copy. In the array path, we can't easily do this, because `to_char` has some (dubious) logic to retry errors on `Date32` inputs by casting to `Date64` and retrying the ArrayFormatter. Since the `StringBuilder` is shared between all rows and there's no easy way to remove the partial write that might happen in the case of an error, we use the intermediate buffer here instead. This PR also cleans up various code in `to_char`, and fixes a bug in `NULL` handling: in the array case, if the current data value is NULL but the format string is non-NULL, we incorrectly returned an empty string instead of NULL. ## What changes are included in this PR? * Optimize `to_char` (scalar and array paths) as described above * Fix bug in NULL handling * Add SLT test case for NULL handling bug * Simplify and refactor various parts of `to_char`, particularly around error handling * Revise `to_char` benchmarks: the previous version tested `to_char` for `Date32` inputs where the format string specifies a time value, which is an odd corner case. Also get rid of benchmarking the scalar-scalar case; this is very fast and occurs rarely in practice (will usually be constant-folded before query execution). ## Are these changes tested? Yes. Benchmarked and added new test. ## Are there any user-facing changes? No.
comphead
approved these changes
Mar 11, 2026
Contributor
Author
|
Thanks for the reviews @comphead |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
53.0.0(Feb 2026 / Mar 2026) #19692to_char#20634 on branch-53This PR:
to_charto allocate less, fix NULL handling #20635 from @neilconway to the branch-53 line