GH-32863: [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer #14341
pitrou merged 78 commits into apache:main
Note: this should support both BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY (see PARQUET-2231).
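DELTA_BYTE_ARRAY (incremental or "front" coding) stores each value as the length of the prefix it shares with the previous value plus the remaining suffix. Because that split only looks at raw bytes, it applies equally to variable-length BYTE_ARRAY values and to FIXED_LEN_BYTE_ARRAY values of a common width. The following is a minimal sketch that only computes the split; `FrontCode` and `FrontCoded` are hypothetical names, not Arrow APIs. In the Parquet format the prefix lengths are then written with DELTA_BINARY_PACKED and the suffixes with DELTA_LENGTH_BYTE_ARRAY, which this sketch does not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type: one prefix length and one suffix per input value.
struct FrontCoded {
  std::vector<int32_t> prefix_lengths;
  std::vector<std::string> suffixes;
};

// Hypothetical helper: splits each value into (shared prefix length, suffix)
// relative to the previous value, the core idea behind DELTA_BYTE_ARRAY.
// Works the same for variable-length and fixed-length byte arrays.
FrontCoded FrontCode(const std::vector<std::string_view>& values) {
  FrontCoded out;
  std::string_view previous;  // empty, so the first value has prefix length 0
  for (std::string_view value : values) {
    size_t limit = std::min(previous.size(), value.size());
    size_t prefix = 0;
    while (prefix < limit && previous[prefix] == value[prefix]) ++prefix;
    out.prefix_lengths.push_back(static_cast<int32_t>(prefix));
    out.suffixes.emplace_back(value.substr(prefix));  // bytes not shared
    previous = value;
  }
  return out;
}
```

For example, the fixed-width inputs "ab01", "ab02", "ac03" yield prefix lengths 0, 3, 1 and suffixes "ab01", "2", "c03", so sorted or clustered keys compress well.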
(Can this patch go ahead now?)
mapleFU left a comment:
Although this patch is still a draft, I ran it for fun, and there seems to be an issue. Maybe this helps.
I'm going to merge this now.
That took a long time, bravo!
Thanks all for helping this along! I'm very happy we got this in!
Congratulations, @rok!!!
Great work everybody, congrats!!
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 94bd0d2. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.
We should probably update the Python docstring as well: arrow/python/pyarrow/parquet/core.py, lines 822 to 827 in fe750ed (which was clearly already outdated before this PR as well! ;))
Feel free to open a PR :-)
I created an issue :) #37312
RETURN_NOT_OK(helper.builder->ReserveData(
    std::min<int64_t>(len_, helper.chunk_space_remaining)));
@pitrou, why did this code previously call ReserveData with min(len_, helper.chunk_space_remaining)? Wouldn't len_ be too large?
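One plausible reading of the min() in the removed snippet: the decoder's remaining input (len_) can exceed what the current output chunk may still hold, so reserving all of it would over-allocate a chunk that is closed before the excess data ever lands in it. The sketch below illustrates only that clamping. `HypotheticalChunkHelper` and `ReserveForChunk` are invented for illustration and are not Arrow's actual helper or builder API; the field and method names just mirror the snippet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a helper whose output is split into chunks of
// bounded size; it only records how many bytes were reserved.
struct HypotheticalChunkHelper {
  int64_t chunk_space_remaining;  // bytes the current chunk may still hold
  int64_t reserved = 0;           // total bytes reserved so far

  void ReserveData(int64_t nbytes) { reserved += nbytes; }
};

// Reserve room for up to `len` bytes, but never more than the current chunk
// can hold: data beyond that limit goes into a later chunk, so reserving it
// here would only over-allocate.
int64_t ReserveForChunk(HypotheticalChunkHelper& helper, int64_t len) {
  int64_t to_reserve = std::min<int64_t>(len, helper.chunk_space_remaining);
  helper.ReserveData(to_reserve);
  return to_reserve;
}
```

With 100 bytes left in the chunk, a request for 1000 bytes reserves only 100; a 64-byte request against a much larger chunk reserves all 64.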
I don't understand: are you commenting on the removed code? I'd rather not try to understand code that was removed months ago...
See #38437 for context, where this code is being partially added back.
I don't understand it either T_T.
I was just confused by this change, so I tried to understand the original code to find out why it caused the regression and what I should do to fix it.
…t writer (apache#14341)

This is to add DELTA_BYTE_ARRAY encoder.
* Closes: apache#32863

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Rok <rok@mihevc.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Co-authored-by: mwish <1506118561@qq.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
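A reader of DELTA_BYTE_ARRAY data rebuilds each value by taking the first prefix-length bytes of the previous value and appending the stored suffix. The following is a minimal sketch of that reconstruction step, assuming the prefix lengths and suffixes have already been decoded from their DELTA_BINARY_PACKED and DELTA_LENGTH_BYTE_ARRAY streams. `FrontDecode` is a hypothetical helper, not part of Arrow's API.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical inverse of front coding:
//   value[i] = first prefix_lengths[i] bytes of value[i-1], then suffixes[i].
std::vector<std::string> FrontDecode(const std::vector<int32_t>& prefix_lengths,
                                     const std::vector<std::string>& suffixes) {
  if (prefix_lengths.size() != suffixes.size()) {
    throw std::invalid_argument("prefix/suffix count mismatch");
  }
  std::vector<std::string> values;
  values.reserve(suffixes.size());
  std::string previous;  // empty before the first value
  for (size_t i = 0; i < suffixes.size(); ++i) {
    int32_t prefix = prefix_lengths[i];
    // A prefix can never be longer than the value it is shared with.
    if (prefix < 0 || static_cast<size_t>(prefix) > previous.size()) {
      throw std::invalid_argument("invalid prefix length");
    }
    std::string value = previous.substr(0, static_cast<size_t>(prefix)) + suffixes[i];
    values.push_back(value);
    previous = std::move(value);
  }
  return values;
}
```

The bounds check matters for untrusted files: a corrupt prefix length must be rejected rather than read past the previous value.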