@fangchenli
Member

No description provided.

fangchenli and others added 4 commits December 16, 2025 21:10
… arrays

Previously, value_counts_internal would convert Arrow array counts to
NumPy just to call .sum() for normalization. This is unnecessary since
Series.sum() works correctly for all backends.

Changes:
- Remove unnecessary np.asarray(counts) conversion for Arrow arrays
- Remove unused counts variable assignments from bins and MultiIndex branches
- Use result.sum() instead of counts.sum() for normalization

This eliminates a performance bottleneck where Arrow-backed Series
would fall back to NumPy during value_counts operations.
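The normalization described above can be sketched from the user's side. This is a minimal illustration, not the internal code of the change: it uses a default-dtype Series so that it runs without pyarrow installed, and the variable names are chosen here for illustration only. It shows that dividing the counts by `Series.sum()` of the counts reproduces `normalize=True`, with no explicit `np.asarray` step.

```python
import pandas as pd

# Small example Series (illustrative data, not from the PR).
s = pd.Series(["a", "b", "a", "c", "a", "b"])

# Raw counts per distinct value.
counts = s.value_counts()

# Normalize by summing the result Series directly,
# without converting the counts to a NumPy array first.
manual = counts / counts.sum()

# Built-in normalization for comparison.
normalized = s.value_counts(normalize=True)

print(manual)
print(normalized)
```

For an Arrow-backed Series (for example one created with `dtype="string[pyarrow]"`, which requires pyarrow), the same relation is expected to hold.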

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The previous change to avoid unnecessary to_numpy conversion broke
normalization when bins is used. Bins normalization should divide by
the total input length, not the sum of counts in bins.
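The difference between the two denominators can be sketched in plain Python. This is a hedged illustration of the arithmetic only, not pandas' implementation: `bin_counts` is a hypothetical helper, and the values and bin edges are made up. When some inputs fall outside every bin, dividing by the sum of the bin counts gives different proportions than dividing by the total input length.

```python
def bin_counts(values, edges):
    """Count values per right-closed interval (edges[i], edges[i + 1]].

    The first interval also includes its left edge. Values outside
    all intervals are not counted. Hypothetical helper for illustration.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            lo, hi = edges[i], edges[i + 1]
            if (lo < v <= hi) or (i == 0 and v == lo):
                counts[i] += 1
                break
    return counts


values = [1, 2, 3, 4, 10]  # 10 lies outside every bin
edges = [0, 2, 4]          # two bins: [0, 2] and (2, 4]

counts = bin_counts(values, edges)

# Behavior the fix restores: divide by the total input length.
by_length = [c / len(values) for c in counts]

# Behavior the fix removes: divide by the sum of the counts in bins.
by_count_sum = [c / sum(counts) for c in counts]

print(counts, by_length, by_count_sum)
```

With this data, the proportions by input length do not sum to 1, because the out-of-range value still counts toward the denominator.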

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@fangchenli fangchenli marked this pull request as ready for review December 17, 2025 07:46
@mroeschke mroeschke added the Performance (memory or execution speed performance) and Arrow (pyarrow functionality) labels Dec 17, 2025