RISC-V: Optimize decompression throughput by mirroring AVX fast-path for RVV short memcpy +15% #235
Open
anthony-zy wants to merge 1 commit into google:main from
PR Title
Optimize RVV memcpy path to mirror AVX fast-path for short copies
Summary
Refactors the RISC-V Vector (RVV) acceleration path in the short memcpy helper to mirror the existing AVX implementation. Instead of a generic `vsetvl` loop, this version performs a fixed 32-byte vector copy (with an optional second 32-byte segment), matching the profile-driven assumption that nearly all copies are short.
This eliminates loop overhead in the hot path and aligns RVV behavior with the well-tuned AVX code path.
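The shape of the fast path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the diff in this PR: `ShortCopy64Sketch` is a hypothetical name, `__builtin_expect` stands in for `SNAPPY_PREDICT_FALSE`, and the non-RVV branch exists only so the sketch compiles on other targets. It assumes `size <= 64` and that both buffers have 64 accessible bytes, because the copy deliberately writes past `size`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical stand-in for the short memcpy helper (not the PR's code).
// Always copies 32 bytes; copies a second 32-byte segment only if size > 32.
inline void ShortCopy64Sketch(char* dst, const void* src, std::size_t size) {
  constexpr std::size_t kShortMemCopy = 32;  // Same value as the AVX path.
  assert(size <= 2 * kShortMemCopy);
#if defined(__riscv_vector)
  const uint8_t* s = static_cast<const uint8_t*>(src);
  uint8_t* d = reinterpret_cast<uint8_t*>(dst);
  // Request 32 elements of 8 bits with LMUL=2.
  const std::size_t vl = __riscv_vsetvl_e8m2(kShortMemCopy);
  // Assumption: VLEN >= 128, so the full 32 bytes are granted in one op.
  assert(vl == kShortMemCopy);
  vuint8m2_t data = __riscv_vle8_v_u8m2(s, vl);
  __riscv_vse8_v_u8m2(d, data, vl);
  // Rare path: stands in for SNAPPY_PREDICT_FALSE(size > kShortMemCopy).
  if (__builtin_expect(size > kShortMemCopy, 0)) {
    data = __riscv_vle8_v_u8m2(s + kShortMemCopy, vl);
    __riscv_vse8_v_u8m2(d + kShortMemCopy, data, vl);
  }
#else
  // Portable fallback so the sketch builds and runs without RVV.
  std::memmove(dst, src, kShortMemCopy);
  if (size > kShortMemCopy) {
    std::memmove(dst + kShortMemCopy,
                 static_cast<const char*>(src) + kShortMemCopy, kShortMemCopy);
  }
#endif
}
```

There is no loop and no per-iteration `vsetvl`: the common case is one load and one store, and the branch for the second segment is hinted as unlikely.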
Performance
Tested with lzbench on Spacemit(R) X60 (RISC-V, RVV 1.0):
Implementation Details
- Uses `__riscv_vsetvl_e8m2(32)` to configure a 32-byte vector operation, matching the `kShortMemCopy` constant used in the AVX path.
- Executes the second 32-byte copy (guarded by `SNAPPY_PREDICT_FALSE`) only when `size > kShortMemCopy`, since profiling shows long copies are rare.
- With `e8m1` and `VLEN < 256` (e.g., `VLEN=128`, `vl=16`), two segments per 32 B would be required; using `e8m2` ensures 32 B per op on common configurations.
Verification
-march=rv64gcv
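
The `e8m1` versus `e8m2` reasoning in the implementation details can be checked with a little arithmetic. The sketch below is a simplified model, not part of the PR: the helper names are hypothetical, and it treats the granted `vl` as `min(AVL, VLMAX)` with `VLMAX = VLEN * LMUL / SEW`, which matches `vsetvl` behavior when `AVL <= VLMAX` or `AVL >= 2 * VLMAX`.

```cpp
#include <cstddef>

// VLMAX for a given VLEN (bits), LMUL (integer), and SEW (bits).
constexpr std::size_t VlMax(std::size_t vlen_bits, std::size_t lmul,
                            std::size_t sew_bits) {
  return vlen_bits * lmul / sew_bits;
}

// Simplified model of the vl granted for a requested AVL.
constexpr std::size_t GrantedVl(std::size_t vlen_bits, std::size_t lmul,
                                std::size_t sew_bits, std::size_t avl) {
  return avl < VlMax(vlen_bits, lmul, sew_bits)
             ? avl
             : VlMax(vlen_bits, lmul, sew_bits);
}

// Number of 8-bit vector load/store pairs needed to move 32 bytes.
constexpr std::size_t OpsFor32Bytes(std::size_t vlen_bits, std::size_t lmul) {
  return (32 + GrantedVl(vlen_bits, lmul, 8, 32) - 1) /
         GrantedVl(vlen_bits, lmul, 8, 32);
}

// VLEN=128 with e8m1 grants vl=16, so 32 bytes need two segments.
static_assert(OpsFor32Bytes(128, 1) == 2, "e8m1 at VLEN=128 needs two ops");
// VLEN=128 with e8m2 grants vl=32, so 32 bytes fit in one op.
static_assert(OpsFor32Bytes(128, 2) == 1, "e8m2 at VLEN=128 needs one op");
// VLEN=256 with e8m1 already covers 32 bytes in one op.
static_assert(OpsFor32Bytes(256, 1) == 1, "e8m1 at VLEN=256 needs one op");
```

Since the V extension requires `VLEN >= 128`, `e8m2` is the smallest LMUL that covers `kShortMemCopy` bytes in a single operation on every conforming implementation.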