-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Support strided Load/Store in SVE2 #8888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Ready for review, but we need to rebase once #8887 is merged. |
I'll review once the diff is clean. |
e54a3d8 to
fd78420
Compare
|
I just merged #8887. Once you rebase this, I'll review 🙂 |
Structured load relies on shuffle_vectors for scalable vector, which is lowered to llvm.vector.deinterleave
fd78420 to
1b02720
Compare
Correct vector bits in load/store test cases for target with 256 bits vector
|
Ready for review now |
|
Failures should be fixed by #8897 |
With old LLVM (v20, v21), some of the tests of load/store in simd_op_check_sve2 fails due to LLVM crash with the messages: - Invalid size request on a scalable vector - Cannot select: t11: nxv16i8,nxv16i8,nxv16i8 = vector_deinterleave t43, t45, t47
alexreinking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about degrading codegen quality on LLVM<22. Please unskip the check_arm_load_store test and adjust the test and/or the patch.
| // ST3 - Store three-element structures | ||
| for (int width = base_vec_bits * 3; width <= base_vec_bits * 3 * 2; width *= 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These loops are confusing. I wonder if explicitly listing test cases would be better?
| // ST3 - Store three-element structures | |
| for (int width = base_vec_bits * 3; width <= base_vec_bits * 3 * 2; width *= 2) { | |
| // ST3 - Store three-element structures | |
| for (int width : {base_vec_bits * 3, base_vec_bits * 3 * 2}) { |
You could also consider iterating over factors.
| // ST3 - Store three-element structures | |
| for (int width = base_vec_bits * 3; width <= base_vec_bits * 3 * 2; width *= 2) { | |
| // ST3 - Store three-element structures | |
| for (int factor : {1, 2}) { | |
| int width = base_vec_bits * 3 * factor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| check_arm_load_store(); | ||
| if (Halide::Internal::get_llvm_version() >= 220) { | ||
| check_arm_load_store(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping this test is a red flag. We can't degrade codegen quality on supported LLVM versions, especially not across all released LLVM versions. That test (and perhaps this patch) should be adjusted to confirm superior codegen on LLVM >=22 (or more broadly), but it must also confirm the existing behavior on LLVM<22.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback. I've updated so that test scope before this PR is kept with old LLVM. The test cases enabled by this PR are performed only with LLVM >= 22.
|
|
||
| auto ext = Internal::get_output_info(target); | ||
| std::map<OutputFileType, std::string> outputs = { | ||
| {OutputFileType::stmt, file_name + ext.at(OutputFileType::stmt).extension}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, removed
| for (int factor = 1; factor <= 4; factor *= 2) { | ||
| const int vector_lanes = base_vec_bits * factor / bits; | ||
|
|
||
| // In StageStridedLoads.cp (stride < r->lanes) is the condition for staging to happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo:
| // In StageStridedLoads.cp (stride < r->lanes) is the condition for staging to happen | |
| // In StageStridedLoads.cpp (stride < r->lanes) is the condition for staging to happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- Keep the test scope before this PR in case of old LLVM - Fix issue halide#8584 - Incorporated the review feedbacks
Strided load relies on shuffle_vectors for scalable vector,
which is lowered to llvm.vector.deinterleave.
Enabled LDN/STN test cases only if LLVM >= 22.
Modified test configurations for LDN in simd_op_check_sve2 based on #8819.
Also, fixed the issue #8584