Skip to content

Conversation

@stevesuzuki-arm
Copy link
Contributor

@stevesuzuki-arm stevesuzuki-arm commented Nov 27, 2025

Strided load relies on shuffle_vectors for scalable vector,
which is lowered to llvm.vector.deinterleave.

Enabled LDN/STN test cases only if LLVM >= 22.
Modified test configurations for LDN in simd_op_check_sve2 based on #8819.
Also, fixed the issue #8584

@stevesuzuki-arm
Copy link
Contributor Author

Ready for review, but we need to rebase once #8887 is merged.

@alexreinking
Copy link
Member

Ready for review, but we need to rebase once #8887 is merged.

I'll review once the diff is clean.

@alexreinking
Copy link
Member

I just merged #8887. Once you rebase this, I'll review 🙂

Structured load relies on shuffle_vectors for scalable vector,
which is lowered to llvm.vector.deinterleave
@stevesuzuki-arm stevesuzuki-arm marked this pull request as ready for review December 11, 2025 17:42
Correct vector bits in load/store test cases for target
with 256 bits vector
@stevesuzuki-arm
Copy link
Contributor Author

Ready for review now

@alexreinking
Copy link
Member

Failures should be fixed by #8897

With old LLVM (v20, v21), some of the tests of load/store
in simd_op_check_sve2 fails due to LLVM crash with the messages:
- Invalid size request on a scalable vector
- Cannot select: t11: nxv16i8,nxv16i8,nxv16i8 = vector_deinterleave t43, t45, t47
Copy link
Member

@alexreinking alexreinking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about degrading codegen quality on LLVM<22. Please unskip the check_arm_load_store test and adjust the test and/or the patch.

Comment on lines 792 to 793
// ST3 - Store three-element structures
for (int width = base_vec_bits * 3; width <= base_vec_bits * 3 * 2; width *= 2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These loops are confusing. I wonder if explicitly listing test cases would be better?

Suggested change
// ST3 - Store three-element structures
for (int width = base_vec_bits * 3; width <= base_vec_bits * 3 * 2; width *= 2) {
// ST3 - Store three-element structures
for (int width : {base_vec_bits * 3, base_vec_bits * 3 * 2}) {

You could also consider iterating over factors.

Suggested change
// ST3 - Store three-element structures
for (int width = base_vec_bits * 3; width <= base_vec_bits * 3 * 2; width *= 2) {
// ST3 - Store three-element structures
for (int factor : {1, 2}) {
int width = base_vec_bits * 3 * factor;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

check_arm_load_store();
if (Halide::Internal::get_llvm_version() >= 220) {
check_arm_load_store();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping this test is a red flag. We can't degrade codegen quality on supported LLVM versions, especially not across all released LLVM versions. That test (and perhaps this patch) should be adjusted to confirm superior codegen on LLVM >=22 (or more broadly), but it must also confirm the existing behavior on LLVM<22.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback. I've updated so that test scope before this PR is kept with old LLVM. The test cases enabled by this PR are performed only with LLVM >= 22.


auto ext = Internal::get_output_info(target);
std::map<OutputFileType, std::string> outputs = {
{OutputFileType::stmt, file_name + ext.at(OutputFileType::stmt).extension},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, removed

for (int factor = 1; factor <= 4; factor *= 2) {
const int vector_lanes = base_vec_bits * factor / bits;

// In StageStridedLoads.cp (stride < r->lanes) is the condition for staging to happen
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo:

Suggested change
// In StageStridedLoads.cp (stride < r->lanes) is the condition for staging to happen
// In StageStridedLoads.cpp (stride < r->lanes) is the condition for staging to happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

- Keep the test scope before this PR in case of old LLVM
- Fix issue halide#8584
- Incorporated the review feedbacks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants