ld4 is not emitted for int32 with vectorization factor 4 #8819

@stevesuzuki-arm

If check(arm32 ? "vld4.32" : "ld4", 4, ld(in_f32, 4)); is added to simd_op_check_arm.cpp, the generated code contains 8 scalar load instructions, although one might expect a single ld4 instruction.
This seems to be expected behavior, as indicated by the comment in staged_strided_loads.cpp: "Strides up to the vector size are worth densifying. After that, it's better to just gather."
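
To make the two strategies named in that comment concrete, here is a minimal plain C++ sketch of what "gathering" and "densifying" a strided load mean. It is only an illustration, not Halide's implementation: the function names strided_gather and strided_densified are hypothetical, and the shuffle is modeled with an ordinary loop rather than a SIMD instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: one scalar load per lane, each `stride` elements apart.
std::vector<float> strided_gather(const std::vector<float> &src, std::size_t base,
                                  std::size_t stride, std::size_t lanes) {
    assert(lanes > 0 && base + (lanes - 1) * stride < src.size());
    std::vector<float> out(lanes);
    for (std::size_t i = 0; i < lanes; i++) {
        out[i] = src[base + i * stride];
    }
    return out;
}

// Densified: one contiguous load of lanes * stride elements, followed by a
// shuffle that keeps every stride-th element. An instruction such as ld4
// performs this kind of load-and-deinterleave in hardware.
std::vector<float> strided_densified(const std::vector<float> &src, std::size_t base,
                                     std::size_t stride, std::size_t lanes) {
    // The dense load touches more memory than the gather does.
    assert(lanes > 0 && stride > 0 && base + lanes * stride <= src.size());
    std::vector<float> dense(src.data() + base, src.data() + base + lanes * stride);
    std::vector<float> out(lanes);
    for (std::size_t i = 0; i < lanes; i++) {
        out[i] = dense[i * stride];
    }
    return out;
}
```

Both functions return the same lanes. The difference is that the densified form reads lanes * stride contiguous elements, so with 4 lanes and stride 4 it loads 16 elements to keep 4. That extra traffic is presumably what the heuristic weighs against the cost of scalar loads.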

What is the exact reason this case is excluded? Is there any way to make ld4 be emitted here?
The same applies to ld2 for int64 with a vectorization factor of 2.
