ARM64-SVE: Avoid containing non-embedded conditional select #105719#105812
ARM64-SVE: Avoid containing non-embedded conditional select #105719#105812amanasifkhalid merged 8 commits intodotnet:mainfrom
Conversation
|
@dotnet/arm64-contrib @kunalspathak |
|
Added testing. |
|
cc @dotnet/jit-contrib @TIHan @amanasifkhalid @tannergooding Can one of you review? |
|
@a74nh Build Analysis is blocked by "experimental feature" warnings for the test you added; could you please add |
(You may want to make it |
src/coreclr/jit/lowerarmarch.cpp
Outdated
| if (op3->IsVectorZero() && op1->IsMaskAllBitsSet() && op2->IsEmbMaskOp()) | ||
| { | ||
| // When we are merging with zero, we can specialize | ||
| // and avoid instantiating the vector constant. | ||
| // Do this only if op1 was AllTrueMask | ||
| // Do this only if op1 was AllTrueMask and op2 is embedded. | ||
| MakeSrcContained(node, op3); | ||
| } |
There was a problem hiding this comment.
I don't think this is the right fix. The zero (and in fact any constant or other value here) is still containable.
op1 is AllBitsSet so therefore nothing from op3 can ever be selected, so it is "unused". This means it is valid to drop the sel entirely and just emit op2 directly.
The only time we should be hitting this path is when op2 is an operation that requires a mask to be specified (even if unused) or some manually written user code in minopts. For the latter, it's still fine to contain the zero constant and just use the op2 register for both inputs (containment is a basic operation that happens at all codegen levels).
There was a problem hiding this comment.
op1isAllBitsSetso therefore nothing fromop3can ever be selected, so it is "unused". This means it is valid to drop theselentirely and just emitop2directly.
Updated to do this instead.
Required the embedded HWIntrinsic check, otherwise lowering goes into an infinite loop adding and removing conditional selects.
| - src/coreclr/jit/emitarm64sve.cpp | ||
| - src/coreclr/jit/emitfmtsarm64sve.h | ||
| - src/coreclr/jit/lsraarm64.cpp | ||
| - src/coreclr/jit/lowerarmarch.cpp |
There was a problem hiding this comment.
can you also add entries for following?
- lsraarmarch.cpp
- codegenarmarch.cpp
There was a problem hiding this comment.
Done.
Also alphabetically ordered the list.
Fixes #105719
CONDSELECT(TRUE, EMBBEDMASKOP(), 0)For this scenario, during codegeneration, the SELECT will not be generated and instead just generate the embedded mask operation. To do this, op3 can be contained.
CONDSELECT(TRUE, VECTOR, 0)For this senario, the SELECT operation will be generated (and then in emit a MOV will be generated instead). To do this, op3 cannot be contained and must be generated into a register.