JIT ARM64-SVE: Add FK_3{A,B,C}, EJ_3A, EK_3A, EY_3B, EW_3{A,B}#98187
JIT ARM64-SVE: Add FK_3{A,B,C}, EJ_3A, EK_3A, EY_3B, EW_3{A,B}#98187amanasifkhalid merged 12 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsPart of #94549. Implements the following encodings:
cstool output: JitDisasm output: cc @dotnet/arm64-contrib
|
|
@a74nh sorry I stole one of your encodings; I thought I would get |
| INS_OPTS_SCALABLE_H); // UDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>] | ||
|
|
||
| // IF_SVE_EJ_3A | ||
| theEmitter->emitIns_R_R_R_I(INS_sve_cdot, EA_SCALABLE, REG_V0, REG_V1, REG_V2, 0, |
There was a problem hiding this comment.
For these instructions that take a rotation value, shouldn't we be passing 0, 90, 180, 270 instead of 0, 1, 2, 3?
I also implemented a few instructions that needed rotation values here: #98141 which I am passing 0, 90, 180, 270.
There was a problem hiding this comment.
It looks a little weird passing 0-3 instead of 0-270, but I did this to match the bit-level representation of the rotation value, so that we don't need a helper method to encode the rr bits; then when displaying the instruction in JitDisasms, I multiply the immediate by 90 to display it correctly. I'm fine changing my approach to match yours, though I'll have to update a few encodings already merged in. @kunalspathak @a74nh do you have any preference?
There was a problem hiding this comment.
The way I look at is the emitIns functions are APIs that should try to match what the instructions actually are. The bit-level representation/encoding is an implementation detail.
There was a problem hiding this comment.
I see. In that case, how about I wait for #98141 to be merged in, and then I'll update my encodings that use rotation values to use the helper methods you added?
There was a problem hiding this comment.
I would prefer we write for readability first if we know that such an optimization would not make any difference in 99.9% of scenarios.
However, I'm fine with encoding it as 0, 1, 2, 3 on instrDesc if that is what we all want. I'll have to adjust my work as well.
There was a problem hiding this comment.
I would prefer we write for readability first
It is readable from the perspective of calling the emitIns* method as seen here: https://github.com/dotnet/runtime/pull/98187/files#diff-d4f9f119d0a321cea7e82023cb754d8abdb800d6185c8bb9464d389ebd50debcR6288
and we flip it just before saving it in instrdesc:
I am not sure if emitOutputInstr() method is readable anyway :)
There was a problem hiding this comment.
I'm sympathetic to the readability argument. I guess the silver lining with our current approach is the bitwise representation of the rotation value is abstracted away from the API surface (i.e. the emitIns methods). Maybe I'm being naive, but I don't anticipate the code for handling the rotation values in emitIns or emitDispInsHelp changing with any frequency after this is merged in, whereas the usage of emitIns will certainly increase once we start using these SVE instructions. So in the "important" case, readability isn't hindered.
There was a problem hiding this comment.
It isn't necessarily about making emitOutputInstr readable, but about the display code being simple. We will have to decode the imm whose values are 0-3 to be translated to 0-270 on display.
There was a problem hiding this comment.
I guess what I'm trying to say is, if we encode the values as 0-3 on the instrDesc, we will have to have an encode/decode for it, whereas if we store 0-270, we only need one encode function.
| { | ||
| assert(isValidUimm4(imm)); // ii rr | ||
| assert((REG_V0 <= reg3) && (reg3 <= REG_V7)); | ||
| fmt = IF_SVE_FA_3A; |
There was a problem hiding this comment.
what are these changes? I see that we moved it under emitIns_R_R_R_I_I() and wondering why this was not done when we implemented SVE_FA_3A, SVE_FA_3B, etc. ?
There was a problem hiding this comment.
When I first added emitIns_R_R_R_I_I, I was trying to minimize code duplication, so I just made it a wrapper for emitIns_R_R_R_I by bitwise OR-ing imm1 and imm2 into one imm, and then passing this along to emitIns_R_R_R_I to do the rest. Now I'm running into instructions that have encodings that take one immediate, and encodings that take two immediates, so it's easier to separate these two emitIns methods out.
src/coreclr/jit/emitarm64.cpp
Outdated
There was a problem hiding this comment.
worth adding a function like isLowVectorRegister()?
|
@kunalspathak thanks for the review -- I applied your feedback |
Diff results for #98187Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64MinOpts (+0.00% to +0.01%)
Details here |
Diff results for #98187Throughput diffsThroughput diffs for windows/arm64 ran on windows/x64MinOpts (-0.00% to +0.01%)
Details here Throughput diffs for linux/arm64 ran on linux/x64MinOpts (+0.00% to +0.01%)
Details here |
Diff results for #98187Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64MinOpts (+0.00% to +0.01%)
Details here |
Part of #94549. Implements the following encodings:
cstool output:
JitDisasm output:
cc @dotnet/arm64-contrib