Support frozen struct returns for Swift calls#99704
Support frozen struct returns for Swift calls#99704jakobbotsch merged 20 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
cc @dotnet/jit-contrib PTAL @amanasifkhalid (JIT parts) @jkoritzinsky @jkotas (VM thunk changes) I tested with 5000 tests both locally and in CI and they passed on both osx-arm64 and osx-x64. I reduced the set to 100 tests to be checked in. |
| .macro TAILJMP_R10 | ||
| .byte 0x49 | ||
| .byte 0xFF | ||
| .byte 0xE2 | ||
| .endm |
There was a problem hiding this comment.
This sequence has a superfluous rex prefix bit set to make sure the unwinder picks it up as an external tailcall:
runtime/src/coreclr/vm/amd64/excepamd64.cpp
Lines 368 to 378 in 2193467
(I wasn't totally sure whether it's necessary or not for this thunk, but the existing TAILJMP_RAX also has a superfluous rex prefix)
There was a problem hiding this comment.
This is Windows unwinder convention. I do not think it is applicable outside Windows.
There was a problem hiding this comment.
I'll replace it with just jmp r10. I guess the TAILJMP_RAX uses in other non-Windows thunks can be cleaned up then.
Is there a difference between managed and native code here? The JIT is also generating these tailcalls with the extraneous rex prefixes. Can they be avoided outside Windows?
There was a problem hiding this comment.
Regular CoreCLR uses Windows unwinder outside Windows, so the JIT has to follow the Windows unwinder conventions there as well.
If we cared enough, we can skip the prefix for native AOT. We use native unwinder for native AOT so Windows unwinder conventions are not applicable outside Windows.
amanasifkhalid
left a comment
There was a problem hiding this comment.
JIT changes LGTM. Thanks!
| assert((idx < ArrLen(swiftIntReturnRegs)) && (idx < ArrLen(swiftFloatReturnRegs))); | ||
| unsigned intRegIdx = 0; | ||
| unsigned floatRegIdx = 0; | ||
| for (unsigned i = 0; i < idx; i++) |
There was a problem hiding this comment.
Since this isn't on a hot path, and we have at most 4 return registers for a Swift call, I'm guessing it's not worth caching the return registers used so we can skip this loop? Though we do call GetABIReturnReg while looping over all return register indices in a few places...
There was a problem hiding this comment.
Ideally we would keep all the registers and also the offsets as part of this type, from a simple code hygiene standpoint, but increasing its size increases the size of GenTreeCall which increases the size of all large nodes. I think it would be good to clean this up to make the additional data for ReturnTypeDesc allocated only when necessary, but that's a separate change.
I don't think there's anything to be concerned about wrt. throughput here.
| { | ||
| const CORINFO_SWIFT_LOWERING* lowering = comp->GetSwiftLowering(clsHnd); | ||
| assert(!lowering->byReference); | ||
| assert(lowering->numLoweredElements <= MAX_RET_REG_COUNT); |
There was a problem hiding this comment.
MAX_RET_REG_COUNT is already 4 on ARM64, but it might be helpful to the reader to static_assert that here.
There was a problem hiding this comment.
I added a static assert.
|
I wonder if we have a problem around Edit: I guess not since that helper call is just going to use the regular managed calling convention. |
|
/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs, runtime-coreclr jitstress2-jitstressregs |
|
Azure Pipelines successfully started running 3 pipeline(s). |
The current design of CORINFO_HELP_PINVOKE_CALLI and vasig cookies is suboptimal. It leaves perf on the table and it is unfriendly to native AOT. We use a different approach in native AOT (look for If you are running into troubles with |
I'll keep that in mind. I don't think we're going to see issues around it, however. |
Adds support for pinvokes to Swift functions that return frozen structs in multiple registers. This turned out to be simpler than I expected on the JIT side; there is a small change necessary in
genMultiRegStoreToLocalto take into account that the Swift fields are going into offsets that don't necessarily correspond to the register sizes (we already DNER the cases where things don't work out, it seems).Also adds 100 tests.
The support is complicated by the fact that Swift calls take the ret buffer in
raxon x64. This requires some VM side changes to avoid usingraxin the interop thunks.