JIT: Mark some intrinsics as must-expand for NAOT due to ILScanner dependencies by jakobbotsch · Pull Request #109609 · dotnet/runtime

jakobbotsch · 2024-11-07T10:46:18Z

These two intrinsics are treated specially by ILScanner, and thus must always be expanded for NAOT, even in MinOpts.

…ncies

dotnet-policy-service · 2024-11-07T10:46:51Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

jakobbotsch · 2024-11-07T10:58:02Z

src/coreclr/jit/importercalls.cpp

@@ -3397,6 +3397,13 @@ GenTree* Compiler::impIntrinsic(CORINFO_CLASS_HANDLE    clsHnd,
                betterToExpand = true;
                break;


It's not totally clear to me whether we want these to be mustExpand or not, given the "If the intrinsic cannot possibly be expanded, it's fine" in the comment above... usually that would mean that from the JIT's perspective, we would not want to do the expansion in MinOpts.

(In the end there is no practical difference in behavior... today we expand betterToExpand intrinsics even in MinOpts due to a check below, but that sort of goes against our general MinOpts philosophy, I think.)

RuntimeHelpers_CreateSpan and RuntimeHelpers_InitializeArray are pattern matched. If they're not part of the specific IL patter, RyuJIT cannot possibly expand them - that's why it's not mandatory, just "whenever it's in the right shape". We don't want e.g. RuntimeHelpers.InitializeArray(null, default); to fail to compile. But if these intrinsics are not expanded by RyuJIT their method bodies just throw PNSE, we don't have an implementation, so we want to expand everything humanely possible.

The rest are basically mustExpand, we don't have meaningful bodies for them. Could achieve the same thing by making them recursive? It has been a long time since I was looking at this and at that time we didn't have mustExpand/betterToExpand distinction or the recursive special handling.

Could achieve the same thing by making them recursive?

Good point. GetArrayDataReference is defined recursively

runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/InteropServices/MemoryMarshal.NativeAot.cs

Lines 24 to 25 in bad7318

public static ref T GetArrayDataReference<T>(T[] array) =>

ref GetArrayDataReference(array);

. Why is it not marked as mustExpand by https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importercalls.cpp#L3073-L3074 ?

Ah ok, this only handles the actual recursive call inside the method. Ignore my comment.

mustExpand exists to flag cases that are required to be handled for correctness reasons. This should largely only apply to recursive calls (to avoid the stack overflow). In general it should never be incorrect to leave a call in because things like reflection, indirect invocation (delegates or function pointers), debugger evaluation, or other cases need to work.

betterToExpand then exists to flag cases where it is profitable to do the expansion in MinOpts. This is primarily relevant to things that expand to trivial constant and which are likely to allow dead-code elimination or other simple transforms we do for similar reasons. Such cases most typically improve throughput of the JIT and are highly unlikely to cause bugs.

We notably also treat some other cases, like hardware intrinsics, as functionally being betterToExpand (but we handle them explicitly early on, not via this local). We do it for these because they are known to be typically perf critical, are likely in methods that will be hot and be tiered, compile down to (most typically) singular instructions, generally improve JIT throughput/reduce JIT memory overhead as the hwintrinsic node is much cheaper than a call node (smaller, needs less handling in typical cases, doesn't require ABI fixups, doesn't need additional spills or copies to be introduced, etc), and because the reduce the overall overhead of the app; that is we don't free code for methods that were compiled to T0 and later rejitted; the overhead of function alignment + vzeroupper + instruction + ret for dozens to hundreds of intrinsics is non-trivial, as is the overhead required to meet ABI requirements per call (typically involving spilling the inputs so they can be passed by ref, allocating a return buffer, handling register save/restore requirements for V256/V512 registers, etc), and there often being many tightly coupled such calls in any method using them.

In general it should never be incorrect to leave a call in because things like reflection

It is required for correctness in this specific case. IL scanner assumes that these specific methods are always going to be expanded to save binary size related to generic dictionaries. These methods are very popular, so the size is disproportionate compared to what these methods do. If we were to remove the hardcoded assumption from IL scanner, we would take a binary size hit for native AOT.

This will be cleaner if/once IL Scanner is replaced by actual JIT importer.

betterToExpand then exists to flag cases where it is profitable to do the expansion in MinOpts. This is primarily relevant to things that expand to trivial constant and which are likely to allow dead-code elimination or other simple transforms we do for similar reasons. Such cases most typically improve throughput of the JIT and are highly unlikely to cause bugs.

For me there is a distinction between tier0 and MinOpts for these things. Tier0 implies MinOpts, and for Tier0 we want to do the cheap and profitable optimizations when possible, but for MinOpts without Tier0 my philosophy is that we truly do not want to do any transformations if we can avoid it (generally for diagnostic/debugging/testing reasons).

The rest are basically mustExpand, we don't have meaningful bodies for them. Could achieve the same thing by making them recursive?

I moved the rest of them to the mustExpand section as well. As pointed out above the recursiveness only matters when compiling the intrinsic method itself, not for callers of it.

but for MinOpts without Tier0 my philosophy is that we truly do not want to do any transformations if we can avoid it (generally for diagnostic/debugging/testing reasons).

I generally agree, which is also why I had the annotation above:

In general it should never be incorrect to leave a call in because things like reflection, indirect invocation (delegates or function pointers), debugger evaluation, or other cases need to work.

In this case it sounds like its special and we have to expand it for NAOT for other reasons as well, so I don't think there's much we can do.

I don't think we'd have a problem with ignoring betterToExpand for true MinOpts (where we're falling back from some compilation failure or similar and just trying to emit functioning code). But, there is an incredibly high cost to this for methods using platform specific hardware intrinsics. Things like the ability to step into a method in the debugger similarly don't matter for them, as they are purely recursive calls, so it might be worth considering such cases to be functionally equivalent to IL opcodes and expanded anyways (and if IL were designed today, its very possible that SIMD would have a more direct representation).

jakobbotsch · 2024-11-07T11:38:19Z

@MichalStrehovsky I'm assuming this test failure is harmless, and not indicative of a bug?

DeadCodeElimination+TestUnmodifiableInstanceFieldOptimization+WithInitOnlyPropertyWrite
System.Exception: Exception of type 'System.Exception' was thrown.
   at DeadCodeElimination.ThrowIfPresentWithUsableMethodTable(Type, String) + 0x64
   at DeadCodeElimination.TestTypeOfCodegenBranchElimination.Run() + 0x100
   at DeadCodeElimination.Run() + 0x58
   at TrimmingBehaviors!<BaseAddress>+0x16ec38
   at Program.<<Main>$>g__RunTest|0_0(Func`1, String) + 0x5c
===== Test DeadCodeElimination.Run failed =====

Similarly it looks like there's some failures in size-validating tests, which is to be expected with the forced MinOpts.

MichalStrehovsky · 2024-11-07T12:07:07Z

@MichalStrehovsky I'm assuming this test failure is harmless, and not indicative of a bug?

Yes, this test is testing that size optimizations kick in and those rely on RyuJIT doing stuff.

jkotas

LGTM (with forced minOpts deleted)

MichalPetryka · 2024-11-07T19:19:54Z

Is this safe for the other overload which is not always expanded?

jkotas · 2024-11-07T20:21:15Z

Is this safe for the other overload which is not always expanded?

Good point. I think it will work fine, but it should not be strictly required to mustExpand the non-generic variant.

This reverts commit 6db6378.

jakobbotsch · 2024-11-11T13:01:47Z

Looked through the failures in smoketests and saw only the ones we discussed above. I'll try a run of nativeaot-outerloop in #103950 once this one is merged.

…pendencies (dotnet#109609) These two intrinsics are treated specially by ILScanner, and thus must always be expanded for NAOT, even in MinOpts.

jakobbotsch added 2 commits November 7, 2024 11:45

Mark some intrinsics as must-expand for NAOT due to ILScanner depende…

125778b

…ncies

Force MinOpts for NAOT

6db6378

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 7, 2024

dotnet-policy-service bot assigned jakobbotsch Nov 7, 2024

Fix comment

6cd7f25

jakobbotsch commented Nov 7, 2024

View reviewed changes

jkotas approved these changes Nov 7, 2024

View reviewed changes

jakobbotsch added 2 commits November 8, 2024 11:30

Only mustExpand generic version of GetArrayDataReference

68f0670

Move a few other intrinsics under the mustExpand path

1a6f2a9

jkotas approved these changes Nov 8, 2024

View reviewed changes

Revert "Force MinOpts for NAOT"

87854bb

This reverts commit 6db6378.

jakobbotsch marked this pull request as ready for review November 11, 2024 16:11

jakobbotsch merged commit 075b42d into dotnet:main Nov 11, 2024

jakobbotsch deleted the mark-intrinsics-mustexpand-ilscanner branch November 11, 2024 16:11

github-actions bot locked and limited conversation to collaborators Dec 12, 2024

		@@ -3397,6 +3397,13 @@ GenTree* Compiler::impIntrinsic(CORINFO_CLASS_HANDLE clsHnd,
		betterToExpand = true;
		break;

	public static ref T GetArrayDataReference<T>(T[] array) =>
	ref GetArrayDataReference(array);

Comments

Conversation

jakobbotsch commented Nov 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dotnet-policy-service bot commented Nov 7, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jakobbotsch commented Nov 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MichalStrehovsky commented Nov 7, 2024

Uh oh!

jkotas left a comment

Choose a reason for hiding this comment

Uh oh!

MichalPetryka commented Nov 7, 2024

Uh oh!

jkotas commented Nov 7, 2024

Uh oh!

jakobbotsch commented Nov 11, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jakobbotsch commented Nov 7, 2024 •

edited

Loading

jakobbotsch commented Nov 7, 2024 •

edited

Loading