Closed
Labels
- Bottom Up Work: Not part of a theme, epic, or user story
- User Story: A single user-facing feature. Can be grouped under an epic.
- area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
- tenet-performance: Performance related issue
Description
The current register allocation selection heuristics are less than optimal. At the very least, they result in poor selection in the face of heavy register pressure, as seen in #8846, and more broadly described in #6824.
The plan for addressing this is:
- Merge the methods that allocate registers from the free and busy sets and refactor to make it easier to reason about and reconfigure the criteria ([LSRA][RyuJIT] Consider merging allocating free & busy regs #9399, in progress). This must be done with minimal impact on throughput, and the initial implementation should preserve existing behavior (i.e. zero diffs) to allow distinguishing impacts of changing heuristics from the changes to the underlying implementation of them. Done in Combine free and busy register allocation #45135
- Refactor the code so that heuristics can be applied in different and dynamic order. Done in Refactor LSRA's heuristics selection #52832
- For experimentation purposes, expose COMPlus_ variable(s) that will change the order in which heuristics will be applied. Done in Refactor LSRA's heuristics selection #52832
- Experiment with various orderings and gather data from SuperPMI / PerfScore to determine which ordering is beneficial or which heuristics should stay closer to each other. Done in Lsra heuristic tuning experiment report #56103
- Design a mechanism for specifying alternate heuristic configurations (weights or ordering) via one or more COMPlus variables. Done in Refactor LSRA's heuristics selection #52832
- Extract/refactor the heuristic analysis in such a way that the fast-path (existing heuristic) remains fast, but an alternative config-specified path is available. Done in Refactor LSRA's heuristics selection #52832
- Modify the fast path to reflect the auto-tuning best result. This should include short-circuiting the analysis, or otherwise avoiding unnecessary evaluation of the criteria for registers already known to be less desirable (JIT LSRA Throughput: Short-circuit register selection #6705)
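The idea behind the plan above (heuristics applied in a configurable order, with short-circuiting once the choice is decided) can be sketched in a few lines. This is only an illustrative model, not the JIT's implementation: the heuristic names "A", "B", "C", their masks, and the single-letter ordering string are all hypothetical, and the real allocator's criteria and configuration format may differ.

```python
from typing import Callable, Dict

# A candidate set is a bitmask: bit i set means register i is still a candidate.
Heuristic = Callable[[int], int]


def make_mask_heuristic(preferred: int) -> Heuristic:
    """Hypothetical heuristic: keep only the candidates that are in `preferred`."""
    return lambda candidates: candidates & preferred


def select_register(candidates: int, ordering: str,
                    heuristics: Dict[str, Heuristic]) -> int:
    """Apply heuristics in the order given by `ordering` (one letter each).

    A heuristic narrows the set only if at least one candidate survives it.
    Selection short-circuits as soon as a single candidate remains.
    Returns the index of the chosen register, or -1 if there were none.
    """
    if candidates == 0:
        return -1
    for key in ordering:
        if candidates & (candidates - 1) == 0:  # exactly one bit left
            break
        narrowed = heuristics[key](candidates)
        if narrowed != 0:
            candidates = narrowed
    # Tie-break: pick the lowest-numbered remaining register.
    return (candidates & -candidates).bit_length() - 1


# Hypothetical heuristics over four registers (bits 0..3).
HEURISTICS = {
    "A": make_mask_heuristic(0b0110),  # e.g. "register is free"
    "B": make_mask_heuristic(0b0101),  # e.g. "register is caller-saved"
    "C": make_mask_heuristic(0b1000),  # e.g. "matches a preference"
}

chosen_abc = select_register(0b1111, "ABC", HEURISTICS)
chosen_cab = select_register(0b1111, "CAB", HEURISTICS)
```

With these toy masks, changing only the ordering string changes the selected register, which is the property that makes ordering experiments possible without touching the heuristics themselves.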
Future work
- Design and implement an auto-tuning framework, e.g. using SuperPMI with PerfScore as the objective metric. (Tracked in Auto tune register selection heuristics #55374 for a future release.)
- Do some initial tuning of the heuristics, minimally addressing [RyuJIT][LSRA] Let variables within a loop use register first #8846.
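As a rough illustration of what an auto-tuning loop over heuristic orderings might look like, the sketch below exhaustively scores every permutation of a set of heuristic keys and keeps the best one. Everything here is an assumption made for illustration: the keys, the `toy_score` function, and the convention that a lower score is better. A real framework would replace `toy_score` with a measurement such as a SuperPMI run reporting PerfScore, and would likely need a smarter search than brute force once the number of heuristics grows.

```python
from itertools import permutations
from typing import Callable, Tuple


def tune_ordering(keys: str, score: Callable[[str], float]) -> Tuple[str, float]:
    """Score every permutation of `keys` and return (best ordering, best score).

    Assumes a lower score is better.
    """
    best_order, best_score = "", float("inf")
    for perm in permutations(keys):
        order = "".join(perm)
        current = score(order)
        if current < best_score:
            best_order, best_score = order, current
    return best_order, best_score


def toy_score(order: str) -> float:
    """Toy stand-in for a real measurement (hypothetical weights).

    Each heuristic is penalized by its position times an arbitrary weight,
    so heavily weighted heuristics score best when applied early.
    """
    weights = {"A": 3.0, "B": 1.0, "C": 2.0}
    return sum(pos * weights[key] for pos, key in enumerate(order))


best_order, best_score = tune_ordering("ABC", toy_score)
```

Brute force is only workable for a handful of heuristics, since the number of orderings grows factorially with the number of keys.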
Additional issues that should be considered and/or addressed: #7999 (heuristics for incoming parameters), #7996 (include encoding size in heuristics, beyond REG_VAR_ORDER)
General register allocation issues that should be analyzed to determine if tuning the heuristics will address them:
- LSRA: Register spilling issue when using AsSpan and Slice causing slow performance #7809
- JIT: register shuffles after the null inlinee gc refs change #7447
category:cq
theme:register-allocator
skill-level:expert
cost:large
Status
Done