JIT: Intrinsify UTF16->UTF8 conversion for string literals (Encoding.UTF8.TryGetBytes)#85328
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsCloses #84659 Benchmark:byte[] _outputBuffer = new byte[1024];
[Benchmark]
public int TryGetBytes_2B()
{
Encoding.UTF8.TryGetBytes("Hi", _outputBuffer.AsSpan(), out int written);
return written;
}
[Benchmark]
public int TryGetBytes_27B()
{
Encoding.UTF8.TryGetBytes("MemoryMarshal.GetReference", _outputBuffer.AsSpan(), out int written);
return written;
}
[Benchmark]
public int TryGetBytes_122B()
{
Encoding.UTF8.TryGetBytes(
".NET Runtime uses third-party libraries or other resources" +
"that may be distributed under licenses different than the .NET",
_outputBuffer.AsSpan(), out int written);
return written;
}
With codegen diff for ;; length check
vmovups zmm0, zmmword ptr [reloc @RWD00]
vmovups zmmword ptr [r9], zmm0
vmovups zmm0, zmmword ptr [reloc @RWD64]
vmovups zmmword ptr [r9+38H], zmm0
mov eax, 120 ; bytes written
RWD00 dq 6E75522054454E2Eh, 65737520656D6974h, 2D64726968742073h, 696C207974726170h, 2073656972617262h, 726568746F20726Fh, 6372756F73657220h, 6D20746168747365h
RWD64 dq 6D20746168747365h, 6964206562207961h, 6574756269727473h, 207265646E752064h, 7365736E6563696Ch, 6572656666696420h, 206E61687420746Eh, 54454E2E20656874hin this case - via two avx-512 loads from the data section. For now it needs DynamicPGO being enabled to workaround #85209, I'll land a fix for separately (or a workaround). tldr: None of cc @stephentoub TODO:
|
|
@jakobbotsch @SingleAccretion yet another I recommend to enable "ignore whitespaces" mode. There were a couple of "refactorings" + a pretty simple expansion for |
|
@EgorBo Can you merge and rerun CI? |
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
jakobbotsch
left a comment
There was a problem hiding this comment.
The JIT changes look good to me.
@stephentoub It's tricky to implement but it should be doable. Calling this static method directly from interpolation should be a better solution because we won't have to rely on inlining just like Jan pointed out earlier, so I guess for now I'll remove its usage from |
…time-1 into intrinsify-utf16-utf8-literal
Ok, sounds good. |
|
/azp run runtime-coreclr pgo |
|
No commit pushedDate could be found for PR 85328 in repo dotnet/runtime |
|
@EgorBo, anything I can do to help land this? |
I was thinking of filing a separate PR to improve |
Yes, I thought that's what we agreed on above, using it from AppendLiteral and not from TryGetBytes yet. |
|
Merging then, I've verified that the API works as expected when it's used inside |
|
I'll submit a pr. Thanks. |
| uint16_t ch = bufferU16[charIndex]; | ||
| if (ch > 127) | ||
| { | ||
| // Only ASCII is supported. |
Closes #84659
The main goal is to make upcoming UTF8 string interpolation faster (to be on par with UTF16 one). If JIT sees that the source is a string literal (can be extended to cover RVA
ROS<char>, "shifted" pointers to literal literals,static readonly string) and the data in it is ASCII only - it peforms the conversion in the JIT time and just copies it to the destination.Benchmark:
cc @stephentoub