Add WidenLatin1ToUtf16_MisalignedAddress#90319
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue DetailsHandle the case where the The current code uses of
|
| if (((nuint)pUtf16Buffer & 1) != 0) | ||
| { | ||
| // Input isn't char aligned, we won't be able to vectorize. | ||
| WidenLatin1ToUtf16_MisalignedAddress(pLatin1Buffer, pUtf16Buffer, elementCount); |
There was a problem hiding this comment.
I am not sure I understand the need of this, I am seeing that we already align data in
runtime/src/libraries/System.Private.CoreLib/src/System/Text/Latin1Utility.cs
Lines 990 to 1015 in 4218f4c
There was a problem hiding this comment.
If the address that pUtf16Buffer points to is naturally aligned, then this logic should work as expected.
The problem is:
char* pCurrentWriteAddress = pUtf16Buffer + currentOffset;which does not effectively ensure correct alignment for a misaligned char*.
There was a problem hiding this comment.
@EgorBo As an alternative approach, change the following to Sse2.Store and accept the performance hit pre-Nehalem.
There was a problem hiding this comment.
We should just use unaligned load/store here. It may have some small hit on hardware that's more than 10-12 years old, but it simplifies things overall and will be increasingly rare to encounter such hardware over time. Plus, we don't use aligned loads/stores in most of our workloads already, so this won't be any different.
Opportunistically aligning the data is still goodness and will avoid cache line splits.
There was a problem hiding this comment.
It looks like Sse2.Store approach is better since unaligned char* should be rare? (e.g. char[] is always aligned)
There was a problem hiding this comment.
The consideration would be that Sse2.Store doesn't allow containment on pre-AVX hardware and on pre-AVX hardware movups was frequently slower than movaps even if the data was definitely aligned.
But, given the age of the hardware involved and the logic we use elsewhere, it shouldn't be a huge consideration (and the cost also wasn't huge, typically an extra cycle or so; the split cache line accesses were much worse and closer to 20 cycles for loads).
There was a problem hiding this comment.
Plus, we don't use aligned loads/stores in most of our workloads already, so this won't be any different.
WidenLatin1ToUtf16_Sse2 and NarrowUtf16ToLatin1_Sse2 are the only methods in the repository that use StoreAligned.
There was a problem hiding this comment.
It looks like
Sse2.Storeapproach is better since unaligned char* should be rare? (e.g. char[] is always aligned)
A potential case is the public API System.Text.Encoding.GetChars(System.Byte*, System.Int32, System.Char*, System.Int32).
There was a problem hiding this comment.
WidenLatin1ToUtf16_Sse2 and NarrowUtf16ToLatin1_Sse2 are the only methods in the repository that use StoreAligned.
Right, and so preserving the alignment here won't make a big difference to downlevel hardware. We should just normalize to not using StoreAligned so the code can be simpler and robust in the face of unalignable data.
adamsitnik
left a comment
There was a problem hiding this comment.
We should just normalize to not using
StoreAlignedso the code can be simpler and robust in the face of unalignable data.
I agree with @tannergooding . @xtqqczze please apply the suggested change
@xtqqczze thank you for your contribution!
|
This pull request has been automatically marked |
|
This pull request will now be closed since it had been marked |
Handle the case where the
pUtf16Bufferparameter is misaligned, and fallback to a simple loop.The current code uses of
Sse2.StoreAligned(MOVDQA), which is documented to throw a general-protection exception if the memory operand is misaligned. However, it is possible data could be silently corrupted if the existing logic does not account for misaligned address.