Vectorise BitArray for ARM64#33749
Conversation
src/libraries/System.Collections/src/System/Collections/BitArray.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Is BinaryPrimitives.ReverseEndianness optimized to bswap equivalent on AArch64 ?
There was a problem hiding this comment.
I do not think so, which is why I left a comment on #33308 about it. It would be REV instructions that correspond to x86's bswap instructions.
|
Is there a way to check which tests specifically have failed & their console outputs? |
|
@Gnbrkm41, its not super intuitive, but once you are at the Azure logs (click You then have to click on the test, goto In this case, the log indicates: |
|
Given that this is a core dump, I'll try to build and run locally to see if I can repro. |
|
Oh, huh. That is interesting... I'll take a look as well. |
|
This actually looks like Mono, I'm not very familiar with how to debug or produce bits for that, do we have a guide somewhere? @EgorBo? |
It crashes with StackOverflow in System.Collections.Tests. |
|
Do the remaining test legs get automatically cancelled when one of the legs fail? I see that all the non-Mono ARM CI runs are cancelled. It would be nice if we can selectively run those... |
Fixes StackOverflow in dotnet/runtime#33749 (comment) Intrinsify all `get_IsSupported` under `System.Runtime.Intrinsics*` to just `false` (except the sets we support, see mono_emit_simd_intrinsics).
@Gnbrkm41 Thank you for your contribution! I will collect jitDisasm and post here. In the meantime, I will also try to run benchmarks - not sure if they are supported on arm64 yet. |
Fixes StackOverflow in dotnet/runtime#33749 (comment) Intrinsify all `get_IsSupported` under `System.Runtime.Intrinsics*` to just `false` (except the sets we support, see mono_emit_simd_intrinsics). Co-authored-by: EgorBo <EgorBo@users.noreply.github.com>
|
I managed to collect jitDisasms for all the methods in PR. I attached the file but the following are the interesting spots: And Or Xor Not CopyTo .ctor Couple notes here:
|
|
@TamarChristinaArm, would you or someone else from ARM be able to take a glance at the PR and codegen and make sure we are using the right tricks? Also CC. @CarolEidt |
@stephentoub should we have an analyzer for that? |
As far as I understand this hack is a trick to allow intrinsics to be used via reflection. |
Yes, its used for basically any form of indirect invocation including: reflection, delegates, the debugger/immediate window, and when non constant inputs are given to an intrinsic that requires them. |
src/libraries/System.Collections/src/System/Collections/BitArray.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Collections/src/System/Collections/BitArray.cs
Outdated
Show resolved
Hide resolved
07d9f31 to
4e3891e
Compare
|
I don't see any Libraries test run for CoreCLR on ARM on the CI list (there's Mono one but I assume that Mono doesn't support HWIntrinsics - #33761). Is it correct to assume that Libraries tests are not run on ARM machines w/ CoreCLR? Surprising if this is the case. |
Taking a look now, will have some feedback by end of today. |
|
Implementation wise these are all correct and fine. So I have comments mostly about the codegen out of the JIT than the actual implementation here. For e.g. this code for should ideally be Which means you don't need the adjustment for for currently generate Seems like could be Also am I missing something? I see With no actual usages of
For the algorithm itself, yeah as @Gnbrkm41 says and you don't need the byte reversals as they would be encoded in the For the constructor Aside from what @echesakovMSFT said that this should be a If it ever actually need a |
|
Thanks a lot for the analysis @TamarChristinaArm As for
The actual implementation is here: https://source.dot.net/#System.Private.CoreLib/Vector128.cs,434, as you may be able to see, we currently have a specialized path for Fixing this is tracked by #33308 and #33496 and should be relatively straightforward given the APIs we have exposed now. |
It appears that is true. @safern @ViktorHofer Is this intended? How can we get this libraries change properly tested on ARM64 (Linux, Windows) hardware? Also, ideally, this should be tested with a Checked CoreCLR also. The ARM64 hardware intrinsics support is new, so we'd prefer to run code through a checked JIT to see if any asserts crop up. |
|
Due to hardware limitations those are not run in PRs but they are run on merge: runtime/eng/pipelines/runtime.yml Lines 669 to 672 in a43d6c4 However, we can just queue a manual build from this branch, which will cause those jobs to run. Which I just did: https://dev.azure.com/dnceng/public/_build/results?buildId=568064 |
|
There were indeed some failures on arm: https://dev.azure.com/dnceng/public/_build/results?buildId=568064&view=ms.vss-test-web.build-test-results-tab&runId=17835402&resultId=134928&paneView=debug |
|
New test run https://dev.azure.com/dnceng/public/_build/results?buildId=572708 shows test failures on Linux arm64 Release with https://helix.dot.net/api/2019-06-17/jobs/1e9376b5-c757-4eae-aa25-a887a41cd095/workitems/System.Collections.Tests/console I will run this locally to see what's going on |
|
I have been debugging this for awhile - turned out this failure was Linux only. I didn't see the failure yesterday since I was running benchmarks on Windows/arm64 laptop and we didn't see this in CI before since your branch was based on top of an older commit (before #33936). I submitted a PR to fix the issue #34107 - the fix should be merged before this change can go in. |
|
Yes, @Gnbrkm41 please do as Bruce suggested one more time (I hope this one is final) |
9738573 to
c97520d
Compare
|
@BruceForstall, @echesakovMSFT - Done 🙂 |
|
I manually triggered https://dev.azure.com/dnceng/public/_build/results?buildId=577868&view=results |
|
@tannergooding @echesakovMSFT @GrabYourPitchforks All the tests have passed (except a first-pass set of flaky failures that are still visible?). Anyone want to do a final code review and sign-off so this can be merged? |
| // Same logic as SSSE3 path, except we do not have Shuffle instruction. | ||
| // (TableVectorLookup could be an alternative - dotnet/runtime#1277) |
There was a problem hiding this comment.
I'd like a tracking issue to be logged for this, so we don't forget to replace the implementation once TableVectorLookup is implemented.
There was a problem hiding this comment.
FWIW, this should be addressed by #38780.
echesakov
left a comment
There was a problem hiding this comment.
Looks Good. Thank you!
|
Thanks @Gnbrkm41 |
|
@echesakovMSFT, Has there been any improvements regarding "cmeq (register) when one of the operands is Vector128.Zero will be replaced with cmeq (zero) when we support this optimization"? |
@Gnbrkm41 No, I haven't had time to work on this yet, but I have this on my backlog. |

Resolves #33309
Contributes towards #33308 (All the work items under BitArray)
Due to network issues I have with the home network I was not run able to run tests / benchmarks on my only ARM machine (Raspberry Pi). In the meantime, hopefully the CI would be able to verify if anything is wrong with my code.
cc @BruceForstall, @echesakovMSFT, @tannergooding