Rework ProbabilisticMap character checks in SearchValues#101001
Merged
MihaZupan merged 3 commits intodotnet:mainfrom May 16, 2024
Merged
Rework ProbabilisticMap character checks in SearchValues#101001MihaZupan merged 3 commits intodotnet:mainfrom
MihaZupan merged 3 commits intodotnet:mainfrom
Conversation
Contributor
|
Tagging subscribers to this area: @dotnet/area-system-buffers |
Member
|
Are all these well covered in the perf repo? |
This was referenced Apr 13, 2024
Closed
Closed
Member
Author
|
/azp run runtime-libraries-coreclr outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Member
Author
Yes, mainly by the Results
|
MihaZupan
commented
Apr 23, 2024
src/libraries/System.Private.CoreLib/src/System/SearchValues/ProbabilisticMapState.cs
Show resolved
Hide resolved
This was referenced Apr 23, 2024
a496f8e to
d9f3052
Compare
This was referenced Apr 23, 2024
Closed
stephentoub
approved these changes
May 15, 2024
9a65ad9 to
f196644
Compare
This was referenced May 17, 2024
Closed
Closed
Member
|
Improvements on arm64: dotnet/perf-autofiling-issues#34817 |
Ruihan-Yin
pushed a commit
to Ruihan-Yin/runtime
that referenced
this pull request
May 30, 2024
* Rework ProbabilisticMap character checks in SearchValues * Reduce footprint of ProbMap SearchValues * Update misleading comment
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Contributes to #100315 (comment)
The probabilistic map uses an
O(i * m)fallback for-Exceptmethods (scan all the values for each character in the input).We use the same
O(m)helper (scan the values) to confirm potential matches in the vectorized helper.This PR adds support for the probmap to use an
O(1)contains check for-Exceptmethods and match confirmations.The check is effectively a perfect hash check
entries[value % entries.Length] == value, but implemented using a variant ofFastMod. These checks can be inlined into the vectorized methods while not consuming too much memory.We use this only when the probmap is computed as part of
SearchValuesas finding an optimal modulus is relatively expensive (seeProbabilisticMapState.FindModulus- there are probably smarter ways to go about it). When the probabilistic map is created for single-useIndexOfAnyoperations, we still use the sameO(m)checks as before.This PR also replaces the
Latin1CharSearchValuesimplementation withBitmapCharSearchValues, which can use a bitmap of arbitrary size (not limited to[0, 255]). We use this when we guess that it'll be faster than theProbabilisticMap(e.g. there are a lot of values in the set, or the set is dense and the probmap isn't vectorized).I haven't changed the condition when we use
ProbabilisticWithAsciiCharSearchValuesas there are plausible cases where the ASCII fast path may still be useful even if the probabilistic path could be a bitmap instead.Improvements for early matches (cheaper confirmation step)
Improvements for IndexOfAnyExcept (O(1) instead of O(m) character checks)
Speedup factor is going to depend on how many values are in the set and how early in the
valueseach input character matched with the previous implementation.With the new implementation, the throughput is less dependent on the haystack.
The bitmap vs probmap hash O(1) checks
This happens to bring all
SearchValues<char>implementations to anO(i)worst-case (fromO(i * m)).