Faster optimized frozen dictionary creation (4/n) #87876
stephentoub merged 2 commits into dotnet:main
Conversation
We rent a single-dimension array where every bucket has five slots. The bucket starts at (key.Length - minLength) * 5. Each value is an index of the key in the _keys array, or -1, which represents "null". Creation time is 2x to 10x faster. Indexing is 4-10% slower, but still an order of magnitude faster than Dictionary.
Tagging subscribers to this area: @dotnet/area-system-collections

Issue Details: Instead of creating a dictionary of lists and a multi-dimensional array, we rent a single-dimension array where every bucket has five slots. Creation time is 2x to 10x faster, and not more than 2x slower compared to Dictionary.
```csharp
// If there would be too much empty space in the lookup array, bail.
if (groupedByLength.Count / (double)spread < EmptyLengthsRatio)
{
    // This size check prevents OOM.
```
Not really :) It just prevents artificial OOMs from arrays of unsupported sizes. You could still try to rent an array of length Array.MaxLength - 1 and OOM.
```csharp
int arraySize = spread * MaxPerLength;
#if NET6_0_OR_GREATER
List<KeyValuePair<string, TValue>> list = CollectionsMarshal.GetValueRefOrAddDefault(groupedByLength, s.Length, out _) ??= new List<KeyValuePair<string, TValue>>(MaxPerLength);
if (arraySize >= Array.MaxLength)
```
```csharp
internal sealed class LengthBucketsFrozenDictionary<TValue> : FrozenDictionary<string, TValue>
{
    /// <summary>Allowed ratio between buckets with values and total buckets. Under this ratio, this implementation won't be used due to too much wasted space.</summary>
    private const double EmptyLengthsRatio = 0.2;
```
I don't understand why it's ok to do away with this concept. If I have one string of length 1_000_000 and one string of length 1, won't I now end up allocating an array of 5 million elements?
> If I have one string of length 1_000_000 and one string of length 1, won't I now end up allocating an array of 5 million elements?
This case is the first thing we check and I've left it untouched:
I am going to benchmark whether it's beneficial to remove this check or not and get back to you.
> This case is the first thing we check and I've left it untouched:
No, that's different. The check that's there is seeing whether it can be easily predetermined whether any bucket might have too many elements in it. In contrast, the EmptyLengthsRatio you deleted was used to determine whether the lengths were too spread out such that we'd be wasting a ton of space with holes in the array.
> I am going to benchmark whether it's beneficial to remove this check or not and get back to you.
For MaxPerLength you mean? For a given bucket, the implementation is doing a linear scan of every element. If 1000 strings were to end up in the same bucket and every lookup involved comparing up to 1000 strings, it would absolutely be worse than an O(1) dictionary lookup. So there is going to be some threshold where we no longer want to use this strategy. It's certainly possible, especially with the other changes, that MaxPerLength can be increased, but it can't become infinite.
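To illustrate the point: under this layout, looking up a key is a linear scan of one bucket, so per-lookup cost grows with the bucket size. This is a hypothetical sketch, not the PR's actual code; the names `BucketLookupSketch`, `IndexOf`, and `MaxPerLength` are assumptions mirroring the description.

```csharp
using System;

class BucketLookupSketch
{
    public const int MaxPerLength = 5; // assumption: mirrors the PR's five-slot buckets

    // Linear scan of a single bucket: compares the probe key against up to
    // MaxPerLength candidates, returning the key's index or -1 if absent.
    public static int IndexOf(int[] buckets, string[] keys, int bucketStart, string key)
    {
        for (int slot = 0; slot < MaxPerLength; slot++)
        {
            int keyIndex = buckets[bucketStart + slot];
            if (keyIndex < 0)
            {
                break; // empty slot: no more candidates in this bucket
            }
            if (keys[keyIndex] == key)
            {
                return keyIndex;
            }
        }
        return -1;
    }

    static void Main()
    {
        string[] keys = { "cat", "dog", "fox" };
        int[] buckets = { 0, 1, 2, -1, -1 }; // one bucket for length-3 keys
        Console.WriteLine(IndexOf(buckets, keys, 0, "dog")); // 1
        Console.WriteLine(IndexOf(buckets, keys, 0, "owl")); // -1
    }
}
```

With MaxPerLength capped at a small constant the scan stays cheap; if the cap were removed, a crowded bucket would degrade toward O(n) per lookup, which is the threshold concern above.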
```csharp
if (buckets[index] < 0) buckets[index] = i;
else if (buckets[index + 1] < 0) buckets[index + 1] = i;
else if (buckets[index + 2] < 0) buckets[index + 2] = i;
else if (buckets[index + 3] < 0) buckets[index + 3] = i;
else if (buckets[index + 4] < 0) buckets[index + 4] = i;
```
It's brittle to manually unroll this loop that's tied to the MaxPerLength constant (e.g. if the constant were increased or decreased, this unrolling would be wrong). Can we just make this a loop?
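A loop bounded by MaxPerLength, as suggested, might look like the sketch below. `TryAddToBucket` is a hypothetical name, not the PR's actual code; the fill logic mirrors the unrolled chain above but can no longer drift out of sync with the constant.

```csharp
using System;

class LengthBucketsSketch
{
    public const int MaxPerLength = 5; // assumption: mirrors the PR constant

    // Stores key index `i` in the first free slot of the bucket that starts
    // at `index`; returns false if all MaxPerLength slots are taken.
    public static bool TryAddToBucket(int[] buckets, int index, int i)
    {
        for (int slot = 0; slot < MaxPerLength; slot++)
        {
            if (buckets[index + slot] < 0)
            {
                buckets[index + slot] = i;
                return true;
            }
        }
        return false;
    }

    static void Main()
    {
        int[] buckets = new int[MaxPerLength];
        Array.Fill(buckets, -1); // -1 marks an empty slot

        for (int i = 0; i < MaxPerLength; i++)
        {
            Console.WriteLine(TryAddToBucket(buckets, 0, i)); // True, five times
        }
        Console.WriteLine(TryAddToBucket(buckets, 0, 99)); // False: bucket is full
    }
}
```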
```csharp
// AllocateUninitializedArray is slower for small inputs, hence the size check.
int[] copy = arraySize < 1000
    ? new int[arraySize]
    : GC.AllocateUninitializedArray<int>(arraySize);
```
The first thing GC.AllocateUninitializedArray<int> does is a length < 512 check. Can we just let it do its thing rather than also checking here?
....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs
Cc @geeknoid in case he's interested in this series of PRs
Updated perf numbers: creation time is 2x to 16x faster, and not more than 2x slower compared to Dictionary.
Instead of creating a dictionary of lists and a multi-dimensional array, we rent a single-dimension array where every bucket has five slots.
The bucket starts at the (key.Length - minLength) * 5 index of the array. Each value is an index of the key in the _keys array, or just -1, which represents "null".
We avoid having two copies and re-ordering of the keys and values collections.
Creation time is 2x to 10x faster, and not more than 2x slower compared to Dictionary.
Indexing is 6-12% slower, but for large inputs it's still an order of magnitude faster than Dictionary.
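The layout described above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the PR's actual implementation: `Build`, `MaxPerLength`, and the plain allocation (the real code rents the array from a pool) are all assumptions made for the sketch.

```csharp
using System;
using System.Linq;

class LengthBucketsBuildSketch
{
    public const int MaxPerLength = 5; // assumption: mirrors the PR's five-slot buckets

    // Builds the flat lookup array: the bucket for length L starts at
    // (L - minLength) * MaxPerLength, and each slot holds a key index or -1.
    public static int[] Build(string[] keys, out int minLength)
    {
        minLength = keys.Min(k => k.Length);
        int spread = keys.Max(k => k.Length) - minLength + 1;

        // The PR rents this from ArrayPool; a plain allocation keeps the sketch simple.
        int[] buckets = new int[spread * MaxPerLength];
        Array.Fill(buckets, -1); // -1 represents "null" (an empty slot)

        for (int i = 0; i < keys.Length; i++)
        {
            int start = (keys[i].Length - minLength) * MaxPerLength;
            for (int slot = 0; slot < MaxPerLength; slot++)
            {
                if (buckets[start + slot] < 0)
                {
                    buckets[start + slot] = i; // store the key's index, not the key itself
                    break;
                }
            }
        }
        return buckets;
    }

    static void Main()
    {
        string[] keys = { "a", "bb", "cc", "dddd" };
        int[] buckets = Build(keys, out int minLength);

        // The length-2 bucket holds the indices of "bb" and "cc".
        int start = (2 - minLength) * MaxPerLength;
        Console.WriteLine(buckets[start]);     // 1
        Console.WriteLine(buckets[start + 1]); // 2
    }
}
```

Storing indices rather than copied key/value pairs is what avoids the second copy and re-ordering of the keys and values collections mentioned above.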