66% speedup for Path.Combine on Windows by jamesqo · Pull Request #11293 · dotnet/corefx

jamesqo · 2016-08-31T14:05:28Z

I recently ran some iterations of Path.Combine through PerfView to see whether anything could be optimized. It turns out that this check, which calls IndexOfAny with a gigantic char array, was taking up as much as 80% of CPU time; in fact this method alone, which initializes a "probabilistic map" to represent whether a char is in the array, was where the program spent 60% of its time.

Changing the method to use a manual, "naive" implementation of IndexOfAny results in an approximately 3x speedup for Path.Combine, from ~2.3s to 0.7s. You can see the test app and results I made for benchmarking here.
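For readers who want to picture the kind of manual loop described above, here is a minimal sketch. The class name and the exact character set are assumptions made for illustration (control characters U+0000 to U+001F plus a few punctuation marks); the authoritative list is whatever the merged commit contains.

```csharp
static class IllegalCharSketch
{
    // Hypothetical helper; the character set below is assumed for illustration.
    public static bool HasIllegalCharacters(string path)
    {
        for (int i = 0; i < path.Length; i++)
        {
            char c = path[i];

            // A single range comparison covers all 32 control characters,
            // so no lookup table or per-call setup work is needed.
            if (c <= '\u001f' || c == '"' || c == '<' || c == '>' || c == '|')
            {
                return true;
            }
        }

        return false;
    }
}
```

Because the check has no setup cost, its running time depends only on the length of the path being validated.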

Other note: I also removed an apparently dead optional parameter that's not being used anywhere.

cc @JeremyKuhne, @stephentoub

stephentoub · 2016-08-31T15:31:57Z

Seems reasonable to me. Thanks!

It would also be interesting to understand whether the implementation in coreclr is actually providing an improvement for typical use cases; maybe there's a threshold that should be applied, e.g. the current implementation is used when the length of the string and the length of the char array are greater than some determined thresholds, otherwise a simple double-loop approach is employed.

cc: @jkotas

jkotas · 2016-08-31T15:56:28Z

> It would also be interesting to understand whether the implementation in coreclr is actually providing an improvement for typical use cases

This loop will always be faster.

While you are at it, you may also get rid of the InvalidPathChars and InvalidFileNameChars statics. They seem to be used only as a template to clone from. It would be better to create a fresh copy of the array each time instead of cloning.
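A rough sketch of the suggestion above, contrasting the two patterns. The type name, method names, and the shortened character list are all hypothetical and only illustrate the shape of the change; they are not the actual corefx members.

```csharp
static class PathCharsSketch
{
    // Pattern being replaced: a static array that exists only to be cloned.
    // The contents are an abbreviated, assumed list for illustration.
    private static readonly char[] s_template = { '"', '<', '>', '|', '\0' };

    public static char[] GetCharsByCloning() => (char[])s_template.Clone();

    // Suggested pattern: no static field; every call allocates a new array
    // straight from the literal, so callers can never share state.
    public static char[] GetCharsFresh() => new char[] { '"', '<', '>', '|', '\0' };
}
```

Either way each caller receives its own array, so mutating the result cannot affect later calls; the second form simply drops the static field and the Clone call.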

stephentoub · 2016-08-31T16:00:19Z

> This loop will always be faster.

I agree in this case, especially since most of the elements can be grouped into a single comparison. I was referring more generally to other places where IndexOfAny is used, without a priori knowledge of such groupings. Presumably there is some cross-over point where all of that up-front work done by the current implementation provides value... if yes, and if that value isn't 0, seems like we should put another implementation in place for the cases that suffer under that implementation's weight... and if no, then it seems like we should delete that "optimization" from coreclr entirely.

jamesqo · 2016-08-31T23:47:47Z

@stephentoub I wrote an API proposal that did some study of this after discovering the bottleneck yesterday.

To my understanding, unless a character is found at maybe the first 4 indices in the string, the current implementation will always be faster. The reason is that the "naive" approach makes a pass over the array for each char in the string, so for a string of length m and an array of length n that's O(m * n). With the bitmap approach, it takes 1 pass over the array to initialize the bitmap, and then checking if a character is contained in the bitmap is just a few bit operations which are O(1), much cheaper than making another pass. So the complexity there is closer to O(m + n).
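To make the two complexity claims concrete, here is a simplified sketch of both strategies. It is not the coreclr implementation: the names are hypothetical, and the map here is a 256-bit filter keyed on each char's low byte, with a confirming scan on a hit because different chars can share a low byte.

```csharp
using System;

static class IndexOfAnySketch
{
    // O(m * n): for every char in the string, scan the whole array.
    public static int Naive(string s, char[] anyOf)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            for (int j = 0; j < anyOf.Length; j++)
            {
                if (c == anyOf[j]) return i;
            }
        }
        return -1;
    }

    // Roughly O(m + n): one pass over the array to build the map,
    // then a cheap bit test for each char in the string.
    public static int Mapped(string s, char[] anyOf)
    {
        ulong[] map = new ulong[4]; // 256 bits, indexed by the low byte
        foreach (char a in anyOf)
        {
            int b = a & 0xFF;
            map[b >> 6] |= 1UL << (b & 63);
        }

        for (int i = 0; i < s.Length; i++)
        {
            int b = s[i] & 0xFF;
            bool maybe = (map[b >> 6] & (1UL << (b & 63))) != 0;

            // The map can report false positives, so confirm before returning.
            if (maybe && Array.IndexOf(anyOf, s[i]) >= 0) return i;
        }
        return -1;
    }
}
```

The setup loop in Mapped is the up-front cost discussed in this thread: it is paid in full even when the match sits at index 0, which is where the naive version can win.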

stephentoub · 2016-09-01T00:14:16Z

> unless a character is found at maybe the first 4 indices in the string, the current implementation will always be faster

What if, for example, the char[] only has two chars in it?

jamesqo · 2016-09-01T01:12:29Z

@stephentoub I just did some benchmarking over here with arrays/strings of varying lengths. It looks like the cutoff may be higher than I expected for smaller arrays; for length-2 arrays it seems to be at around index 9, for length 3/4 it's about index 7. I have just opened dotnet/coreclr#7017 to track whether we should apply some kind of threshold before which we just use a naive loop.
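One possible shape for such a threshold is sketched below. Everything here is an assumption: the cutoff constant is a placeholder rather than a measured value, the dispatch looks only at the string length, and HashSet<char> merely stands in for whatever setup-heavy structure the runtime uses.

```csharp
using System.Collections.Generic;

static class ThresholdSketch
{
    // Placeholder cutoff; a real value would have to come from benchmarks.
    private const int NaiveLengthThreshold = 8;

    public static int IndexOfAny(string s, char[] anyOf)
    {
        if (s.Length <= NaiveLengthThreshold)
        {
            // Short input: skip all setup and use the double loop.
            for (int i = 0; i < s.Length; i++)
            {
                for (int j = 0; j < anyOf.Length; j++)
                {
                    if (s[i] == anyOf[j]) return i;
                }
            }
            return -1;
        }

        // Longer input: pay the setup cost once, then do O(1) lookups.
        var lookup = new HashSet<char>(anyOf);
        for (int i = 0; i < s.Length; i++)
        {
            if (lookup.Contains(s[i])) return i;
        }
        return -1;
    }
}
```

A dispatch like this cannot know in advance where the first match will be, so any cutoff is a heuristic that trades a little worst-case time for less setup work on short inputs.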

stephentoub · 2016-09-01T13:57:16Z

> you may also get rid of InvalidPathChars and InvalidFileNameChars statics

I'll fix this after this is merged.

JeremyKuhne · 2016-09-01T17:30:10Z

Thanks! I'll look into porting this to desktop.

66% speedup for Path.Combine on Windows Commit migrated from dotnet/corefx@980aa1b

Make PathInternal.HasIllegalCharacters implementation faster on Windows

b39ba74

dnfclas added the cla-already-signed label Aug 31, 2016

dotnet-bot added the 3 - Ready For Review label Aug 31, 2016

stephentoub merged commit 980aa1b into dotnet:master Sep 1, 2016

stephentoub removed the 3 - Ready For Review label Sep 1, 2016

jamesqo deleted the illegal-path-chars branch September 1, 2016 14:01

stephentoub mentioned this pull request Sep 1, 2016

Avoid array cloning in Path.Get*Chars #11338

Merged

JeremyKuhne mentioned this pull request Sep 1, 2016

Windows path tweaks #8669

Merged

karelz modified the milestone: 1.1.0 Dec 3, 2016

jamesqo mentioned this pull request Jan 31, 2020

Consider doing "naive" search in IndexOfAny if string's length is under a certain threshold dotnet/runtime#6586

Closed

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/corefx#11293 from jamesqo/illegal-path-chars

acb49cb

