Skip to content

Consider doing "naive" search in IndexOfAny if string's length is under a certain threshold #6586

@jamesqo

Description

@jamesqo

IndexOfAny currently uses a trick where it initializes a bitmap to represent if a character is probably in the array. This makes checking whether a character is represented by the bitmap O(1) since it's just a couple of bit operations, however initializing the map is quite expensive since it's O(n) and these bit operations have to be done for every character of the array. This means that in some cases, the "naive" approach where we make a pass through the array for every character can be faster for small strings (i.e. if anyOf.Length == 2, the threshold seems to be at about 9).

We should consider applying the following optimizations:

  • Checking if the string's length is below a certain threshold depending on the length of anyOf, and if so, use the "naive" approach.
    • This could be implemented as a lookup table, i.e. static int32_t thresholds[] = { 9, 7, ... } for minimal overhead.
  • Checking if anyOf.Length == 1 and if so return IndexOf(anyOf[0])
  • Checking if this.Length == 1 and if so return Array.IndexOf(anyOf, this[0])
  • Additionally if anyOf.Length == 2 and the difference between the two characters is a power of 2, we may be able to apply this optimization where instead of searching the array or initializing the bitmap, we simply OR each char with anyOf[0] ^ anyOf[1] (which will have exactly one bit set) and check if it's equal to anyOf[0] | anyOf[1] (which will be one of the 2 elements).
    • The drawback of this, of course, is that it would only apply to arrays of length 2.

Related issues:

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-Metahelp wanted[up-for-grabs] Good issue for external contributorstenet-performancePerformance related issue

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions