gh-117431: Optimize str.startswith #117480
Could you add the other optimisations as separate commits?
The difference between 27.2 and 26.3 ns is too small and can be the result of unrelated factors. I get a nanosecond of variation when I run the same command several times.
Yes, so I'm curious about the other two mentioned optimisations that are not (yet) part of this PR. Perhaps they have a greater impact.
I created a PR with the other approach: #117782.
Closing this in favor of the alternate PR. |
We apply two optimizations in `tailmatch` (which is used in both `str.startswith` and `str.endswith`):
- Skip the call to `memcmp` when all characters have already been checked.
- In the call to `memcmp`, reduce the number of bytes compared, since the first and last character have already been checked.
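
The two optimizations above can be sketched on plain byte buffers. This is a simplified model, not the CPython code: `match_at` and its parameters are hypothetical names, the real `tailmatch` also handles differing character widths, and the `sub_len <= 2` threshold for skipping `memcmp` is an assumption based on the first and last character having been compared already.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a tail match on byte buffers.
 * Returns 1 if `sub` (sub_len bytes) occurs in `self` at `offset`, else 0.
 * The caller must guarantee that offset + sub_len does not run past `self`. */
int
match_at(const char *self, size_t offset, const char *sub, size_t sub_len)
{
    assert(self != NULL && sub != NULL);

    if (sub_len == 0) {
        return 1;  /* the empty string matches at any position */
    }
    size_t end_sub = sub_len - 1;

    /* Cheap rejection: compare the first and last characters directly. */
    if (self[offset] != sub[0] || self[offset + end_sub] != sub[end_sub]) {
        return 0;
    }

    /* Optimization 1 (assumed threshold): with one or two characters,
     * every character has already been compared, so memcmp is skipped. */
    if (sub_len <= 2) {
        return 1;
    }

    /* Optimization 2: compare only the interior bytes, because the first
     * and last bytes are already known to match. */
    return memcmp(self + offset + 1, sub + 1, sub_len - 2) == 0;
}
```

For the three-character case the `memcmp` call compares a single byte instead of three, which is where the reduced byte count comes from.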
Two possible optimizations not included in this PR:
- For a single-character substring, the checks `PyUnicode_READ(kind_self, data_self, offset) == PyUnicode_READ(kind_sub, data_sub, 0)` and `PyUnicode_READ(kind_self, data_self, offset + end_sub) == PyUnicode_READ(kind_sub, data_sub, end_sub)` are equal. We can eliminate the duplicate by adding a dedicated check for that case. This makes the single-character case a bit faster, but the code a bit more complex.
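
As a rough illustration of the single-character idea, the sketch below returns right after the first comparison when the substring has length 1, so the identical "last character" comparison never runs. It is not the snippet proposed in the PR; `match_at_single_aware` is a hypothetical name and the function works on plain byte buffers rather than on `PyUnicode` data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: avoid the duplicate comparison for one character.
 * Returns 1 if `sub` (sub_len bytes) occurs in `self` at `offset`, else 0.
 * The caller must guarantee that offset + sub_len does not run past `self`. */
int
match_at_single_aware(const char *self, size_t offset,
                      const char *sub, size_t sub_len)
{
    assert(self != NULL && sub != NULL);

    if (sub_len == 0) {
        return 1;  /* the empty string matches at any position */
    }

    /* First-character check, needed for every non-empty substring. */
    if (self[offset] != sub[0]) {
        return 0;
    }

    size_t end_sub = sub_len - 1;
    if (end_sub == 0) {
        /* Single character: the last-character check would repeat the
         * comparison above, so return immediately. */
        return 1;
    }

    /* Last-character check, only reached for two or more characters. */
    if (self[offset + end_sub] != sub[end_sub]) {
        return 0;
    }

    /* Full comparison, left unoptimized to keep this sketch focused. */
    return memcmp(self + offset, sub, sub_len) == 0;
}
```

The extra `end_sub == 0` branch is the added complexity mentioned above: it saves one comparison for single-character substrings at the cost of one more branch for everything longer.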
Benchmark (on top of #117466):
python -m timeit -s "s = 'abcdef'" "s.startswith('a')"