Add SIMD impl of memchr2 for LoongArch #551

Merged: lhecker merged 1 commit into microsoft:main from heiher:loong-simd-memchr2 on Jul 2, 2025
@heiher commented Jul 1, 2025

Benchmark results on LA664:

- LASX

```
simd/memchr2/8          time:   [7.6037 ns 7.6060 ns 7.6094 ns]
                        thrpt:  [1.1015 GiB/s 1.1020 GiB/s 1.1023 GiB/s]
                 change:
                        time:   [−5.0872% −5.0162% −4.9421%] (p = 0.00 < 0.05)
                        thrpt:  [+5.1990% +5.2812% +5.3599%]
                        Performance has improved.

simd/memchr2/136        time:   [10.450 ns 10.460 ns 10.470 ns]
                        thrpt:  [12.186 GiB/s 12.198 GiB/s 12.209 GiB/s]
                 change:
                        time:   [−90.586% −90.570% −90.555%] (p = 0.00 < 0.05)
                        thrpt:  [+958.73% +960.49% +962.21%]
                        Performance has improved.

simd/memchr2/1024       time:   [44.428 ns 44.456 ns 44.492 ns]
                        thrpt:  [21.456 GiB/s 21.473 GiB/s 21.487 GiB/s]
                 change:
                        time:   [−94.643% −94.633% −94.625%] (p = 0.00 < 0.05)
                        thrpt:  [+1760.5% +1763.2% +1766.7%]
                        Performance has improved.

simd/memchr2/131072     time:   [4.9362 µs 4.9389 µs 4.9425 µs]
                        thrpt:  [24.698 GiB/s 24.716 GiB/s 24.730 GiB/s]
                 change:
                        time:   [−95.301% −95.296% −95.292%] (p = 0.00 < 0.05)
                        thrpt:  [+2024.1% +2025.9% +2027.9%]
                        Performance has improved.

simd/memchr2/134217728  time:   [5.7547 ms 5.7604 ms 5.7662 ms]
                        thrpt:  [21.678 GiB/s 21.700 GiB/s 21.721 GiB/s]
                 change:
                        time:   [−94.648% −94.642% −94.637%] (p = 0.00 < 0.05)
                        thrpt:  [+1764.7% +1766.5% +1768.5%]
                        Performance has improved.
```

- LSX

```
simd/memchr2/8          time:   [3.6022 ns 3.6041 ns 3.6066 ns]
                        thrpt:  [2.3241 GiB/s 2.3257 GiB/s 2.3269 GiB/s]
                 change:
                        time:   [−55.037% −55.002% −54.966%] (p = 0.00 < 0.05)
                        thrpt:  [+122.05% +122.23% +122.41%]
                        Performance has improved.

simd/memchr2/136        time:   [10.407 ns 10.414 ns 10.422 ns]
                        thrpt:  [12.243 GiB/s 12.252 GiB/s 12.260 GiB/s]
                 change:
                        time:   [−90.624% −90.609% −90.592%] (p = 0.00 < 0.05)
                        thrpt:  [+962.91% +964.87% +966.60%]
                        Performance has improved.

simd/memchr2/1024       time:   [55.633 ns 55.662 ns 55.702 ns]
                        thrpt:  [17.138 GiB/s 17.150 GiB/s 17.159 GiB/s]
                 change:
                        time:   [−93.292% −93.281% −93.272%] (p = 0.00 < 0.05)
                        thrpt:  [+1386.4% +1388.2% +1390.9%]
                        Performance has improved.

simd/memchr2/131072     time:   [6.5801 µs 6.5845 µs 6.5899 µs]
                        thrpt:  [18.524 GiB/s 18.539 GiB/s 18.551 GiB/s]
                 change:
                        time:   [−93.736% −93.729% −93.723%] (p = 0.00 < 0.05)
                        thrpt:  [+1493.2% +1494.7% +1496.4%]
                        Performance has improved.

simd/memchr2/134217728  time:   [7.1782 ms 7.1828 ms 7.1887 ms]
                        thrpt:  [17.388 GiB/s 17.403 GiB/s 17.414 GiB/s]
                 change:
                        time:   [−93.324% −93.319% −93.314%] (p = 0.00 < 0.05)
                        thrpt:  [+1395.6% +1396.9% +1398.0%]
                        Performance has improved.
```
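
The `time` and `thrpt` change rows above describe the same improvement in two ways. As a reading aid, here is a minimal sketch of how they relate, assuming the work per iteration is the same before and after the change. The function names `throughput_change` and `speedup` are hypothetical and are not part of this PR or of the benchmark harness.

```rust
/// Converts a relative time change (e.g. -0.9057 for a 90.57% reduction)
/// into the matching relative throughput change, assuming the amount of
/// work per iteration is unchanged.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Speedup factor implied by a relative time change.
fn speedup(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change)
}

fn main() {
    // Median time changes reported above for the 131072-byte case.
    for (label, t) in [("LASX", -0.95296), ("LSX", -0.93729)] {
        println!(
            "{label}: about {:.1}x faster, throughput change {:+.1}%",
            speedup(t),
            throughput_change(t) * 100.0
        );
    }
}
```

Because the reported percentages are rounded, the recomputed throughput change can differ from the `thrpt` row in the last digit.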
@lhecker merged commit 259a198 into microsoft:main on Jul 2, 2025
3 checks passed
@heiher deleted the loong-simd-memchr2 branch on July 2, 2025 16:02
Lou32Verbose pushed a commit to Lou32Verbose/edit that referenced this pull request Jan 11, 2026