Add SIMD impl of memset for LoongArch #547
Merged
lhecker merged 1 commit into microsoft:main from heiher:loong-simd-memset on Jul 2, 2025
Benchmark results on LA664:
- LASX
```
simd/memset<u32>/8 time: [2.5735 ns 2.6308 ns 2.6957 ns]
thrpt: [2.7639 GiB/s 2.8321 GiB/s 2.8951 GiB/s]
change:
time: [−14.812% −11.240% −7.7107%] (p = 0.00 < 0.05)
thrpt: [+8.3549% +12.664% +17.387%]
Performance has improved.
simd/memset<u32>/136 time: [8.0049 ns 8.0098 ns 8.0159 ns]
thrpt: [15.801 GiB/s 15.813 GiB/s 15.823 GiB/s]
change:
time: [−51.251% −51.202% −51.145%] (p = 0.00 < 0.05)
thrpt: [+104.69% +104.92% +105.13%]
Performance has improved.
simd/memset<u32>/1024 time: [12.407 ns 12.414 ns 12.422 ns]
thrpt: [76.770 GiB/s 76.824 GiB/s 76.866 GiB/s]
change:
time: [−88.281% −88.262% −88.249%] (p = 0.00 < 0.05)
thrpt: [+750.97% +751.94% +753.34%]
Performance has improved.
simd/memset<u32>/131072 time: [2.4655 µs 2.4668 µs 2.4685 µs]
thrpt: [49.450 GiB/s 49.485 GiB/s 49.512 GiB/s]
change:
time: [−81.223% −81.209% −81.195%] (p = 0.00 < 0.05)
thrpt: [+431.76% +432.17% +432.57%]
Performance has improved.
simd/memset<u32>/134217728
time: [4.4058 ms 4.4173 ms 4.4313 ms]
thrpt: [28.208 GiB/s 28.298 GiB/s 28.372 GiB/s]
change:
time: [−67.246% −67.154% −67.062%] (p = 0.00 < 0.05)
thrpt: [+203.60% +204.45% +205.30%]
Performance has improved.
simd/memset<u8>/8 time: [3.2015 ns 3.2029 ns 3.2050 ns]
thrpt: [2.3247 GiB/s 2.3262 GiB/s 2.3272 GiB/s]
change:
time: [−0.0718% +0.0012% +0.0858%] (p = 0.97 > 0.05)
thrpt: [−0.0857% −0.0012% +0.0719%]
No change in performance detected.
simd/memset<u8>/136 time: [3.6125 ns 3.6174 ns 3.6229 ns]
thrpt: [34.961 GiB/s 35.014 GiB/s 35.062 GiB/s]
change:
time: [−0.6087% −0.1314% +0.2680%] (p = 0.58 > 0.05)
thrpt: [−0.2673% +0.1316% +0.6124%]
No change in performance detected.
simd/memset<u8>/1024 time: [11.341 ns 11.346 ns 11.353 ns]
thrpt: [84.002 GiB/s 84.055 GiB/s 84.092 GiB/s]
change:
time: [−0.1288% −0.0636% +0.0134%] (p = 0.06 > 0.05)
thrpt: [−0.0134% +0.0636% +0.1290%]
No change in performance detected.
simd/memset<u8>/131072 time: [2.4705 µs 2.4717 µs 2.4733 µs]
thrpt: [49.354 GiB/s 49.388 GiB/s 49.411 GiB/s]
change:
time: [−0.2564% −0.0972% +0.0179%] (p = 0.20 > 0.05)
thrpt: [−0.0179% +0.0973% +0.2570%]
No change in performance detected.
simd/memset<u8>/134217728
time: [4.4030 ms 4.4104 ms 4.4190 ms]
thrpt: [28.287 GiB/s 28.342 GiB/s 28.390 GiB/s]
change:
time: [−0.0583% +0.1614% +0.3954%] (p = 0.17 > 0.05)
thrpt: [−0.3938% −0.1611% +0.0583%]
No change in performance detected.
```
- LSX
```
simd/memset<u32>/8 time: [2.3534 ns 2.4342 ns 2.5242 ns]
thrpt: [2.9516 GiB/s 3.0608 GiB/s 3.1658 GiB/s]
change:
time: [−19.116% −15.130% −11.603%] (p = 0.00 < 0.05)
thrpt: [+13.126% +17.828% +23.633%]
Performance has improved.
simd/memset<u32>/136 time: [7.0382 ns 7.0426 ns 7.0480 ns]
thrpt: [17.971 GiB/s 17.985 GiB/s 17.996 GiB/s]
change:
time: [−57.141% −57.110% −57.079%] (p = 0.00 < 0.05)
thrpt: [+132.99% +133.15% +133.33%]
Performance has improved.
simd/memset<u32>/1024 time: [30.019 ns 30.037 ns 30.060 ns]
thrpt: [31.725 GiB/s 31.750 GiB/s 31.769 GiB/s]
change:
time: [−71.652% −71.606% −71.575%] (p = 0.00 < 0.05)
thrpt: [+251.80% +252.19% +252.76%]
Performance has improved.
simd/memset<u32>/131072 time: [3.2877 µs 3.2897 µs 3.2923 µs]
thrpt: [37.077 GiB/s 37.107 GiB/s 37.130 GiB/s]
change:
time: [−74.960% −74.930% −74.885%] (p = 0.00 < 0.05)
thrpt: [+298.17% +298.89% +299.36%]
Performance has improved.
simd/memset<u32>/134217728
time: [4.4147 ms 4.4179 ms 4.4218 ms]
thrpt: [28.269 GiB/s 28.294 GiB/s 28.314 GiB/s]
change:
time: [−67.181% −67.149% −67.114%] (p = 0.00 < 0.05)
thrpt: [+204.08% +204.41% +204.71%]
Performance has improved.
simd/memset<u8>/8 time: [3.2016 ns 3.2031 ns 3.2052 ns]
thrpt: [2.3245 GiB/s 2.3260 GiB/s 2.3271 GiB/s]
change:
time: [−0.0666% +0.0051% +0.0807%] (p = 0.90 > 0.05)
thrpt: [−0.0807% −0.0051% +0.0667%]
No change in performance detected.
simd/memset<u8>/136 time: [3.6111 ns 3.6152 ns 3.6197 ns]
thrpt: [34.991 GiB/s 35.035 GiB/s 35.075 GiB/s]
change:
time: [−0.7119% −0.2467% +0.1774%] (p = 0.29 > 0.05)
thrpt: [−0.1771% +0.2473% +0.7170%]
No change in performance detected.
simd/memset<u8>/1024 time: [11.342 ns 11.349 ns 11.358 ns]
thrpt: [83.962 GiB/s 84.030 GiB/s 84.082 GiB/s]
change:
time: [−0.1126% −0.0288% +0.0686%] (p = 0.59 > 0.05)
thrpt: [−0.0685% +0.0288% +0.1127%]
No change in performance detected.
simd/memset<u8>/131072 time: [2.4707 µs 2.4723 µs 2.4742 µs]
thrpt: [49.337 GiB/s 49.376 GiB/s 49.407 GiB/s]
change:
time: [−0.2277% −0.0571% +0.0834%] (p = 0.52 > 0.05)
thrpt: [−0.0833% +0.0571% +0.2282%]
No change in performance detected.
simd/memset<u8>/134217728
time: [4.3991 ms 4.4057 ms 4.4138 ms]
thrpt: [28.320 GiB/s 28.372 GiB/s 28.415 GiB/s]
change:
time: [−0.1523% +0.0554% +0.2680%] (p = 0.62 > 0.05)
thrpt: [−0.2673% −0.0554% +0.1525%]
No change in performance detected.
```
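For anyone reading the numbers above, the `thrpt` and `change` lines can be reproduced from the `time` lines. The sketch below is only an illustration and is not part of this PR: the helper names `throughput_gib_s` and `thrpt_change` are hypothetical, and it assumes the trailing number in each benchmark name (e.g. `1024`) is the buffer size in bytes, which is consistent with the reported figures.

```rust
/// Bytes per GiB (2^30), the unit used in the `thrpt` lines.
const GIB: f64 = 1_073_741_824.0;

/// Throughput in GiB/s for `bytes` written in `nanos` nanoseconds.
/// (Hypothetical helper, assuming the benchmark size is in bytes.)
fn throughput_gib_s(bytes: u64, nanos: f64) -> f64 {
    (bytes as f64) / (nanos * 1e-9) / GIB
}

/// Converts a relative time change (e.g. -0.88262 for -88.262%) into the
/// matching relative throughput change. Throughput is inversely
/// proportional to time, so new/old throughput = 1 / (1 + time_change).
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

For example, `throughput_gib_s(1024, 12.414)` comes out close to the 76.824 GiB/s reported for `simd/memset<u32>/1024` with LASX, and `thrpt_change(-0.88262)` is about 7.519, i.e. the reported +751.94%. Small differences in the last digit are expected because the printed times are rounded.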
lhecker approved these changes on Jul 2, 2025
Lou32Verbose pushed a commit to Lou32Verbose/edit that referenced this pull request on Jan 11, 2026