Poor codegen for Vector128.Create() with constants sharing the same value #63432

@dubiousconst282

Description

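Calling Vector128.Create(float, float, float, float) with constant arguments, two or more of which share the same value, results in poor codegen: the JIT builds the vector element by element instead of loading it as a single constant, as demonstrated below.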

Sharplab

using System;
using System.Runtime.Intrinsics;

public class C {
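    // Two arguments share the value 4f: the vector is built element by element.
    static Vector128<float> M1() {
        return Vector128.Create(1f, 2f, 4f, 4f);
    }
    // All arguments are distinct: the vector is loaded as a single constant.
    static Vector128<float> M2() {
        return Vector128.Create(1f, 2f, 3f, 4f);
    }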
}

Output assembly (CoreCLR 6.0.21.52210 on amd64)

C.M1()
    L0000: vzeroupper
    L0003: vmovss xmm0, [0x7ffb22bb0480]
    L000b: vmovss xmm1, [0x7ffb22bb0484]
    L0013: vinsertps xmm0, xmm0, xmm1, 0x10
    L0019: vmovss xmm1, [0x7ffb22bb0488]
    L0021: vmovaps xmm2, xmm1
    L0025: vinsertps xmm0, xmm0, xmm2, 0x20
    L002b: vinsertps xmm0, xmm0, xmm1, 0x30
    L0031: vmovupd [rcx], xmm0
    L0035: mov rax, rcx
    L0038: ret

C.M2()
    L0000: vzeroupper
    L0003: vmovupd xmm0, [0x7ffb22bb04c0]
    L000b: vmovupd [rcx], xmm0
    L000f: mov rax, rcx
    L0012: ret

This seems to affect all the other VectorXXX.Create() overloads as well, but only for float and double element types.
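
To check the other overloads, a sketch like the following could be pasted into Sharplab. The class and method names are hypothetical, and none of these cases has been verified here; the disassembly of each method would need to be compared against M1 and M2 above. The int case is included only as a control, on the assumption that integer element types are unaffected.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical repro cases; names are illustrative and not part of the original report.
public static class CreateRepros
{
    // float, 256-bit: the last two constants share the value 8f.
    public static Vector256<float> Float256() =>
        Vector256.Create(1f, 2f, 3f, 4f, 5f, 6f, 8f, 8f);

    // double, 256-bit: the last two constants share the value 4.0.
    public static Vector256<double> Double256() =>
        Vector256.Create(1.0, 2.0, 4.0, 4.0);

    // int, 128-bit: same pattern as M1; expected (per the report) to be
    // emitted as a single constant load, but not verified here.
    public static Vector128<int> Int128() =>
        Vector128.Create(1, 2, 4, 4);
}
```

If the report holds, Float256 and Double256 would show a chain of scalar loads and inserts, while Int128 would show a single vector load.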

Labels

    area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
    tenet-performance: Performance related issue
