-
Notifications
You must be signed in to change notification settings - Fork 309
Add amdgpu intrinsics #1976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add amdgpu intrinsics #1976
Conversation
|
I think it can be "tested" the same way we do |
d6043df to
e102811
Compare
|
Thanks, I tried to replicate what’s there for nvptx + adding The diff to my first push from when opening this PR is here: https://github.com/rust-lang/stdarch/compare/b3f5bdae0efbdd5f7297d0225623bd31c7fe895b..e1028110e77561574bfb7ea349154d46b5ea7b86 |
sayantn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have put some comments
|
Also I have noticed that Clang provides a different set of intrinsics than these ( |
|
Thanks for the review! Interesting, I thought I do plan to write a common mod amdgpu {
mod gpu {
use super::*;
pub fn block_id_x() -> u32 {
workitem_id_x()
}
}
}
// Same for nvptx
mod gpu {
#[cfg(target_arch = "amdgpu")]
pub use amdgpu::gpu::*;
#[cfg(target_arch = "nvptx64")]
pub use nvptx::gpu::*;
// + more intrinsics as in gpuintrin.h
} |
|
The analogue to If there are some interesting platform-specific intrinsics, we can add them, but generally we follow GCC and Clang in regards to intrinsics. Yea, a common |
e102811 to
d850408
Compare
|
Thanks for the review, that’s a nice case for const generics! I fixed all your inline comments. Also thanks for linking the clang intrinsics, I tend to forget about those. As common GPU intrinsics is a larger topic, I started a Discourse post here (please add your thoughts!): https://internals.rust-lang.org/t/naming-gpu-things-in-the-rust-compiler-and-standard-library/23833 My planned intrinsic “stack” is roughly as follows, from high-level to low-level:
We could also use clang names for
I would prefer following LLVM’s intrinsic names in Note for my future self: I got run.sh to work with |
sayantn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, left some more comments
|
I don't have any strong preferences, if you feel that the lower-level LLVM intrinsics are more useful that just having the clang intrinsics, go ahead with that -- we can change it, remove it, or add the clang intrinsics too if required (as long as they are unstable) |
d850408 to
af7ca40
Compare
|
Thanks for taking such a thorough look! |
sayantn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discovered some more problems with the intrinsics, wave.reduce.f{add,sub,max,min} and llvm.s.barrier.leave are not in the LLVM release branch yet, they were added like 1 month ago. So they will fail to compile once someone actually monomorphises the functions. We should add them when the changes actually propagates to the rust fork (~3 months probably), otherwise it will cause link failures
Also, you probably forgot to add intrinsics for the add and fadd ones.
The other changes LGTM
Add intrinsics for the amdgpu architecture.
af7ca40 to
c65620d
Compare
|
I fixed your comments and added a tests mod with instructions to check that generating the instructions works. |
sayantn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one nit
| // Note that depending on the target-cpu set in run.sh, some of these intrinsics are not available | ||
| // and compilation fails with `Cannot select: intrinsic %llvm.amdgcn...`. | ||
| // Uncomment these intrinsics to check. | ||
| #[cfg(test)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #[cfg(test)] | |
| #[cfg(debug_assertions)] |
We can't do cargo test or cargo build --tests because this target doesn't support bin crates. So this is probably better, this will also ensure that it is checked in CI (we do a --profile=dev run in CI)
Add intrinsics for the amdgpu architecture.
I’m not sure how to add/run CI (
ci/run.shfails for me e.g. fornvptx because
corecannot be found), but I checked that it compileswithout warnings with
Tracking issue: rust-lang/rust#149988