Rebuild GROMACS for CUDA sanity check #1166
Retrying after having implemented this suggestion to have the bot set
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
New job on instance
@casparvl, how did you add the suggestion?
I added a SitePackage.lua to the
Job 14349790 failed the CUDA sanity check, so it is no longer getting stuck with the changes made to the .SitePackage.lua file.
So it's only the PTX code that's missing; the device code is there. I'll add the option to ignore this and retry.
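The distinction matters because device code (SASS) is compiled for a specific GPU architecture, whereas PTX is an intermediate representation that the CUDA driver can JIT-compile at run time, for example on GPUs newer than any architecture the binary was built for. Below is a minimal Python sketch of the kind of decision involved. It assumes a simplified policy: device code is required for every requested compute capability, and PTX only for the highest one. The hypothetical `ignore_missing_ptx` switch stands in for the option mentioned above. `CudaBinaryInfo` and `cuda_sanity_check` are made-up names, and this is not EasyBuild's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CudaBinaryInfo:
    """Compute capabilities found in one binary (hypothetical summary)."""
    device_ccs: set[str] = field(default_factory=set)  # e.g. {"8.0"} for sm_80 device code
    ptx_ccs: set[str] = field(default_factory=set)     # e.g. {"8.0"} for compute_80 PTX


def cuda_sanity_check(info: CudaBinaryInfo, requested_ccs: list[str],
                      ignore_missing_ptx: bool = False) -> list[str]:
    """Return a list of problems for one binary (empty list means it passes).

    Assumed, illustrative policy: device code must exist for every requested
    compute capability; PTX is expected for the highest requested one.
    """
    if not requested_ccs:
        raise ValueError("at least one compute capability must be requested")
    problems = []
    missing_device = sorted(set(requested_ccs) - info.device_ccs)
    if missing_device:
        problems.append(f"missing device code for {missing_device}")
    # Compare "major.minor" numerically, so "9.0" ranks above "8.0"
    highest = max(requested_ccs, key=lambda cc: tuple(int(p) for p in cc.split(".")))
    if highest not in info.ptx_ccs and not ignore_missing_ptx:
        problems.append(f"missing PTX for {highest}")
    return problems
```

A binary with sm_80 device code but no PTX matches the situation described in this comment. `cuda_sanity_check(CudaBinaryInfo({"8.0"}, set()), ["8.0"])` reports only the missing PTX. With `ignore_missing_ptx=True` it passes.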
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
New job on instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
New job on instance (4 replies)
I have not updated the SitePackage.lua, so I'm going to cancel the jobs running at UGent.
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80
New job on instance (2 replies)
Doing the remainder of the builds (cross-compilations):
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc70
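In these commands, `on:` appears to select the architecture of the build node and `for:` the target CPU/GPU combination. That would be how a cc70 GPU build can run on a zen2 node, i.e. the cross-compilation mentioned above. Below is a minimal Python sketch of splitting such a comment into its filters. It assumes only what is visible in the commands here: whitespace-separated `key:value` tokens, with the `on:` and `for:` values being comma-separated `key=value` pairs. `parse_bot_build_command` is a hypothetical helper, not the bot's own parser.

```python
def parse_bot_build_command(comment: str) -> dict:
    """Split a 'bot: build ...' comment into its filters (illustrative sketch).

    Assumes whitespace-separated 'key:value' tokens, where the 'on:' and 'for:'
    values are comma-separated 'key=value' pairs, as in the commands above.
    """
    prefix = "bot: build"
    if not comment.startswith(prefix):
        raise ValueError("not a bot build command")
    filters: dict = {}
    for token in comment[len(prefix):].split():
        # Split on the first ':' only; values may contain '/', '=', ','
        key, _, value = token.partition(":")
        if key in ("on", "for"):
            # 'arch=x86_64/amd/zen2,accel=nvidia/cc70' -> {'arch': ..., 'accel': ...}
            # (a pair without '=' makes dict() raise ValueError)
            filters[key] = dict(pair.split("=", 1) for pair in value.split(","))
        else:
            filters[key] = value
    return filters
```

For the cross-compilation command above, the parser returns `{'repo': 'eessi.io-2023.06-software', 'instance': 'eessi-bot-mc-aws', 'on': {'arch': 'zen2'}, 'for': {'arch': 'x86_64/amd/zen2', 'accel': 'nvidia/cc70'}}`. The build host (`on`) and the target (`for`) end up as separate entries.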
New job on instance (19 replies)
The UGent bot turns out not to be configured for this yet, so cross-compiling:
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc80
New job on instance
We should look into why these three builds are failing. The tests on NVIDIA Grace + cc90 are also failing; since this is a native build, this should also be resolved (#1166 (comment)).
Let's check whether these two pass now that NCCL has been rebuilt and deployed:
New job on instance (2 replies)
89454 failed with:
Found this in the EasyBuild log:
That has happened before, e.g. in #709 (comment), and it often works when you try again:
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=icelake for:arch=x86_64/intel/icelake,accel=nvidia/cc90
New job on instance (2 replies)
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=cascadelake for:arch=x86_64/intel/cascadelake,accel=nvidia/cc80
New job on instance
Looks like its config file needs to be updated? |
Ah, the bot config was recently updated. Let's try again:
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90
New job on instance
Checklist for supported CPU-GPU combos (by @boegel):
- generic: aarch64, x86_64
- aarch64: neoverse_n1, neoverse_v1, nvidia/grace
- x86_64/amd: zen2, zen3, zen4
- x86_64/intel: haswell, skylake_avx512, cascadelake, icelake, sapphirerapids
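The checklist can be expanded mechanically into `for:` targets, which helps when checking which combinations still need a build. Below is a minimal sketch built on several assumptions. The CPU target names follow the `arch=` strings used in the commands in this PR, with `aarch64/generic` and `x86_64/generic` for the generic row. The GPU side is assumed to cover only the cc70, cc80 and cc90 compute capabilities seen here. `build_commands` and its `instance` argument are hypothetical, and not every combination is necessarily supported or buildable on a single bot instance.

```python
from itertools import product

# CPU targets from the checklist (generic row written as <family>/generic; an assumption)
CPU_TARGETS = [
    "aarch64/generic", "x86_64/generic",
    "aarch64/neoverse_n1", "aarch64/neoverse_v1", "aarch64/nvidia/grace",
    "x86_64/amd/zen2", "x86_64/amd/zen3", "x86_64/amd/zen4",
    "x86_64/intel/haswell", "x86_64/intel/skylake_avx512",
    "x86_64/intel/cascadelake", "x86_64/intel/icelake", "x86_64/intel/sapphirerapids",
]
# Assumed GPU targets: the compute capabilities that appear in this PR's commands
GPU_TARGETS = ["nvidia/cc70", "nvidia/cc80", "nvidia/cc90"]


def build_commands(instance: str, repo: str = "eessi.io-2023.06-software") -> list[str]:
    """One native-build command per CPU x GPU combination (hypothetical helper)."""
    return [
        f"bot: build repo:{repo} instance:{instance} for:arch={cpu},accel={gpu}"
        for cpu, gpu in product(CPU_TARGETS, GPU_TARGETS)
    ]
```

This yields 13 × 3 = 39 candidate commands. Cross-compiled targets would additionally need an `on:arch=...` filter, as in the commands for the AWS bot above.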