{2023.06}[foss/2023a] UCC-CUDA 1.2.0 w/ CUDA 12.1.1 (rebuild)#750

Merged
ocaisa merged 1 commit into EESSI:2023.06-software.eessi.io from casparvl:ucc_cuda_rebuild
Sep 26, 2024
@casparvl
Collaborator

@casparvl casparvl commented Sep 25, 2024

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-build-deploy-bot-deucalion

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software, eessi.io-2023.06-compat

@casparvl casparvl added the 2023.06-software.eessi.io (2023.06 version of software.eessi.io) and accel:nvidia labels on Sep 25, 2024
@casparvl casparvl changed the title from "UCC CUDA rebuild now that we have an accel prefix" to "{2023.06}[foss/2023a] UCC 1.2.0 w/ CUDA 12.1.1 (rebuild)" on Sep 25, 2024
@boegel
Contributor

boegel commented Sep 25, 2024

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

@eessi-build-deploy-bot-deucalion

eessi-build-deploy-bot-deucalion bot commented Sep 25, 2024

Updates by the bot instance boegel-bot-deucalion (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot

eessi-bot bot commented Sep 25, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80 from boegel

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen2 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted
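
The bot reports above show each command twice: once as typed (`repo:`, `arch:`, `accel:`) and once in an "expanded format" with long keys. A minimal sketch of that expansion follows. The key mapping is inferred from the reports in this thread, and the function name `expand_bot_command` is hypothetical; this is not the bot's actual implementation.

```python
# Short-to-long key mapping, inferred from the "expanded format" lines in this
# thread (an assumption; not taken from the bot's source code).
SHORT_TO_LONG = {
    "repo": "repository",
    "arch": "architecture",
    "accel": "accelerator",
}


def expand_bot_command(comment):
    """Expand a 'bot: <action> key:value ...' comment into its long-key form."""
    prefix = "bot:"
    text = comment.strip()
    if not text.startswith(prefix):
        raise ValueError("not a bot command")
    # First word after the prefix is the action; the rest are key:value pairs.
    action, *args = text[len(prefix):].split()
    expanded = [action]
    for arg in args:
        key, sep, value = arg.partition(":")
        if not sep:
            raise ValueError(f"argument without ':' separator: {arg!r}")
        # Unknown keys are passed through unchanged.
        expanded.append(f"{SHORT_TO_LONG.get(key, key)}:{value}")
    return " ".join(expanded)


print(expand_bot_command(
    "bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80"
))
```

For the command posted by boegel, this prints the same expanded string that the bot instances echo back.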

@eessi-bot

eessi-bot bot commented Sep 25, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen2 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_750/19874

date | job status | comment
Sep 25 21:32:16 UTC 2024 | submitted | job id 19874 awaits release by job manager
Sep 25 21:32:34 UTC 2024 | released | job awaits launch by Slurm scheduler
Sep 25 21:37:59 UTC 2024 | running | job 19874 is running
Sep 25 21:57:34 UTC 2024 | finished | 😁 SUCCESS (click triangle for details)
  Details
    ✅ job output file slurm-19874.out
    ✅ no message matching ERROR:
    ✅ no message matching FAILED:
    ✅ no message matching required modules missing:
    ✅ found message(s) matching No missing installations
    ✅ found message matching .tar.gz created!
  Artefacts
    eessi-2023.06-software-linux-x86_64-amd-zen2-1727300676.tar.gz
    size: 0 MiB (447423 bytes)
    entries: 29
    modules under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
      UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
    software under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
      UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
    other under 2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
      no other files in tarball
Sep 25 21:57:34 UTC 2024 | test result | 😁 SUCCESS (click triangle for details)
  ReFrame Summary
    [ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
    ✅ job output file slurm-19874.out
    ✅ no message matching ERROR:
    ✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 26 09:14:59 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-1727300676.tar.gz to S3 bucket succeeded
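
The "finished" entry in the report above lists three patterns that must not appear in the job output and two that must. The sketch below shows one way such a check could be expressed. The patterns are copied from the report; treating them as regular expressions, and the helper names `check_job_output` and `job_succeeded`, are assumptions of this example rather than the bot's actual code.

```python
import re

# Patterns copied from the "Details" list in the report above. Interpreting
# them as regular expressions is an assumption of this sketch.
MUST_NOT_MATCH = [r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_MATCH = [r"No missing installations", r"\.tar\.gz created!"]


def check_job_output(log_text):
    """Return a mapping from check description to True (passed) or False."""
    results = {}
    for pattern in MUST_NOT_MATCH:
        passed = re.search(pattern, log_text) is None
        results[f"no message matching {pattern}"] = passed
    for pattern in MUST_MATCH:
        passed = re.search(pattern, log_text) is not None
        results[f"found message matching {pattern}"] = passed
    return results


def job_succeeded(log_text):
    """A job counts as successful only if every individual check passes."""
    return all(check_job_output(log_text).values())


# Made-up input lines for illustration only; this is not real job output.
sample_ok = "No missing installations\nexample.tar.gz created!\n"
sample_bad = sample_ok + "ERROR: something went wrong\n"

print(job_succeeded(sample_ok))
print(job_succeeded(sample_bad))
```

Keeping the two pattern lists separate mirrors the report, where an absent error message and a present success message are both counted as passing checks.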

@casparvl
Collaborator Author

bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80

@eessi-bot

eessi-bot bot commented Sep 26, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

@eessi-build-deploy-bot-deucalion
Updates by the bot instance boegel-bot-deucalion (click for details)
  • account casparvl has NO permission to send commands to the bot

@eessi-bot

eessi-bot bot commented Sep 26, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:x86_64/amd/zen3 accel:nvidia/cc80 from casparvl

    • expanded format: build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • no jobs were submitted

@eessi-bot

eessi-bot bot commented Sep 26, 2024

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.09/pr_750/20002

date | job status | comment
Sep 26 08:06:49 UTC 2024 | submitted | job id 20002 awaits release by job manager
Sep 26 08:07:53 UTC 2024 | released | job awaits launch by Slurm scheduler
Sep 26 08:13:34 UTC 2024 | running | job 20002 is running
Sep 26 08:30:04 UTC 2024 | finished | 😁 SUCCESS (click triangle for details)
  Details
    ✅ job output file slurm-20002.out
    ✅ no message matching ERROR:
    ✅ no message matching FAILED:
    ✅ no message matching required modules missing:
    ✅ found message(s) matching No missing installations
    ✅ found message matching .tar.gz created!
  Artefacts
    eessi-2023.06-software-linux-x86_64-amd-zen3-1727338704.tar.gz
    size: 0 MiB (447402 bytes)
    entries: 29
    modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
      UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1.lua
    software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
      UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1
    other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
      no other files in tarball
Sep 26 08:30:04 UTC 2024 | test result | 😁 SUCCESS (click triangle for details)
  ReFrame Summary
    [ PASSED ] Ran 9/9 test case(s) from 9 check(s) (0 failure(s), 0 skipped, 0 aborted)
  Details
    ✅ job output file slurm-20002.out
    ✅ no message matching ERROR:
    ✅ no message matching [\s*FAILED\s*].*Ran .* test case
Sep 26 09:15:19 UTC 2024 | uploaded | transfer of eessi-2023.06-software-linux-x86_64-amd-zen3-1727338704.tar.gz to S3 bucket succeeded
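
The artefact name in the report above appears to encode the EESSI version, the layer, the OS, the CPU target, and a trailing number that is consistent with a Unix timestamp falling inside the job's run window. The sketch below parses a name on that assumption and rebuilds the accelerator prefix shown in the tarball listing. The field layout is inferred from the two tarball names in this thread, and `parse_tarball_name` and `accel_prefix` are hypothetical helpers.

```python
import re
from datetime import datetime, timezone

# Assumed layout (inferred from the names in this thread, not from a spec):
#   eessi-<version>-<layer>-<os>-<cpu target with '-' separators>-<unix time>.tar.gz
TARBALL_RE = re.compile(
    r"^eessi-(?P<version>[^-]+)-(?P<layer>[^-]+)-(?P<os>[^-]+)"
    r"-(?P<arch>.+)-(?P<timestamp>\d+)\.tar\.gz$"
)


def parse_tarball_name(name):
    """Split a tarball name into its fields; raise ValueError if it does not fit."""
    match = TARBALL_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected tarball name: {name!r}")
    info = match.groupdict()
    # x86_64-amd-zen3 in the file name corresponds to x86_64/amd/zen3 in paths.
    info["arch"] = info["arch"].replace("-", "/")
    # Assumption: the trailing number is seconds since the Unix epoch (UTC).
    info["created"] = datetime.fromtimestamp(int(info.pop("timestamp")), tz=timezone.utc)
    return info


def accel_prefix(info, accel):
    """Build the relative prefix under which accelerator builds are listed."""
    return "/".join(
        [info["version"], info["layer"], info["os"], info["arch"], "accel", accel]
    )


info = parse_tarball_name("eessi-2023.06-software-linux-x86_64-amd-zen3-1727338704.tar.gz")
print(info["arch"])                       # x86_64/amd/zen3
print(info["created"].isoformat())        # 2024-09-26T08:18:24+00:00
print(accel_prefix(info, "nvidia/cc80"))  # 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
```

The decoded time sits between the "running" and "finished" entries of job 20002, which supports the timestamp reading but does not prove it.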

@casparvl
Collaborator Author

The one from the zen2 tree looks good:

[casparvl@login1 1.2.0-GCCcore-12.3.0-CUDA-12.1.1]$ readelf -d lib64/ucc/libucc_mc_cuda.so | grep RPATH | grep CUDA
 0x000000000000000f (RPATH)              Library rpath: [/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/UCC/1.2.0-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software/CUDA/12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/numactl/2.0.16-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/UCX/1.14.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/software/UCC/1.2.0-GCCcore-12.3.0/lib]

@boegel boegel changed the title from "{2023.06}[foss/2023a] UCC 1.2.0 w/ CUDA 12.1.1 (rebuild)" to "{2023.06}[foss/2023a] UCC-CUDA 1.2.0 w/ CUDA 12.1.1 (rebuild)" on Sep 26, 2024
@casparvl
Collaborator Author

Also looking good in zen3:

[casparvl@login1 1.2.0-GCCcore-12.3.0-CUDA-12.1.1]$ readelf -d lib64/ucc/libucc_mc_cuda.so | grep RPATH | grep CUDA
 0x000000000000000f (RPATH)              Library rpath: [/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/UCC-CUDA/1.2.0-GCCcore-12.3.0-CUDA-12.1.1/lib64:$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/UCC/1.2.0-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/CUDA/12.1.1/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GCCcore/12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GCCcore/12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/numactl/2.0.16-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/pkgconf/1.9.5-GCCcore-12.3.0/lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/UCX/1.14.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GCCcore/12.3.0/lib/gcc/x86_64-pc-linux-gnu/12.3.0:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib/../lib64:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/lib:/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/GDRCopy/2.3.1-GCCcore-12.3.0/lib:/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/UCC/1.2.0-GCCcore-12.3.0/lib]
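
The manual inspection above reads the RPATH by eye to confirm that the CUDA-dependent entries point into the `accel/nvidia/cc80` subtree. A rough sketch of how that inspection could be automated on the `readelf -d` output line is shown below. The set of packages expected under the accelerator prefix (`UCC-CUDA`, `NCCL`, `CUDA`) is an assumption drawn from the output in this thread, and both function names are hypothetical.

```python
import re

# Assumption for this sketch: these packages depend on CUDA and are expected
# under the accelerator subdirectory, as they are in the output shown above.
ACCEL_PACKAGES = {"UCC-CUDA", "NCCL", "CUDA"}


def rpath_entries(readelf_line):
    """Extract the colon-separated entries from a 'Library rpath: [...]' line."""
    match = re.search(r"Library rpath: \[(.*)\]", readelf_line)
    if match is None:
        raise ValueError("no 'Library rpath: [...]' field found")
    return match.group(1).split(":")


def misplaced_entries(entries, accel_dir="/accel/nvidia/cc80/"):
    """Return entries for CUDA-dependent packages that lie outside accel_dir."""
    misplaced = []
    for entry in entries:
        # Matches .../software/<package>/<version>/lib...; $ORIGIN and compat
        # layer entries do not match and are skipped.
        match = re.search(r"/software/([^/]+)/[^/]+/lib", entry)
        if match and match.group(1) in ACCEL_PACKAGES and accel_dir not in entry:
            misplaced.append(entry)
    return misplaced


PREFIX = "/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2"
good = [
    f"{PREFIX}/accel/nvidia/cc80/software/NCCL/2.18.3-GCCcore-12.3.0-CUDA-12.1.1/lib64",
    f"{PREFIX}/software/UCC/1.2.0-GCCcore-12.3.0/lib64",
    f"{PREFIX}/accel/nvidia/cc80/software/CUDA/12.1.1/lib64",
    "$ORIGIN",
]
# Hypothetical bad entry for illustration: CUDA under the CPU-only prefix.
bad = f"{PREFIX}/software/CUDA/12.1.1/lib64"

line = " 0x000000000000000f (RPATH)  Library rpath: [" + ":".join(good + [bad]) + "]"
print(misplaced_entries(rpath_entries(line)))
```

With the entries taken from the zen2 output alone, the function returns an empty list; only the deliberately wrong entry is reported.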

@casparvl casparvl added the bot:deploy (Ask bot to deploy missing software installations to EESSI) label on Sep 26, 2024
@eessi-build-deploy-bot-deucalion

Label bot:deploy has been set by user casparvl, but this person does not have permission to trigger deployments

@casparvl
Collaborator Author

Just checked, it's been deployed, so this PR can be merged (by someone not me :D)

@ocaisa ocaisa merged commit 5d9d33b into EESSI:2023.06-software.eessi.io Sep 26, 2024
@eessi-bot

eessi-bot bot commented Sep 26, 2024

PR merged! Moved ['/project/def-users/SHARED/jobs/2024.09/pr_750/19874', '/project/def-users/SHARED/jobs/2024.09/pr_750/20002'] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.26

@eessi-build-deploy-bot-deucalion

PR merged! Moved [] to $HOME/trash_bin/EESSI/software-layer/2024.09.26

@eessi-bot

eessi-bot bot commented Sep 26, 2024

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2024.09.26

Labels

  • 2023.06-software.eessi.io (2023.06 version of software.eessi.io)
  • accel:nvidia
  • bot:deploy (Ask bot to deploy missing software installations to EESSI)

3 participants