Skip to content

add nvhpc to test suite github runners#1317

Draft
sbryngelson wants to merge 22 commits intoMFlowCode:masterfrom
sbryngelson:nvhpc
Draft

add nvhpc to test suite github runners#1317
sbryngelson wants to merge 22 commits intoMFlowCode:masterfrom
sbryngelson:nvhpc

Conversation

@sbryngelson
Copy link
Copy Markdown
Member

Adds github runners for nvhpc builds on cpu and gpu

@github-actions
Copy link
Copy Markdown

Claude Code Review

Head SHA: 5dec31f
Files changed: 1

  • .github/workflows/test.yml

Summary

  • Adds 15 new matrix entries to the github job covering 5 NVHPC versions (23.11, 24.5, 24.11, 25.1, 25.3) × 3 targets (cpu, gpu-acc, gpu-omp).
  • NVHPC builds run inside the official nvcr.io/nvidia/nvhpc container on ubuntu-22.04 runners with nvfortran/nvc/nvc++ compilers and root-enabled OpenMPI.
  • GPU targets (gpu-acc, gpu-omp) exercise a build-only dry-run (--dry-run); actual test execution is restricted to cpu targets, which is appropriate since github-hosted runners lack GPU hardware.
  • All NVHPC jobs set continue-on-error: true (inherited from the job level), so failures are informational and do not block merges — matching the intended non-gating posture stated in the PR description.
  • Existing non-NVHPC steps are correctly guarded with !matrix.nvhpc conditions, and Intel/macOS setup steps are unaffected.

Findings

Observation (non-blocking): Missing MPI installation for NVHPC container
File: .github/workflows/test.yml, "Setup NVHPC" step (~line 178)

The NVHPC container image (nvcr.io/nvidia/nvhpc:*-devel-cuda_multi-ubuntu22.04) ships with NVHPC and CUDA toolchains but does not include a system OpenMPI installation. The CPU test step runs mfc.sh test ... --test-all without a --no-mpi flag, which will invoke MPI. The environment sets OMPI_ALLOW_RUN_AS_ROOT (suggesting MPI is expected to be used), but libopenmpi-dev / openmpi-bin are absent from the apt-get install list in the NVHPC setup step. This may cause CPU test failures at runtime if MFC's test runner tries to launch mpirun. Consider either adding openmpi-bin libopenmpi-dev to the NVHPC apt-get install line, or passing --no-mpi to the NVHPC Test step.

Minor style note: HDF5 not installed for NVHPC
File: .github/workflows/test.yml, "Setup NVHPC" step (~line 178)

The standard Ubuntu setup installs libhdf5-dev and hdf5-tools, but the NVHPC setup omits them. If any test case exercises HDF5 I/O (post_process output), the test run may fail. The standard non-NVHPC apt-get install list could serve as a reference for parity.

No issues found in the conditional logic, container image references, GPU flag mapping (gpu-acc--gpu acc, gpu-omp--gpu mp), or the CC/CXX/FC overrides. The ternary expression for runs-on and container is correct.

@github-actions
Copy link
Copy Markdown

Claude Code Review

Incremental review from: 5dec31f20ec2b50075f91eecae9f1caa98db638d
Head SHA: cd0d691df73cfa4e1528ffc02a3f8310c5a4cde9

New findings since last Claude review:

The two changes in this update are:

  1. nvhpc: [""] and target: [""] added to the base matrix so non-NVHPC entries have these keys defined with empty string values.
  2. The non-NVHPC job name updated from the bare string "Github" to format("Github ({0}, {1}, {2}, intel={3})", matrix.os, matrix.mpi, matrix.debug, matrix.intel).

Both changes are correctness improvements — no new issues introduced.

Observation (carry-over, still unaddressed): MPI and HDF5 missing in NVHPC container setup

The Setup NVHPC step still does not install openmpi-bin libopenmpi-dev or libhdf5-dev hdf5-tools. The Test (NVHPC) step runs --test-all without --no-mpi, so MPI invocations will fail at runtime. This was noted in the prior review and remains unresolved.

No other new high-confidence findings.

sbryngelson and others added 7 commits March 17, 2026 19:25
Test 5 NVHPC versions (23.11, 24.5, 24.11, 25.1, 25.3) on GitHub-hosted
runners using official NVIDIA container images. Each version runs both a
CPU build+test and a GPU (OpenACC) build-only target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…efaults

GitHub Actions include entries with new-only keys get merged into all
existing combos instead of creating new ones. Adding nvhpc and target
as main matrix axes with empty defaults means the include entries
overwrite original values, forcing creation of standalone combos.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 18, 2026

<!-- claude-review: thread=primary; reviewed_sha=dc12849241387d84734cfd6bb98390b04c4f2361; mode=incremental -->

… build

- Add h_iL, h_iR, h_avg_2 and other chemistry locals to private clauses
  of two GPU_PARALLEL_LOOP regions in s_hllc_riemann_solver (lines 2201,
  2421) where they were declared at subroutine scope but not privatized,
  causing nvfortran 25.1 gpu-omp to fail with 'Could not find
  allocated-variable index for symbol - h_il'
- Disable ZFP compression in silo (-DSILO_ENABLE_ZFP=OFF) since MFC
  never uses it and nvc crashes compiling zfp/decode1d.c on 23.11/25.3
- Pin silo to 4.12.0 tag with GIT_SHALLOW; clean up HDF5 build flags
Change v_vf argument from dimension(sys_size) to dimension(:) to match
the caller which passes an allocatable array. The explicit-size vs
assumed-shape mismatch caused nvfortran 24.11's IPO inliner to create
partial inline temporaries that the OpenACC backend could not map.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 29, 2026

Claude Code Review

Incremental review from: 20288ec
Head SHA: f64894f

New findings since last Claude review:

  • !dir$ noinline directive silently neutered (src/simulation/m_time_steppers.fpp:628): Changing !dir$ noinline to ! dir$ noinline (adding a space after !) converts the NVIDIA nvfortran compiler directive into a plain Fortran comment. !dir$ with no space is the required syntax for nvfortran inline directives; with a space it is ignored entirely. The noinline hint was presumably added to work around a compiler issue with s_compute_dt; removing it silently while keeping the dead #ifdef __NVCOMPILER block may reintroduce whatever problem the directive was guarding against. If the intent is truly to remove the directive, the entire #ifdef block should be deleted; if the intent is to keep noinline, the space must be removed.

…versions

- Remove openmpi-bin/libopenmpi-dev from NVHPC container apt install.
  The system OpenMPI (gfortran-compiled) was overriding the NVHPC-bundled
  MPI at runtime via ldconfig, causing ABI mismatch segfaults in all CPU
  tests (rayleigh_taylor_muscl crash in s_interface_compression).
- Remove NVHPC version exclusion range for two-pass IPO. Binary search
  identified s_compute_dt as the single function causing ICE on 24.11;
  add noinline directive to prevent its extraction into the inline library.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant