Read the room, skip the rest by cds-amal · Pull Request #145 · runtimeverification/stable-mir-json

cds-amal · 2026-03-11T01:32:01Z

The UI tests (where we run stable-mir-json against rustc's own test suite) have a versioning problem that mirrors the golden file problem solved in PR #144: different nightlies have different sets of UI tests. Files get added, deleted, and renamed between nightly commits. A test that exists in nightly-2025-03-01 might be gone by nightly-2025-10-03, or moved to a different directory. Running a stale test list against a newer nightly produces spurious failures; maintaining the lists by hand is tedious and error-prone.

This PR adds the tooling to generate, validate, and run per-nightly UI test lists automatically. Three pieces:

parse_test_directives.awk: an awk script that extracts //@ directives from rustc test source files and decides whether a test should be skipped on the current host. It handles only-<target>, ignore-<target>, needs-sanitizer, needs-subprocess-spawning, edition directives, compile-flags, and a handful of environment-specific skips. A "universal" mode suppresses platform-specific filtering so that generated lists are correct on any host; platform filtering happens at runtime instead.

One environment-specific skip worth calling out: tests that reference extern crate libc are skipped because our sysroot contains both .rmeta and .rlib artifacts for libc. Rustc, invoked directly outside cargo, sees two candidates for the same crate and bails with E0464. Cargo normally sidesteps this by passing --extern libc=/exact/path, but we don't have that luxury. The skip is a pragmatic workaround, not a judgment on the tests themselves.

diff_test_lists.sh: given a rust-lang/rust checkout, this script diffs the tests/ui/ directory between the base nightly commit and a target nightly commit. It tracks file deletions, renames, and additions, then applies them to the base passing.tsv/failing.tsv to produce effective per-nightly test lists. The output is deterministic: same repo + same commits = same lists. Supports --report (human-readable diff summary), --emit (write lists to tests/ui/overrides//), and --chain (show incremental diffs between consecutive nightlies).

Rewrites of run_ui_tests.sh and remake_ui_tests.sh: both now use the shared directive parser instead of inline awk snippets, pick up per-nightly override lists when available, and handle architecture filtering correctly. run_ui_tests.sh also fixes the RUN_SMIR library path issue that caused failures on some setups.
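
For readers who want a mental model of the list-rewriting step in diff_test_lists.sh, here is a minimal Rust sketch. It is not the script itself (that is shell), and the `Change` type and `apply_changes` function are hypothetical names made up for illustration. It also assumes that newly added tests are only collected for later classification rather than being placed into a list automatically; the actual script may treat additions differently.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// One change to tests/ui/ between the base and target nightly commits.
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Deleted(String),
    Renamed { from: String, to: String },
    Added(String),
}

/// Applies deletions and renames to a base list of test paths.
/// Returns (effective list, added files), both sorted so that the same
/// inputs always produce the same output.
pub fn apply_changes(base: &[String], changes: &[Change]) -> (Vec<String>, Vec<String>) {
    let mut deleted: HashSet<&str> = HashSet::new();
    let mut renamed: HashMap<&str, &str> = HashMap::new();
    let mut added: BTreeSet<String> = BTreeSet::new();

    for change in changes {
        match change {
            Change::Deleted(path) => {
                deleted.insert(path.as_str());
            }
            Change::Renamed { from, to } => {
                renamed.insert(from.as_str(), to.as_str());
            }
            Change::Added(path) => {
                added.insert(path.clone());
            }
        }
    }

    let effective: BTreeSet<String> = base
        .iter()
        // Drop tests that no longer exist in the target nightly.
        .filter(|path| !deleted.contains(path.as_str()))
        // Follow renames; untouched paths pass through unchanged.
        .map(|path| match renamed.get(path.as_str()) {
            Some(to) => (*to).to_owned(),
            None => path.clone(),
        })
        .collect();

    (effective.into_iter().collect(), added.into_iter().collect())
}
```

Sorting through a `BTreeSet` is one simple way to get the "same repo + same commits = same lists" property; the real script may achieve determinism by other means.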

The PR includes pre-generated override lists for all 13 supported nightlies (2025-03-01 through 2026-01-15), a unit test suite for the directive parser (test_directives_test.sh, ~420 lines of boundary-condition tests), and corresponding Makefile targets (make test-ui, make test-ui-emit, make test-directives).

Test plan

make test-directives passes (unit tests for the awk parser)
make test-ui RUST_DIR_ROOT=/path/to/rust passes with the pinned nightly
make test-ui-emit RUST_DIR_ROOT=/path/to/rust NIGHTLY=nightly-2025-03-01 generates lists matching the checked-in overrides

Add a 'make help' target with awk-based extraction of target descriptions. Also adds standalone 'fmt', 'clippy', 'stdlib-smir', 'build-info', and nightly administration targets. Uses the final Makefile structure: targets for golden file management, UI testing, and nightly lifecycle are included but depend on scripts added in later commits.

Some types (e.g., dyn Trait in certain positions) cause rustc's layout computation to panic rather than returning an error. Wrap layout calls in catch_unwind so the type visitor can continue; panicked types are recorded and reported in a summary rather than crashing the whole run.
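
The catch-and-record pattern described here can be sketched roughly as below. This is a standalone illustration, not the code in this repo: `try_layout` and the fields of `LayoutPanic` are assumptions, and the closure stands in for the real rustc layout call.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Record of a type whose layout computation panicked.
/// (Hypothetical shape; the real LayoutPanic may carry other fields.)
#[derive(Debug, PartialEq)]
pub struct LayoutPanic {
    pub ty_name: String,
    pub message: String,
}

/// Runs `compute` and turns a panic into a `LayoutPanic` entry instead of
/// letting it unwind through the caller.
pub fn try_layout<T>(
    ty_name: &str,
    compute: impl FnOnce() -> T,
    panics: &mut Vec<LayoutPanic>,
) -> Option<T> {
    match catch_unwind(AssertUnwindSafe(compute)) {
        Ok(value) => Some(value),
        Err(payload) => {
            // Panic payloads are usually &str or String; anything else
            // gets a placeholder message.
            let message = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "<non-string panic payload>".to_string());
            panics.push(LayoutPanic {
                ty_name: ty_name.to_string(),
                message,
            });
            None
        }
    }
}
```

The caller keeps visiting types after a `None` and prints the collected `panics` once at the end of the run. Note that the default panic hook still writes to stderr for each caught panic.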

Instead of hardcoding the library path, resolve it from the active nightly toolchain at runtime. This avoids breakage when the toolchain directory name changes.

Drop the [metadata] section from rust-toolchain.toml; the UI test scripts now derive the rustc commit hash directly from the nightly date via the Rust manifest. Also fixes clippy uninlined_format_args warnings.

The receipt-driven integration-test target requires receipts (PR 2) and per-nightly golden directories (PR 3). Revert to master's flat-file version so CI passes on the foundation PR.

TIL (thanks Copilot!): the previous approach temporarily replaced the process-wide panic hook with a no-op to suppress backtraces from caught layout panics. Turns out that's a thread-safety footgun: rustc's own worker threads could panic while the no-op hook is installed, silently swallowing unrelated diagnostics. The hook swap was also racy with anything else that calls set_hook concurrently.

catch_unwind is what actually keeps the process alive; the hook swap was purely cosmetic (suppressing stderr noise). Dropped it entirely and accepted the default backtrace output for caught panics. The end-of-run LayoutPanic summary still reports everything it did before.

My research surfaced a way to suppress the stderr noise. I decided it was too much machinery for a little noise, but I'm including it here for completeness. Set the hook once at startup (not per-call) to a hook that checks a thread-local flag:

```rust
use std::cell::Cell;
use std::panic::{catch_unwind, AssertUnwindSafe};

thread_local! {
    static SUPPRESS_PANIC_OUTPUT: Cell<bool> = const { Cell::new(false) };
}

// Called once, e.g. in driver setup:
std::panic::set_hook(Box::new(|info| {
    SUPPRESS_PANIC_OUTPUT.with(|flag| {
        if !flag.get() {
            eprintln!("{info}");
        }
    });
}));

// Then in try_layout_shape, just toggle the flag:
SUPPRESS_PANIC_OUTPUT.with(|f| f.set(true));
let result = catch_unwind(AssertUnwindSafe(|| ty.layout()...));
SUPPRESS_PANIC_OUTPUT.with(|f| f.set(false));
```

This is thread-safe (the flag is thread-local, not global): no race conditions, no risk of swallowing other threads' panics, and our collected LayoutPanic report at the end works exactly as before.

If this script is sourced from a working directory outside the repo tree, rustup won't find rust-toolchain.toml and may select whatever default toolchain happens to be active, giving us the wrong commit hash. We now derive the repo root from BASH_SOURCE (two levels up from the script's own directory) and cd there before invoking rustc, so the toolchain selection stays correct regardless of the caller's CWD.

TOOLCHAIN_NAME defaults to empty in the Makefile, which meant make clean would always run rustup toolchain uninstall "" and fail. Now we only attempt the uninstall when the variable is actually set.

The workflow was downloading yq from GitHub releases without any integrity check (a supply-chain risk, however small, on CI runners). We now download the upstream checksums file alongside the binary and verify the SHA256 before installing. While we're at it, the identical 8-line install block was copy-pasted across all three jobs. Extracted into .github/scripts/install-yq.sh so there's exactly one place to update the version or change the verification logic.

The checksums file from mikefarah/yq uses a custom multi-hash-per-line format that isn't compatible with sha256sum -c (which expects GNU coreutils format). Switched to checksums-bsd, which uses the standard BSD-style "SHA256 (file) = hash" layout; a small sed converts that to GNU format for verification.
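
To make the format conversion concrete, here is the same transformation written as a small Rust function. The workflow itself uses sed, so this is only an illustration of what that sed does; `bsd_to_gnu` is a hypothetical name, and the sketch assumes we only care about SHA256 lines with a 64-character hex digest.

```rust
/// Converts one BSD-style checksum line, e.g. `SHA256 (file) = <hex>`,
/// into the GNU coreutils form `<hex>  file` that `sha256sum -c` accepts.
/// Returns None when the line does not have the expected shape.
pub fn bsd_to_gnu(line: &str) -> Option<String> {
    let rest = line.trim().strip_prefix("SHA256 (")?;
    // Split on the last ") = " so file names containing ')' still work.
    let (file, hash) = rest.rsplit_once(") = ")?;
    let is_sha256_hex = hash.len() == 64 && hash.chars().all(|c| c.is_ascii_hexdigit());
    if file.is_empty() || !is_sha256_hex {
        return None;
    }
    // GNU format: digest, two spaces, file name.
    Some(format!("{hash}  {file}"))
}
```

Lines for other algorithms, or with a malformed digest, yield `None` and can simply be skipped.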

Add a spy-based serialization pass that detects which JSON paths carry non-deterministic interned indices (Ty, Span, AllocId, etc.) and emits a companion *.smir.receipts.json alongside each *.smir.json output. The receipts declare three categories of interned indices:

- interned_keys: object field names whose values are interned
- interned_newtypes: enum variant wrappers around bare integers
- interned_positions: known tuple positions carrying interned indices

These receipts drive the normalise-filter.jq used for golden-file comparison, replacing the previous hardcoded normalization rules with a data-driven approach. See ADR-004 for the design rationale.
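
As a rough picture of what "receipt-driven" normalization means, the sketch below handles only the interned_keys category. Everything in it is an assumption made for illustration: the tiny `Json` enum stands in for a real JSON library, `normalise` is a hypothetical name, and replacing the value with a fixed placeholder is just one possible strategy (the actual jq filter may delete or rewrite values differently).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Minimal JSON model, just enough for the illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Num(u64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Replaces the value of every object field named in `interned_keys` with a
/// fixed placeholder, so two runs that differ only in interned indices
/// compare equal afterwards.
pub fn normalise(value: &Json, interned_keys: &BTreeSet<String>) -> Json {
    match value {
        Json::Arr(items) => {
            Json::Arr(items.iter().map(|v| normalise(v, interned_keys)).collect())
        }
        Json::Obj(fields) => Json::Obj(
            fields
                .iter()
                .map(|(key, v)| {
                    let v = if interned_keys.contains(key) {
                        Json::Num(0) // placeholder for an interned index
                    } else {
                        normalise(v, interned_keys)
                    };
                    (key.clone(), v)
                })
                .collect(),
        ),
        other => other.clone(),
    }
}
```

The point of the receipts is that the set passed as `interned_keys` comes from the data emitted alongside each output rather than from a hand-maintained list.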

cds-amal · 2026-03-11T20:25:39Z

tests/ui/parse_test_directives.awk

+
+    # Map host_os to the set of OS names this host satisfies.
+    # "unix" covers linux, macos, freebsd, etc.  "apple" covers macos.
+    is_unix  = (host_os == "linux" || host_os == "macos" || host_os == "freebsd" || host_os == "openbsd" || host_os == "netbsd" || host_os == "dragonfly" || host_os == "solaris" || host_os == "illumos" || host_os == "android")


I'm open to paring this down :) Solaris, it's been a while.

cds-amal

A note to reviewers: please pay close attention to parse_test_directives.awk.

…structure

Add build.rs with a breakpoint table that detects the active rustc nightly's commit-date and emits cfg flags for stable MIR API changes. Add nightly_admin.py for managing nightly toolchains (add/check/bump). Update normalise-filter.jq for receipt-driven normalization. Add pr.md to .gitignore.
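
A breakpoint table of this kind might look roughly like the following. This is a sketch under stated assumptions, not the contents of build.rs: the cfg names are invented, the two dates are borrowed from the compat commit title only as sample values, and the rule "a cfg is active once the commit-date is on or after its breakpoint" is my guess at the semantics. The `cargo:rustc-cfg=` line is the standard way for a build script to enable a cfg.

```rust
/// (commit-date, cfg name) pairs. The cfg names are hypothetical.
/// ISO dates compare correctly as plain strings.
const BREAKPOINTS: &[(&str, &str)] = &[
    ("2025-01-24", "smir_example_change_a"),
    ("2025-03-01", "smir_example_change_b"),
];

/// Returns the cfg names whose breakpoint date is on or before `commit_date`.
pub fn active_cfgs(commit_date: &str) -> Vec<&'static str> {
    BREAKPOINTS
        .iter()
        .filter(|(date, _)| *date <= commit_date)
        .map(|(_, cfg)| *cfg)
        .collect()
}

/// Formats the lines a build script would print so that Cargo enables
/// each active cfg for the crate being built.
pub fn cargo_directives(commit_date: &str) -> Vec<String> {
    active_cfgs(commit_date)
        .into_iter()
        .map(|cfg| format!("cargo:rustc-cfg={cfg}"))
        .collect()
}
```

Source code can then gate version-specific paths with `#[cfg(smir_example_change_a)]` and its negation, keeping one codebase buildable across nightlies.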


Move integration test expected outputs from flat files in programs/ to per-nightly directories under expected/nightly-2025-03-01/, enabling the test harness to select the correct golden files for the active toolchain.

…ructure

Add an awk-based directive parser (parse_test_directives.awk) that extracts test metadata (editions, compile-flags, skip conditions) from rustc UI test source files. This replaces shell-level heuristics with a single-pass parser that handles:

- //@ directives (edition, compile-flags, needs-*, ignore-*)
- Architecture and subprocess filtering
- Range-based nightly gating via override TSV files

Rewrite run_ui_tests.sh and remake_ui_tests.sh to use the shared parser. Add diff_test_lists.sh for generating per-nightly effective test lists with caching. Include unit tests and boundary notes. Per-nightly override TSV files allow fine-grained control over which tests pass/fail on each nightly without modifying the base lists.
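
For anyone who finds awk hard to review, the core idea of the directive parser can be sketched in Rust as below. This is a simplified illustration, not a port of parse_test_directives.awk: the function names are hypothetical, it only looks at OS-level only-/ignore- directives, and it deliberately leaves architectures, needs-*, editions, and compile-flags out of scope.

```rust
/// Returns the directives found in `//@ ...` comment lines of a test file.
pub fn extract_directives(source: &str) -> Vec<&str> {
    source
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix("//@"))
        .map(str::trim)
        .filter(|directive| !directive.is_empty())
        .collect()
}

/// OS names covered by the "unix" family (mirrors the list in the awk script).
const UNIX_LIKE: &[&str] = &[
    "linux", "macos", "freebsd", "openbsd", "netbsd",
    "dragonfly", "solaris", "illumos", "android",
];

/// Some(true/false) if `target` is an OS name or family we understand,
/// None for anything else (architectures, other targets).
fn host_matches(host_os: &str, target: &str) -> Option<bool> {
    match target {
        "unix" => Some(UNIX_LIKE.contains(&host_os)),
        "apple" => Some(host_os == "macos"),
        "windows" => Some(host_os == "windows"),
        t if UNIX_LIKE.contains(&t) => Some(host_os == t),
        _ => None,
    }
}

/// Decides whether a test should be skipped on `host_os`, based only on
/// `only-<target>` and `ignore-<target>` directives.
pub fn should_skip(directives: &[&str], host_os: &str) -> bool {
    directives.iter().any(|directive| {
        // Keep the directive name; drop any trailing value or remark.
        let name = directive
            .split_whitespace()
            .next()
            .unwrap_or("")
            .trim_end_matches(':');
        if let Some(target) = name.strip_prefix("only-") {
            host_matches(host_os, target) == Some(false)
        } else if let Some(target) = name.strip_prefix("ignore-") {
            host_matches(host_os, target) == Some(true)
        } else {
            false
        }
    })
}
```

Unrecognized targets never cause a skip in this sketch, which keeps it conservative; the awk script handles many more cases, including the "universal" mode that defers platform filtering to runtime.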

cds-amal added 5 commits March 10, 2026 18:17

fix(cargo_stable_mir_json): derive toolchain lib path dynamically

102122a


refactor: derive rustc commit from toolchain, inline format args

21a3ddd


fix(make): revert integration-test to flat golden files for PR 1

1829855


cds-amal mentioned this pull request Mar 11, 2026

All Nightlies Must Pass #146

Open

cds-amal added 6 commits March 11, 2026 15:19

fix(make): guard toolchain uninstall on non-empty TOOLCHAIN_NAME

b1edcde


cds-amal force-pushed the stack/pr3-golden-infra branch from d440b10 to f7b4f55 Compare March 11, 2026 19:38

cds-amal force-pushed the stack/pr4-ui-testing branch from 4ee5139 to d13182a Compare March 11, 2026 19:39


cds-amal added 3 commits March 12, 2026 21:11

feat(compat): add compat code for initial breakpoints (2025-01-24 thr…

293bd90

…ough 2025-03-01)

test: migrate golden files to per-nightly directories

273f002


cds-amal force-pushed the stack/pr3-golden-infra branch from f7b4f55 to 273f002 Compare March 13, 2026 01:13

cds-amal force-pushed the stack/pr4-ui-testing branch from d13182a to eaadea6 Compare March 13, 2026 01:19

cds-amal force-pushed the stack/pr3-golden-infra branch from 273f002 to 8754b7e Compare March 22, 2026 05:40

cds-amal wants to merge 15 commits into stack/pr3-golden-infra from stack/pr4-ui-testing
