Read the room, skip the rest #145

Open
cds-amal wants to merge 15 commits into stack/pr3-golden-infra from stack/pr4-ui-testing

@cds-amal
Collaborator

The UI tests (where we run stable-mir-json against rustc's own test suite) have a versioning problem that mirrors the golden file problem solved in PR #144: different nightlies have different sets of UI tests. Files get added, deleted, and renamed between nightly commits. A test that exists in nightly-2025-03-01 might be gone by nightly-2025-10-03, or moved to a different directory. Running a stale test list against a newer nightly produces spurious failures; maintaining the lists by hand is tedious and error-prone.

This PR adds the tooling to generate, validate, and run per-nightly UI test lists automatically. Three pieces:

  • parse_test_directives.awk: an awk script that extracts //@ directives from rustc test source files and decides whether a test should be skipped on the current host. It handles only-<target>, ignore-<target>, needs-sanitizer, needs-subprocess-spawning, edition directives, compile-flags, and a handful of environment-specific skips. A "universal" mode suppresses platform-specific filtering so that generated lists are correct on any host; platform filtering happens at runtime instead.

One environment-specific skip worth calling out: tests that reference extern crate libc are skipped because our sysroot contains both .rmeta and .rlib artifacts for libc. Rustc, invoked directly outside cargo, sees two candidates for the same crate and bails with E0464. Cargo normally sidesteps this by passing --extern libc=/exact/path, but we don't have that luxury. The skip is a pragmatic workaround, not a judgment on the tests themselves.

  • diff_test_lists.sh: given a rust-lang/rust checkout, this script diffs the tests/ui/ directory between the base nightly commit and a target nightly commit. It tracks file deletions, renames, and additions, then applies them to the base passing.tsv/failing.tsv to produce effective per-nightly test lists. The output is deterministic: same repo + same commits = same lists. Supports --report (human-readable diff summary), --emit (write lists to tests/ui/overrides//), and --chain (show incremental diffs between consecutive nightlies).

  • Rewrites of run_ui_tests.sh and remake_ui_tests.sh: both now use the shared directive parser instead of inline awk snippets, pick up per-nightly override lists when available, and handle architecture filtering correctly. run_ui_tests.sh also fixes the RUN_SMIR library path issue that caused failures on some setups.
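
To make the list-rewriting step of diff_test_lists.sh concrete, here is a minimal sketch in Rust. The real tool is a shell script, so this is only an illustration of the idea. The `Change` enum and `apply_changes` function are hypothetical names, and the sketch assumes newly added tests are collected separately, since the description above does not say which list they join.

```rust
use std::collections::BTreeSet;

/// One change to tests/ui/ between two nightly commits (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Change {
    Deleted(String),
    Renamed { from: String, to: String },
    Added(String),
}

/// Apply deletions and renames to a base list of test paths.
/// Returns the effective list plus the newly added tests, both sorted,
/// so the same inputs always produce the same output.
fn apply_changes(base: &[&str], changes: &[Change]) -> (Vec<String>, Vec<String>) {
    let mut effective: BTreeSet<String> = base.iter().map(|s| s.to_string()).collect();
    let mut added = BTreeSet::new();
    for change in changes {
        match change {
            Change::Deleted(path) => {
                effective.remove(path);
            }
            Change::Renamed { from, to } => {
                // Only carry the rename over if the base list tracked the old path.
                if effective.remove(from) {
                    effective.insert(to.clone());
                }
            }
            Change::Added(path) => {
                added.insert(path.clone());
            }
        }
    }
    (effective.into_iter().collect(), added.into_iter().collect())
}
```

Using sorted sets rather than insertion order is one simple way to get the "same repo + same commits = same lists" property; the actual script may achieve it differently.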

The PR includes pre-generated override lists for all 13 supported nightlies (2025-03-01 through 2026-01-15), a unit test suite for the directive parser (test_directives_test.sh, ~420 lines of boundary-condition tests), and corresponding Makefile targets (make test-ui, make test-ui-emit, make test-directives).

Test plan

  • make test-directives passes (unit tests for the awk parser)
  • make test-ui RUST_DIR_ROOT=/path/to/rust passes with the pinned nightly
  • make test-ui-emit RUST_DIR_ROOT=/path/to/rust NIGHTLY=nightly-2025-03-01 generates lists matching the checked-in overrides

Add a 'make help' target with awk-based extraction of target
descriptions. Also adds standalone 'fmt', 'clippy', 'stdlib-smir',
'build-info', and nightly administration targets.

Uses the final Makefile structure: targets for golden file management,
UI testing, and nightly lifecycle are included but depend on scripts
added in later commits.

Some types (e.g., dyn Trait in certain positions) cause rustc's layout
computation to panic rather than returning an error. Wrap layout calls
in catch_unwind so the type visitor can continue; panicked types are
recorded and reported in a summary rather than crashing the whole run.
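
A minimal sketch of the catch-and-record pattern described above, using only the standard library. The `LayoutPanic` struct, `try_layout`, and `visit_all` are hypothetical stand-ins, and the "layout query" is simulated with an `Option` so the example stays self-contained; the real code calls into rustc.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Record of one layout computation that panicked (hypothetical type).
#[derive(Debug)]
struct LayoutPanic {
    ty_name: String,
    message: String,
}

/// Run `compute` and convert a panic into an `Err` so the caller can keep going.
fn try_layout<T>(ty_name: &str, compute: impl FnOnce() -> T) -> Result<T, LayoutPanic> {
    catch_unwind(AssertUnwindSafe(compute)).map_err(|payload| {
        // Panic payloads are usually a &str or a String; fall back otherwise.
        let message = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "non-string panic payload".to_string());
        LayoutPanic { ty_name: ty_name.to_string(), message }
    })
}

/// Visit every type, collecting panics instead of aborting the run.
fn visit_all(types: &[(&str, Option<u64>)]) -> (Vec<u64>, Vec<LayoutPanic>) {
    let mut sizes = Vec::new();
    let mut panics = Vec::new();
    for (name, size) in types {
        // Stand-in for a layout query: `None` simulates a panicking type.
        match try_layout(name, || size.expect("layout computation failed")) {
            Ok(s) => sizes.push(s),
            Err(p) => panics.push(p),
        }
    }
    (sizes, panics)
}
```

The collected `Vec<LayoutPanic>` plays the role of the end-of-run summary. Note that this only works when panics unwind; a build with `panic = "abort"` would still terminate the process.
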
Instead of hardcoding the library path, resolve it from the active
nightly toolchain at runtime. This avoids breakage when the toolchain
directory name changes.

Drop the [metadata] section from rust-toolchain.toml; the UI test
scripts now derive the rustc commit hash directly from the nightly
date via the Rust manifest. Also fixes clippy uninlined_format_args
warnings.

The receipt-driven integration-test target requires receipts (PR 2) and
per-nightly golden directories (PR 3). Revert to master's flat-file
version so CI passes on the foundation PR.

TIL (thanks Copilot!), the previous approach temporarily replaced the
process-wide panic hook with a no-op to suppress backtraces from caught
layout panics. Turns out that's a thread-safety footgun: rustc's own
worker threads could panic while the no-op hook is installed, silently
swallowing unrelated diagnostics. The hook swap was also racy with
anything else that calls set_hook concurrently.

catch_unwind is what actually keeps the process alive; the hook swap was
purely cosmetic (suppressing stderr noise). Dropped it entirely and
accepted the default backtrace output for caught panics. The end-of-run
LayoutPanic summary still reports everything it did before. My research
surfaced a way to suppress the stderr noise; I decided it was too much
machinery for a little noise, but I'm including it here for completeness.

Set the hook once at startup (not per-call) to a hook that checks a
thread-local flag:

```rust
use std::cell::Cell;
use std::panic::{catch_unwind, AssertUnwindSafe};

thread_local! {
    static SUPPRESS_PANIC_OUTPUT: Cell<bool> = const { Cell::new(false) };
}

// Called once, e.g. in driver setup:
std::panic::set_hook(Box::new(|info| {
    SUPPRESS_PANIC_OUTPUT.with(|flag| {
        if !flag.get() {
            eprintln!("{info}");
        }
    });
}));

// Then in try_layout_shape, just toggle the flag:
SUPPRESS_PANIC_OUTPUT.with(|f| f.set(true));
let result = catch_unwind(AssertUnwindSafe(|| ty.layout()...));
SUPPRESS_PANIC_OUTPUT.with(|f| f.set(false));
```

This is thread-safe (thread-local, not global), no race conditions, no
risk of swallowing other threads' panics, and our collected LayoutPanic
report at the end works exactly as before.

If this script is sourced from a working directory outside the repo
tree, rustup won't find rust-toolchain.toml and may select whatever
default toolchain happens to be active, giving us the wrong commit
hash. We now derive the repo root from BASH_SOURCE (two levels up from
the script's own directory) and cd there before invoking rustc, so the
toolchain selection stays correct regardless of the caller's CWD.

TOOLCHAIN_NAME defaults to empty in the Makefile, which meant
make clean would always run rustup toolchain uninstall "" and fail.
Now we only attempt the uninstall when the variable is actually set.

The workflow was downloading yq from GitHub releases without any
integrity check (a supply-chain risk, however small, on CI runners).
We now download the upstream checksums file alongside the binary and
verify the SHA256 before installing.

While we're at it, the identical 8-line install block was copy-pasted
across all three jobs. Extracted into .github/scripts/install-yq.sh
so there's exactly one place to update the version or change the
verification logic.
The checksums file from mikefarah/yq uses a custom multi-hash-per-line
format that isn't compatible with sha256sum -c (which expects GNU
coreutils format). Switched to checksums-bsd, which uses the standard
BSD-style "SHA256 (file) = hash" layout; a small sed converts that to
GNU format for verification.
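
For illustration, the BSD-to-GNU conversion that the sed expression performs can be written out as a small Rust function. This is a sketch of the transformation only, not the script's actual code; `bsd_to_gnu` is a hypothetical name, and the hex-digit check is an extra safeguard that the sed version may not have.

```rust
/// Convert one BSD-style checksum line, `SHA256 (file) = hash`,
/// into the GNU coreutils form, `hash  file`. Returns `None` for other lines.
fn bsd_to_gnu(line: &str) -> Option<String> {
    let rest = line.trim().strip_prefix("SHA256 (")?;
    // Split at the last ") = " so file names containing ')' still parse.
    let (file, hash) = rest.rsplit_once(") = ")?;
    if file.is_empty() || hash.is_empty() || !hash.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // GNU format separates hash and file name with two spaces.
    Some(format!("{hash}  {file}"))
}
```

Lines for other hash algorithms are rejected rather than passed through, so only SHA256 entries would ever reach the verification step.
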
Add a spy-based serialization pass that detects which JSON paths carry
non-deterministic interned indices (Ty, Span, AllocId, etc.) and emits
a companion *.smir.receipts.json alongside each *.smir.json output.

The receipts declare three categories of interned indices:
  - interned_keys: object field names whose values are interned
  - interned_newtypes: enum variant wrappers around bare integers
  - interned_positions: known tuple positions carrying interned indices

These receipts drive the normalise-filter.jq used for golden-file
comparison, replacing the previous hardcoded normalization rules with
a data-driven approach. See ADR-004 for the design rationale.
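
A rough sketch of how an `interned_keys` receipt could drive normalization. Everything here is an assumption made for illustration: the `Value` enum stands in for a real JSON library, `Receipt` models only one of the three categories, and replacing the index with 0 is a guess at what "normalize" means. The actual normalise-filter.jq and ADR-004 are the authority.

```rust
use std::collections::BTreeSet;

/// Minimal JSON-like tree, standing in for a real JSON library.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(u64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Subset of a receipt: field names whose values are interned indices.
struct Receipt {
    interned_keys: BTreeSet<String>,
}

/// Replace every integer stored under an interned key with 0, recursing
/// through arrays and objects, so two runs compare equal despite
/// differing interned indices.
fn normalise(value: &Value, receipt: &Receipt) -> Value {
    match value {
        Value::Array(items) => {
            Value::Array(items.iter().map(|v| normalise(v, receipt)).collect())
        }
        Value::Object(fields) => Value::Object(
            fields
                .iter()
                .map(|(key, v)| {
                    let v = match v {
                        Value::Int(_) if receipt.interned_keys.contains(key) => Value::Int(0),
                        other => normalise(other, receipt),
                    };
                    (key.clone(), v)
                })
                .collect(),
        ),
        other => other.clone(),
    }
}
```

The point of the data-driven design is visible in the signature: the set of keys to mask is an input, not a constant baked into the filter.
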
@cds-amal force-pushed the stack/pr3-golden-infra branch from d440b10 to f7b4f55 on March 11, 2026 19:38
@cds-amal force-pushed the stack/pr4-ui-testing branch from 4ee5139 to d13182a on March 11, 2026 19:39

# Map host_os to the set of OS names this host satisfies.
# "unix" covers linux, macos, freebsd, etc. "apple" covers macos.
is_unix = (host_os == "linux" || host_os == "macos" || host_os == "freebsd" || host_os == "openbsd" || host_os == "netbsd" || host_os == "dragonfly" || host_os == "solaris" || host_os == "illumos" || host_os == "android")
Collaborator Author

I'm open to paring this down :) Solaris, it's been a while.
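
For reviewers who want to reason about the family mapping without reading awk, the same check can be sketched in Rust. `host_satisfies` is a hypothetical name; the OS list is copied from the awk line above, and treating any other name as an exact match is an assumption.

```rust
/// Does a host running `host_os` satisfy the OS name used in a directive?
/// `name` may be a concrete OS ("linux") or a family ("unix", "apple").
fn host_satisfies(host_os: &str, name: &str) -> bool {
    // Same list as the awk `is_unix` expression.
    const UNIX: [&str; 9] = [
        "linux", "macos", "freebsd", "openbsd", "netbsd",
        "dragonfly", "solaris", "illumos", "android",
    ];
    match name {
        "unix" => UNIX.contains(&host_os),
        "apple" => host_os == "macos",
        other => other == host_os,
    }
}
```

Paring the list down would then be a one-line change to the `UNIX` table.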

Collaborator Author

@cds-amal left a comment

A note to reviewers: please pay particular attention to parse_test_directives.awk.

…structure

Add build.rs with a breakpoint table that detects the active rustc
nightly's commit-date and emits cfg flags for stable MIR API changes.
Add nightly_admin.py for managing nightly toolchains (add/check/bump).
Update normalise-filter.jq for receipt-driven normalization. Add pr.md
to .gitignore.
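
A minimal sketch of what a breakpoint table in a build script might look like. The table entries and cfg names are invented placeholders, and reading the date from the `commit-date:` line of `rustc -vV` output is an assumption about how detection works; only the `cargo:rustc-cfg=` directive itself is standard Cargo behavior.

```rust
/// (first commit-date with the change, cfg flag to emit). Hypothetical entries.
const BREAKPOINTS: [(&str, &str); 2] = [
    ("2025-05-01", "smir_example_change_a"),
    ("2025-09-15", "smir_example_change_b"),
];

/// Extract the commit-date from `rustc -vV` output, e.g. "commit-date: 2025-03-01".
fn commit_date(version_output: &str) -> Option<&str> {
    version_output
        .lines()
        .find_map(|line| line.strip_prefix("commit-date: "))
        .map(str::trim)
}

/// cfg flags for every breakpoint at or before `date`.
/// ISO dates (YYYY-MM-DD) order correctly under plain string comparison.
fn active_cfgs(date: &str) -> Vec<&'static str> {
    BREAKPOINTS
        .iter()
        .filter(|(since, _)| *since <= date)
        .map(|(_, cfg)| *cfg)
        .collect()
}

/// The lines a build script would print for Cargo to pick up.
/// An unknown date yields no flags, i.e. the oldest API is assumed.
fn cargo_directives(version_output: &str) -> Vec<String> {
    let date = commit_date(version_output).unwrap_or("");
    active_cfgs(date)
        .into_iter()
        .map(|cfg| format!("cargo:rustc-cfg={cfg}"))
        .collect()
}
```

In a real build.rs, `main` would run the compiler named by the `RUSTC` environment variable, pass its output to `cargo_directives`, and print each returned line.
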
Move integration test expected outputs from flat files in programs/ to
per-nightly directories under expected/nightly-2025-03-01/, enabling
the test harness to select the correct golden files for the active
toolchain.
@cds-amal force-pushed the stack/pr3-golden-infra branch from f7b4f55 to 273f002 on March 13, 2026 01:13
…ructure

Add an awk-based directive parser (parse_test_directives.awk) that
extracts test metadata (editions, compile-flags, skip conditions) from
rustc UI test source files. This replaces shell-level heuristics with
a single-pass parser that handles:
  - //@ directives (edition, compile-flags, needs-*, ignore-*)
  - Architecture and subprocess filtering
  - Range-based nightly gating via override TSV files

Rewrite run_ui_tests.sh and remake_ui_tests.sh to use the shared
parser. Add diff_test_lists.sh for generating per-nightly effective
test lists with caching. Include unit tests and boundary notes.

Per-nightly override TSV files allow fine-grained control over which
tests pass/fail on each nightly without modifying the base lists.
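
The core of the directive handling can be sketched in Rust as follows. This is a simplified illustration, not a port of parse_test_directives.awk: the `Skip` enum and both function names are hypothetical, only `only-<target>` and `ignore-<target>` are covered, and targets are compared by exact name without the OS-family logic.

```rust
/// Why a test is skipped on a given host; `None` from `skip_reason` means "run it".
#[derive(Debug, PartialEq)]
enum Skip {
    OnlyOther(String),
    Ignored(String),
}

/// Collect the text of every `//@ ...` directive line in a test source.
fn directives(source: &str) -> Vec<&str> {
    source
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix("//@"))
        .map(str::trim)
        .collect()
}

/// Decide whether `only-<target>` / `ignore-<target>` directives exclude `host`.
/// In "universal" mode platform directives are not evaluated at all.
fn skip_reason(source: &str, host: &str, universal: bool) -> Option<Skip> {
    if universal {
        return None;
    }
    for d in directives(source) {
        // A directive may carry trailing text; keep the first word only.
        let word = d.split_whitespace().next().unwrap_or("");
        if let Some(target) = word.strip_prefix("only-") {
            if target != host {
                return Some(Skip::OnlyOther(target.to_string()));
            }
        } else if let Some(target) = word.strip_prefix("ignore-") {
            if target == host {
                return Some(Skip::Ignored(target.to_string()));
            }
        }
    }
    None
}
```

The `universal` flag mirrors the split described in the PR: list generation ignores the host, and platform filtering is left to the run step.
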
@cds-amal force-pushed the stack/pr4-ui-testing branch from d13182a to eaadea6 on March 13, 2026 01:19
@cds-amal force-pushed the stack/pr3-golden-infra branch from 273f002 to 8754b7e on March 22, 2026 05:40