microplex-evals is the separate evaluation workspace for Microplex benchmark science, paper writing, artifact curation, and review handoff.
It exists so the software repos stay focused on runtime code while evaluation work stays reproducible, inspectable, and citable.
| Repo | Owns | Does not own |
|---|---|---|
| microplex-evals | benchmark definitions, experiment matrices, evaluation-only scripts, repo-local artifacts, Quarto papers, review handoff docs | shared runtime abstractions, production synthesis/calibration code, country adapter runtime code |
| microplex | shared target specs, benchmark math, result/suite types, reweighting interfaces, shared artifact validation | country-specific PE execution, raw source parsing, paper/reporting workflows |
| microplex-us | US sources, PE-US execution/materialization, US-local pipelines, saved US runtime bundles | cross-country benchmark math, paper/reporting orchestration |
| microplex-uk | UK sources, PE-UK execution/materialization, UK-local pipelines | cross-country benchmark math, paper/reporting orchestration |
- Headline benchmark mission: improve downstream `microplex-us` performance on real target surfaces, with eventual PE-native loss as the primary ranking signal.
- First-class evaluation outcome: coverage and support across microdata types, not just scalar loss wins.
- Current paper workspace: `papers/imputation-benchmark-2026-03/`
- Current benchmark config: `benchmarks/us/imputation/imputation-benchmark-2026-03.toml`
- Clean method benchmark config: `benchmarks/us/imputation/method-benchmark-2026-03.toml`
- Method benchmark review handoff: `reviews/us-method-benchmark-2026-03.md`
- Zero-head benchmark config: `benchmarks/us/imputation/zi-head-benchmark-2026-03.toml`
- Zero-head benchmark review handoff: `reviews/us-zi-head-benchmark-2026-03.md`
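To make the shape of these configs concrete, here is a hypothetical sketch of what a file under `benchmarks/us/imputation/` might look like. Every table and key below (`[benchmark]`, `holdout_fraction`, `candidates`, and so on) is an illustrative assumption, not the repo's actual schema; the authoritative schema is whatever the real `.toml` files and their validation code define.

```toml
# Hypothetical benchmark config sketch -- keys are illustrative only.
[benchmark]
id = "method-benchmark-2026-03"
country = "us"
task = "imputation"

[data]
# Survey-specific holdout conditioning, as described for the clean method benchmark.
holdout_fraction = 0.2
seed = 20260301

[methods]
# Candidate adapters, matching the --methods flags shown in the commands below.
candidates = ["bootstrap_independent", "qrf", "zi_qrf"]

[outputs]
# Repo-local artifacts that paper claims can cite.
artifacts_dir = "artifacts/us/method-benchmark-2026-03"
```

A config like this keeps the experiment matrix durable and reviewable: changing a method list or holdout seed is a diff in the config, not an edit to a script.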
This repo is wired to sibling working copies through uv:

- `../microplex`
- `../microplex-us`

That keeps the eval repo lightweight while still letting scripts depend on the current local runtime code.
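In uv, this kind of sibling wiring is expressed with path sources in `pyproject.toml`. The snippet below is a sketch of that mechanism, not a copy of this repo's actual file; the exact entries here may differ.

```toml
# Sketch of uv path-source wiring to sibling working copies.
# The packages must also be listed under [project] dependencies.
[tool.uv.sources]
microplex = { path = "../microplex", editable = true }
microplex-us = { path = "../microplex-us", editable = true }
```

With `editable = true`, `uv sync` links the sibling checkouts directly, so evaluation scripts always run against the current local runtime code rather than a pinned published release.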
```sh
cd /Users/maxghenis/CosilicoAI/microplex-evals
uv sync
./scripts/render_us_imputation_paper
./scripts/plan_us_method_benchmark
./scripts/run_us_method_benchmark --methods bootstrap_independent qrf zi_qrf
./scripts/plan_us_zi_head_benchmark
./scripts/run_us_zi_head_benchmark
```

That command chain will:
- sync the configured saved results from sibling repos into repo-local artifacts
- build derived CSV/JSON tables and paper assets
- render the Quarto paper to `papers/imputation-benchmark-2026-03/rendered/index.html`
- write a durable plan for the clean US method benchmark
- run the clean method benchmark adapters against survey-specific holdout-conditioned draws
microplex-evals/
├── benchmarks/
├── papers/
├── artifacts/
├── scripts/
├── reviews/
├── AGENTS.md
├── _WORKSPACE.md
└── _BUILD_LOG.md
- Paper claims should cite repo-local saved artifacts under `artifacts/`.
- Scratch diagnostics can seed analysis, but they are not headline evidence unless the benchmark config explicitly allows it.
- If a result needs shared benchmark semantics or shared manifest validation, that logic belongs in `microplex`, not here.
The legacy benchmark implementation previously copied into this repo was intentionally deleted; Git history is the preservation layer for that code. The repo now carries a clean, from-scratch US method-benchmark scaffold built around explicit configs, artifact outputs, survey-specific holdout conditioning, and cleanly reimplemented benchmark adapters.