
# microplex-evals

microplex-evals is the separate evaluation workspace for Microplex benchmark science, paper writing, artifact curation, and review handoff.

It exists so the software repos stay focused on runtime code while evaluation work stays reproducible, inspectable, and citable.

## Boundary

| Repo | Owns | Does not own |
| --- | --- | --- |
| microplex-evals | benchmark definitions, experiment matrices, evaluation-only scripts, repo-local artifacts, Quarto papers, review handoff docs | shared runtime abstractions, production synthesis/calibration code, country adapter runtime code |
| microplex | shared target specs, benchmark math, result/suite types, reweighting interfaces, shared artifact validation | country-specific PE execution, raw source parsing, paper/reporting workflows |
| microplex-us | US sources, PE-US execution/materialization, US-local pipelines, saved US runtime bundles | cross-country benchmark math, paper/reporting orchestration |
| microplex-uk | UK sources, PE-UK execution/materialization, UK-local pipelines | cross-country benchmark math, paper/reporting orchestration |

## Current scope

- Headline benchmark mission: improve downstream microplex-us performance on real target surfaces, with eventual PE-native loss as the primary ranking signal.
- First-class evaluation outcome: coverage and support across microdata types, not just scalar loss wins.
- Current paper workspace: `papers/imputation-benchmark-2026-03/`
- Current benchmark config: `benchmarks/us/imputation/imputation-benchmark-2026-03.toml`
- Clean method benchmark config: `benchmarks/us/imputation/method-benchmark-2026-03.toml`
- Method benchmark review handoff: `reviews/us-method-benchmark-2026-03.md`
- Zero-head benchmark config: `benchmarks/us/imputation/zi-head-benchmark-2026-03.toml`
- Zero-head benchmark review handoff: `reviews/us-zi-head-benchmark-2026-03.md`
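The benchmark configs listed above are TOML files. As a rough illustration of what such a config can carry, here is a hypothetical sketch; every key below is an assumption for illustration, not the schema actually used in `benchmarks/us/imputation/`:

```toml
# Hypothetical benchmark config sketch -- key names are illustrative
# assumptions, not copied from this repo's real config files.
[benchmark]
name = "method-benchmark-2026-03"
country = "us"

[methods]
# Method names taken from the Quickstart invocation below.
include = ["bootstrap_independent", "qrf", "zi_qrf"]

[outputs]
artifacts_dir = "artifacts/us"
```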

## Local sibling checkouts

This repo is wired to sibling working copies through uv:

- `../microplex`
- `../microplex-us`

That keeps the eval repo lightweight while still letting scripts depend on the current local runtime code.
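This kind of wiring is typically expressed through uv's path sources in `pyproject.toml`. The snippet below is a sketch of that pattern, not a copy of this repo's actual file; the dependency names and flags are assumptions:

```toml
# Illustrative sketch of uv path sources; not this repo's actual
# pyproject.toml.
[project]
name = "microplex-evals"
dependencies = ["microplex", "microplex-us"]

[tool.uv.sources]
microplex = { path = "../microplex", editable = true }
microplex-us = { path = "../microplex-us", editable = true }
```

With editable path sources, `uv sync` links the sibling checkouts directly, so evaluation scripts always run against the current local runtime code.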

## Quickstart

```sh
cd /Users/maxghenis/CosilicoAI/microplex-evals
uv sync
./scripts/render_us_imputation_paper
./scripts/plan_us_method_benchmark
./scripts/run_us_method_benchmark --methods bootstrap_independent qrf zi_qrf
./scripts/plan_us_zi_head_benchmark
./scripts/run_us_zi_head_benchmark
```

That command chain will:

1. sync the configured saved results from sibling repos into repo-local artifacts
2. build derived CSV/JSON tables and paper assets
3. render the Quarto paper to `papers/imputation-benchmark-2026-03/rendered/index.html`
4. write a durable plan for the clean US method benchmark
5. run the clean method benchmark adapters against survey-specific holdout-conditioned draws
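Step 1 above, syncing saved results from sibling repos into repo-local artifacts, can be sketched as a small helper. The function name and manifest fields here are assumptions for illustration, not this repo's actual sync script:

```python
import hashlib
import shutil
from pathlib import Path


def sync_artifact(src: Path, dest_dir: Path) -> dict:
    """Copy a saved result from a sibling repo into the repo-local
    artifacts tree and record its SHA-256, so later paper builds can
    verify they cite exactly the synced bytes.

    Illustrative sketch only; the real sync logic lives in scripts/.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    return {"source": str(src), "artifact": str(dest), "sha256": digest}


# Hypothetical usage:
# entry = sync_artifact(Path("../microplex-us/results/run.json"),
#                       Path("artifacts/us"))
```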

## Layout

```
microplex-evals/
├── benchmarks/
├── papers/
├── artifacts/
├── scripts/
├── reviews/
├── AGENTS.md
├── _WORKSPACE.md
└── _BUILD_LOG.md
```

## Artifact discipline

- Paper claims should cite repo-local saved artifacts under `artifacts/`.
- Scratch diagnostics can seed analysis, but they are not headline evidence unless the benchmark config explicitly allows it.
- If a result needs shared benchmark semantics or shared manifest validation, that logic belongs in microplex, not here.
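The first rule lends itself to a mechanical check: scan a paper source for `artifacts/` paths and confirm each one exists in the tree. This is a hypothetical sketch, with an assumed regex and function names, not a check this repo necessarily ships:

```python
import re
from pathlib import Path

# Matches repo-relative artifact paths like "artifacts/us/loss.csv".
# The pattern is an illustrative assumption, not this repo's convention.
ARTIFACT_RE = re.compile(r"artifacts/[\w./-]+")


def cited_artifacts(paper_text: str) -> list[str]:
    """Collect the distinct artifacts/ paths mentioned in a paper source."""
    return sorted(set(ARTIFACT_RE.findall(paper_text)))


def missing_artifacts(paper_text: str, repo_root: Path) -> list[str]:
    """Return cited artifact paths that do not exist under repo_root."""
    return [p for p in cited_artifacts(paper_text)
            if not (repo_root / p).exists()]
```

A CI step could fail the paper render whenever `missing_artifacts` is non-empty, keeping headline claims tied to saved artifacts.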

## Method benchmark rewrite

The legacy benchmark implementation that was previously copied into this repo has been intentionally deleted; Git history is the preservation layer for that code. The repo now carries a clean, from-scratch US method-benchmark scaffold built around explicit configs, artifact outputs, survey-specific holdout conditioning, and cleanly reimplemented benchmark adapters.
