Data sources for microplex microsimulation: microdata, calibration targets, and country-specific pipelines.
This repository provides:
- Microdata: Survey and administrative data (CPS, PUF, FRS)
- Targets: Calibration targets from authoritative sources (IRS SOI, Census, SSA)
- Country Pipelines: Country-specific microplex builders (US districts, UK regions)
Uses microplex for synthesis and calibration algorithms.
microplex-sources/
├── micro/ # Country-specific microdata pipelines
│ ├── us/ # United States
│ │ ├── census/ # CPS download and processing
│ │ ├── district.py # US district microplex builder
│ │ ├── tax_unit_builder.py # Tax unit construction
│ │ └── synthesis/ # US-specific synthesis
│ └── uk/ # United Kingdom (planned)
├── db/ # Targets database and ETL
│ ├── schema.py # SQLModel: Target, Stratum, StratumConstraint
│ ├── etl_soi.py # IRS SOI loader
│ ├── etl_snap.py # SNAP loader
│ ├── etl_census.py # Census loader
│ └── etl_*.py # All ETL pipelines
├── calibration/ # Calibration infrastructure
│ ├── targets.py # TargetSpec, get_targets()
│ └── loader.py # Constraint matrix builder
├── macro/ # Aggregate targets
│ └── targets.db # SQLite (dev); Supabase in prod
└── data/ # Cached data files
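
The tree above describes `calibration/loader.py` as a constraint matrix builder. As a rough illustration of what such a matrix could look like, here is a minimal pure-Python sketch. The record layout, the `(variable, operator, value)` constraint tuples, and the function names `in_stratum` and `build_constraint_matrix` are assumptions made for this example, not the actual loader API.

```python
import operator

# Hypothetical operator vocabulary; the real loader may support a different set.
OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def in_stratum(record, constraints):
    """Return True if the record satisfies every (variable, op, value) constraint."""
    return all(OPS[op](record[var], value) for var, op, value in constraints)


def build_constraint_matrix(records, strata):
    """Return one row per stratum and one column per record.

    Entry [i][j] is 1.0 if record j falls in stratum i, else 0.0.
    """
    return [
        [1.0 if in_stratum(record, constraints) else 0.0 for record in records]
        for constraints in strata
    ]


# Toy records (illustrative values only, not real survey data).
records = [
    {"state": "06", "agi": 60_000, "weight": 1200.0},
    {"state": "06", "agi": 90_000, "weight": 800.0},
    {"state": "36", "agi": 55_000, "weight": 1500.0},
]

# Two strata: CA filers with AGI in [50k, 75k), and all NY records.
strata = [
    [("state", "==", "06"), ("agi", ">=", 50_000), ("agi", "<", 75_000)],
    [("state", "==", "36")],
]

matrix = build_constraint_matrix(records, strata)
weights = [record["weight"] for record in records]

# Weighted count per stratum: the quantity calibration compares with a target.
estimates = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
```

Calibration then adjusts `weights` so that each entry of `estimates` moves toward the corresponding administrative total.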
pip install microplex-sources
# Or for development:
git clone https://github.com/CosilicoAI/microplex-sources
cd microplex-sources
pip install -e ".[dev]"

python micro/us/census/download_cps.py --year 2024

from micro.us.district import DistrictMicroplex, build_targets_from_db
from calibration.targets import get_targets
# Load targets from database
targets = get_targets(jurisdiction="us", year=2021)
# Build district microplex
dm = DistrictMicroplex(n_per_district=1000, target_sparsity=0.9)
result = dm.build(
    seed_data=cps_data,  # CPS seed microdata, loaded beforehand
    districts=["06", "36", "48"],  # CA, NY, TX
    targets=targets,
)

Three-table schema for calibration targets:
- strata: Population subgroups (e.g., "CA filers with AGI $50k-$75k")
- stratum_constraints: Rules defining each stratum
- targets: Administrative totals linked to strata
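
To make the relationship between the three tables concrete, the sketch below builds them in an in-memory SQLite database using only the standard library. The column names, the operator strings, and the `tax_unit_count` variable are assumptions for illustration; the real definitions live in `db/schema.py` and may differ. The target value is a placeholder, not a published figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns; see db/schema.py for the actual models.
conn.executescript(
    """
    CREATE TABLE strata (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE stratum_constraints (
        id INTEGER PRIMARY KEY,
        stratum_id INTEGER NOT NULL REFERENCES strata(id),
        variable TEXT NOT NULL,
        operator TEXT NOT NULL,
        value TEXT NOT NULL
    );
    CREATE TABLE targets (
        id INTEGER PRIMARY KEY,
        stratum_id INTEGER NOT NULL REFERENCES strata(id),
        variable TEXT NOT NULL,
        year INTEGER NOT NULL,
        value REAL NOT NULL,
        source TEXT NOT NULL
    );
    """
)

# One stratum, defined by three constraints.
conn.execute(
    "INSERT INTO strata (id, name) VALUES (1, ?)",
    ("CA filers with AGI $50k-$75k",),
)
conn.executemany(
    "INSERT INTO stratum_constraints (stratum_id, variable, operator, value) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "state", "==", "06"),
        (1, "agi", ">=", "50000"),
        (1, "agi", "<", "75000"),
    ],
)

# One target linked to that stratum (placeholder value, not a real SOI total).
conn.execute(
    "INSERT INTO targets (stratum_id, variable, year, value, source) "
    "VALUES (1, 'tax_unit_count', 2021, 1000.0, 'irs-soi')"
)

# Join the three tables: each target with its stratum name and constraint count.
rows = conn.execute(
    """
    SELECT s.name, t.variable, t.value, COUNT(c.id)
    FROM targets AS t
    JOIN strata AS s ON s.id = t.stratum_id
    JOIN stratum_constraints AS c ON c.stratum_id = s.id
    GROUP BY t.id
    """
).fetchall()
```

Keeping constraints in their own table lets one stratum carry any number of rules, and lets several targets share a stratum.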
from calibration.targets import get_targets
# Query targets
targets = get_targets(
    jurisdiction="us",
    year=2021,
    sources=["irs-soi", "census"],
)

Microdata sources:

| Source | Variables | Description |
|---|---|---|
| US CPS ASEC | 78 | Census household survey (income, benefits, demographics) |
| US IRS PUF | 33 | Tax return sample (income, deductions, credits) |
| UK FRS | 29 | DWP household survey (income, benefits, housing) |

Calibration target sources:

| Source | Coverage | Description |
|---|---|---|
| IRS SOI | National + state + AGI brackets | Tax return aggregates |
| Census | Demographics, poverty | Population statistics |
| SSA | OASDI, SSI | Social Security data |
| SNAP | State-level | Food assistance |
| Medicaid | State-level | Health coverage |

Related repositories:

- microplex: Core synthesis and calibration algorithms
- cosilico-us: US statute encodings

Contributing:

- Microdata: Add processing code in `micro/<country>/`
- Targets: Add ETL script in `db/etl_<source>.py`
- Include official documentation URLs
- Add tests in `tests/`
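
As a loose illustration of the guidelines above, a new ETL script could validate that every target record carries a documentation URL and ship with a small test. The `TargetRecord` dataclass, the `parse_rows` helper, the `snap_households` variable, and the example URL are all hypothetical and do not reflect the existing `db/etl_*.py` scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetRecord:
    """Hypothetical in-memory form of one calibration target."""

    stratum: str
    variable: str
    year: int
    value: float
    source: str
    source_url: str

    def __post_init__(self):
        # Enforce the "include official documentation URLs" guideline.
        if not self.source_url.startswith(("http://", "https://")):
            raise ValueError("source_url must be an http(s) documentation URL")


def parse_rows(rows, source, source_url):
    """Convert raw (stratum, variable, year, value) tuples into TargetRecords."""
    return [
        TargetRecord(stratum, variable, int(year), float(value), source, source_url)
        for stratum, variable, year, value in rows
    ]


def test_parse_rows_keeps_source_url():
    # Placeholder row and URL, not real published data.
    records = parse_rows(
        [("US", "snap_households", "2021", "1000")],
        source="snap",
        source_url="https://example.org/snap-docs",
    )
    assert records[0].year == 2021
    assert records[0].value == 1000.0
    assert records[0].source_url == "https://example.org/snap-docs"
```

A test like `test_parse_rows_keeps_source_url` would sit under `tests/` and run with pytest.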