Next-generation hierarchical probabilistic microdata for tax-benefit microsimulation.
Cosilico Microdata is a framework for generating and manipulating synthetic microdata that is:
- Hierarchical: Supports nested entity structures (Person → TaxUnit → Household) with variable cardinality
- Temporal: Models panel dynamics across years and intrayear income volatility
- Geographic: Fine-grained geographic resolution from nation down to ZCTA/tract
- Probabilistic: Full uncertainty quantification via Bayesian inference
- Updatable: Sequential Monte Carlo for real-time updates from economic signals
- Linked to Law: Variables tied to statutory definitions via legal references
pip install cosilico-microdata

For development:
git clone https://github.com/CosilicoAI/cosilico-microdata.git
cd cosilico-microdata
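uv pip install -e ".[dev]"

# FilingStatus is used below; importing it from cosilico_microdata.core is an assumption
from cosilico_microdata.core import Person, TaxUnit, Household, Period, Geography, FilingStatus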
# Create entities
person = Person(
    id="p_001",
    age=35,
    employment_income=75000.0,
    tax_unit_id="tu_001",
    household_id="hh_001",
)
tax_unit = TaxUnit(
    id="tu_001",
    member_ids=["p_001"],
    filing_status=FilingStatus.SINGLE,
)
household = Household(
    id="hh_001",
    member_ids=["p_001"],
    state_fips="06",
    congressional_district="CA-12",
    weight=1250.5,
)
# Work with periods
year = Period.year(2024)
month = Period.month(2024, 6)
assert year.contains(month)
# Work with geography
ca = Geography.state("06")
cd12 = Geography.congressional_district("CA", 12)
assert ca.contains(cd12)

┌─────────────────────────────────────────────────────────────────┐
│                        CALIBRATION LAYER                        │
│    Target Database → Bayesian Calibrator → Posterior Weights    │
└─────────────────────────────────────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         GENERATIVE CORE                         │
│   Hierarchical Normalizing Flow + Temporal State-Space Model    │
│            Geography → Household → Person → Records             │
└─────────────────────────────────────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        VARIABLE ONTOLOGY                        │
│      Cosilico Schema (USC/CFR refs) ↔ Microdata Variables       │
└─────────────────────────────────────────────────────────────────┘
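
The Generative Core in the diagram samples top-down: Geography → Household → Person. The sketch below illustrates only that ordering and the variable household size it produces. It is not the library's model: simple toy distributions stand in for the normalizing flows, and `ToyPerson`, `ToyHousehold`, and `sample_household` are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyPerson:
    id: str
    age: int
    employment_income: float


@dataclass
class ToyHousehold:
    id: str
    state_fips: str
    members: list = field(default_factory=list)


def sample_household(rng: random.Random, index: int) -> ToyHousehold:
    """Sample one household top-down: geography, then size, then persons."""
    # Level 1: geography (toy categorical over two state FIPS codes).
    state_fips = rng.choice(["06", "36"])
    household = ToyHousehold(id=f"hh_{index:03d}", state_fips=state_fips)
    # Level 2: household size, which gives variable cardinality.
    size = rng.choice([1, 2, 3, 4])
    # Level 3: persons, generated inside the household they belong to.
    for j in range(size):
        age = rng.randint(0, 90)
        # Toy rule (an assumption): only adults have employment income.
        income = round(rng.lognormvariate(10.5, 0.8), 2) if age >= 18 else 0.0
        household.members.append(
            ToyPerson(id=f"{household.id}_p{j}", age=age, employment_income=income)
        )
    return household


rng = random.Random(0)  # fixed seed so the sketch is reproducible
households = [sample_household(rng, i) for i in range(5)]
for hh in households:
    print(hh.id, hh.state_fips, len(hh.members))
```

Sampling parents before children means each person record can be conditioned on its household and geography, which is the dependency order the diagram describes.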
- Entities: Person, TaxUnit, Household, Family, SPMUnit, Record
- Periods: Period, PeriodType (temporal handling with arithmetic)
- Geography: Geography, GeographyLevel (hierarchical geography with crosswalks)
- Variables: Variable, VariableRegistry (ontology with legal references)
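
To make the period containment used in the quick start concrete, here is one way it could work, treating each period as a half-open date interval. `ToyPeriod` is a hypothetical stand-in written for this sketch; the library's `Period` may be implemented quite differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToyPeriod:
    """Half-open date interval [start, end)."""

    start: date
    end: date

    @classmethod
    def year(cls, y: int) -> "ToyPeriod":
        return cls(date(y, 1, 1), date(y + 1, 1, 1))

    @classmethod
    def month(cls, y: int, m: int) -> "ToyPeriod":
        # December rolls over into January of the following year.
        nxt = date(y + 1, 1, 1) if m == 12 else date(y, m + 1, 1)
        return cls(date(y, m, 1), nxt)

    def contains(self, other: "ToyPeriod") -> bool:
        # A period contains another if the other's interval lies inside it.
        return self.start <= other.start and other.end <= self.end

    @property
    def days(self) -> int:
        return (self.end - self.start).days


year = ToyPeriod.year(2024)
june = ToyPeriod.month(2024, 6)
assert year.contains(june)
assert not june.contains(year)
print(june.days, year.days)  # 30 366
```

Half-open intervals make adjacent periods tile without overlap, so twelve months sum exactly to the year.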
- Hierarchical normalizing flows for multi-entity generation
- Temporal state-space models for panel dynamics
- Target database with uncertainty
- Entropy balancing + L0 regularization
- Sequential Monte Carlo for updates
- Amortized inference for arbitrary conditional queries
- Posterior uncertainty quantification
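
As an illustration of the entropy balancing mentioned in the list above, the sketch below reweights a tiny sample so that its weighted mean income hits a target while staying as close as possible (in KL divergence) to the base weights. It is a minimal single-constraint version under stated assumptions: it omits the L0 regularization, the numbers are made up, and `entropy_balance` is a hypothetical helper, not a function from this package.

```python
import math


def entropy_balance(base_weights, x, target_mean, lo=-1.0, hi=1.0, tol=1e-10):
    """Reweight so the weighted mean of x equals target_mean.

    The KL-minimizing solution has the form w_i ∝ q_i * exp(lam * x_i),
    so only the scalar lam needs to be found.
    Assumes the solution for lam lies inside the bracket [lo, hi].
    """
    total = sum(base_weights)

    def tilt(lam):
        # Subtract the largest exponent for numerical stability.
        m = max(lam * xi for xi in x)
        raw = [q * math.exp(lam * xi - m) for q, xi in zip(base_weights, x)]
        scale = total / sum(raw)  # keep the population total unchanged
        return [r * scale for r in raw]

    def weighted_mean(w):
        return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

    # The tilted mean is increasing in lam, so bisection finds the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if weighted_mean(tilt(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return tilt((lo + hi) / 2)


incomes = [20.0, 35.0, 50.0, 75.0, 120.0]     # toy data, thousands of dollars
base = [1000.0, 1200.0, 900.0, 700.0, 200.0]  # toy survey weights
weights = entropy_balance(base, incomes, target_mean=50.0)
new_mean = sum(w * x for w, x in zip(weights, incomes)) / sum(weights)
print(round(new_mean, 6), round(sum(weights), 6))
```

With several calibration targets the scalar `lam` becomes a vector and the bisection is usually replaced by a Newton or gradient solver; the exponential-tilting form of the weights stays the same.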
# Run tests
pytest
# Type check
mypy src
# Lint
ruff check src tests

License: MIT