This repository was archived by the owner on Dec 31, 2025. It is now read-only.

CosilicoAI/cosilico-microdata

Cosilico Microdata

Next-generation hierarchical probabilistic microdata for tax-benefit microsimulation.

Overview

Cosilico Microdata is a framework for generating and manipulating synthetic microdata that is:

  • Hierarchical: Supports nested entity structures (Person → TaxUnit → Household) with variable cardinality
  • Temporal: Models panel dynamics across years and intrayear income volatility
  • Geographic: Fine-grained geographic resolution from nation down to ZCTA/tract
  • Probabilistic: Full uncertainty quantification via Bayesian inference
  • Updatable: Sequential Monte Carlo for real-time updates from economic signals
  • Linked to Law: Variables tied to statutory definitions via legal references

Installation

pip install cosilico-microdata

For development:

git clone https://github.com/CosilicoAI/cosilico-microdata.git
cd cosilico-microdata
uv pip install -e ".[dev]"

Quick Start

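# FilingStatus is used below; it is assumed here to be exported from cosilico_microdata.core
from cosilico_microdata.core import Person, TaxUnit, Household, Period, Geography, FilingStatus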

# Create entities
person = Person(
    id="p_001",
    age=35,
    employment_income=75000.0,
    tax_unit_id="tu_001",
    household_id="hh_001",
)

tax_unit = TaxUnit(
    id="tu_001",
    member_ids=["p_001"],
    filing_status=FilingStatus.SINGLE,
)

household = Household(
    id="hh_001",
    member_ids=["p_001"],
    state_fips="06",
    congressional_district="CA-12",
    weight=1250.5,
)

# Work with periods
year = Period.year(2024)
month = Period.month(2024, 6)
assert year.contains(month)

# Work with geography
ca = Geography.state("06")
cd12 = Geography.congressional_district("CA", 12)
assert ca.contains(cd12)
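
The ID links in the entities above (`household_id` on a person, `weight` on a household) are what allow person-level values to roll up into population estimates. The following is a minimal sketch of that roll-up using plain dictionaries rather than the library's classes; the helper names `household_income` and `weighted_total`, and the second and third persons, are illustrative assumptions and not part of the package.

```python
from collections import defaultdict

# Plain-dict stand-ins for the entities above (not the library's classes).
persons = [
    {"id": "p_001", "household_id": "hh_001", "employment_income": 75_000.0},
    {"id": "p_002", "household_id": "hh_001", "employment_income": 25_000.0},
    {"id": "p_003", "household_id": "hh_002", "employment_income": 40_000.0},
]
household_weights = {"hh_001": 1250.5, "hh_002": 800.0}


def household_income(persons):
    """Sum person-level employment income up to the household level."""
    totals = defaultdict(float)
    for p in persons:
        totals[p["household_id"]] += p["employment_income"]
    return dict(totals)


def weighted_total(values, weights):
    """Population estimate: household value times household weight, summed."""
    return sum(values[hh] * weights[hh] for hh in values)


incomes = household_income(persons)
population_income = weighted_total(incomes, household_weights)
```

Keeping the weight on the household, as in the Quick Start, means every member of a household is scaled by the same factor when totals are computed.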

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CALIBRATION LAYER                            │
│  Target Database → Bayesian Calibrator → Posterior Weights      │
└─────────────────────────────────────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    GENERATIVE CORE                              │
│  Hierarchical Normalizing Flow + Temporal State-Space Model     │
│  Geography → Household → Person → Records                       │
└─────────────────────────────────────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    VARIABLE ONTOLOGY                            │
│  Cosilico Schema (USC/CFR refs) ↔ Microdata Variables           │
└─────────────────────────────────────────────────────────────────┘

Key Components

Core (cosilico_microdata.core)

  • Entities: Person, TaxUnit, Household, Family, SPMUnit, Record
  • Periods: Period, PeriodType - temporal handling with arithmetic
  • Geography: Geography, GeographyLevel - hierarchical geography with crosswalks
  • Variables: Variable, VariableRegistry - ontology with legal references
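
To make the containment checks concrete, here is a minimal standalone sketch of how period and geography nesting could be modeled. `SketchPeriod` and `geoid_contains` are hypothetical names, not the package's implementation. The geography test relies on Census GEOIDs for state, county, and tract being prefix-nested; ZCTAs do not nest inside counties this way, which is presumably where crosswalks come in.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SketchPeriod:
    """Hypothetical stand-in for Period: an inclusive date interval."""

    start: date
    end: date

    @classmethod
    def year(cls, y: int) -> "SketchPeriod":
        return cls(date(y, 1, 1), date(y, 12, 31))

    @classmethod
    def month(cls, y: int, m: int) -> "SketchPeriod":
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(y, m)[1]
        return cls(date(y, m, 1), date(y, m, last_day))

    def contains(self, other: "SketchPeriod") -> bool:
        """True when `other` lies entirely within this interval."""
        return self.start <= other.start and other.end <= self.end


def geoid_contains(outer: str, inner: str) -> bool:
    """Prefix test for nested Census GEOIDs (state -> county -> tract)."""
    return inner.startswith(outer)


year_2024 = SketchPeriod.year(2024)
june_2024 = SketchPeriod.month(2024, 6)
california, sf_county = "06", "06075"  # state FIPS and state+county FIPS
```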

Generative (cosilico_microdata.generative)

  • Hierarchical normalizing flows for multi-entity generation
  • Temporal state-space models for panel dynamics
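
As a rough illustration of what a temporal state-space model for panel dynamics involves, the sketch below simulates one person's income path from a latent mean-reverting AR(1) state in log income plus transitory observation noise. This is an assumed toy specification, not the model the package uses; the function name and all parameter values are illustrative.

```python
import math
import random


def simulate_income_path(years, mu=math.log(50_000), rho=0.9,
                         sigma_state=0.1, sigma_obs=0.05, seed=0):
    """Simulate annual income from a latent AR(1) state in log income."""
    rng = random.Random(seed)
    state = mu  # start the latent state at its long-run mean
    observed = []
    for _ in range(years):
        # State transition: persistent shocks that decay at rate rho.
        state = mu + rho * (state - mu) + rng.gauss(0.0, sigma_state)
        # Observation: latent state plus transitory (intrayear-style) noise.
        observed.append(math.exp(state + rng.gauss(0.0, sigma_obs)))
    return observed


path = simulate_income_path(years=5)
```

Separating the persistent state from the transitory noise is what lets a model of this kind distinguish lasting income changes from year-to-year volatility.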

Calibration (cosilico_microdata.calibration)

  • Target database with uncertainty
  • Entropy balancing + L0 regularization
  • Sequential Monte Carlo for updates
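
For readers unfamiliar with entropy balancing, the sketch below shows the core idea for a single target: find the weights closest to the base weights in KL divergence whose weighted mean matches a target. The solution has the form of exponentially tilted base weights, and the tilt parameter is found here by bisection. This is a one-constraint toy under stated assumptions, not the package's calibrator, and it leaves out the L0 regularization and target uncertainty listed above. All names are hypothetical.

```python
import math


def tilted_weights(base, x, lam):
    """Exponentially tilt base weights by lam * x, preserving their total."""
    logits = [lam * xi for xi in x]
    top = max(logits)  # subtract the max for numerical stability
    raw = [b * math.exp(l - top) for b, l in zip(base, logits)]
    scale = sum(base) / sum(raw)
    return [r * scale for r in raw]


def weighted_mean(w, x):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def entropy_balance(base, x, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Reweight so the weighted mean of x equals target_mean."""
    if not min(x) < target_mean < max(x):
        raise ValueError("target mean must lie strictly inside the range of x")
    # The tilted mean increases monotonically in lam, so bisection converges
    # as long as the solution lies inside [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if weighted_mean(tilted_weights(base, x, mid), x) < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted_weights(base, x, (lo + hi) / 2)


base = [100.0, 100.0, 100.0, 100.0]
income_k = [20.0, 40.0, 60.0, 120.0]  # income in thousands (illustrative)
weights = entropy_balance(base, income_k, target_mean=50.0)
```

With several targets the same tilting form applies, but the scalar bisection would be replaced by a multivariate solver over one tilt parameter per constraint.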

Inference (cosilico_microdata.inference)

  • Amortized inference for arbitrary conditional queries
  • Posterior uncertainty quantification
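
One way to read "posterior uncertainty quantification" in practice: instead of a single weight per household, keep many posterior draws of the weight vector and report an interval for any aggregate. The sketch below fakes the posterior draws with lognormal noise purely for illustration; `draw_weights`, `interval_90`, and the input values are assumptions and do not reflect the package's API or output.

```python
import random
import statistics


def draw_weights(base, n_draws, noise_sd=0.05, seed=0):
    """Simulate posterior weight draws as lognormal perturbations of base."""
    rng = random.Random(seed)
    return [[b * rng.lognormvariate(0.0, noise_sd) for b in base]
            for _ in range(n_draws)]


def interval_90(values, weight_draws):
    """Return the 5th percentile, median, and 95th percentile of the total."""
    totals = [sum(w * v for w, v in zip(draw, values)) for draw in weight_draws]
    cuts = statistics.quantiles(totals, n=20)  # 19 cut points at 5% steps
    return cuts[0], statistics.median(totals), cuts[-1]


incomes = [100_000.0, 40_000.0, 65_000.0]
base_weights = [1250.5, 800.0, 990.0]
low, mid, high = interval_90(incomes, draw_weights(base_weights, n_draws=500))
```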

Development

# Run tests
pytest

# Type check
mypy src

# Lint
ruff check src tests

License

MIT
