Donor-impute CPS demographic, occupation, and TTOC features on PUF clones #658
Open
Summary
- `is_male`, `cps_race`, `is_hispanic`, and `detailed_occupation_recode` using CPS records with similar demographics, geography, and PUF-imputed incomes
- `treasury_tipped_occupation_code` on the CPS half from raw CPS `PEIOOCC` via an official Census occupation-code crosswalk plus the IRS/Treasury TTOC related-SOC list
- `treasury_tipped_occupation_code` when the installed `policyengine-us` exposes that variable

Why
Right now the PUF clone half keeps donor-copied CPS demographic and occupation labels even after its income profile is replaced by PUF imputation. That weakens subgroup analysis by race/ethnicity/sex and can misclassify occupation-based logic like overtime exemptions.
This PR keeps the existing stage-2 QRF for continuous CPS-only variables, but adds a preceding donor-imputation pass for the categorical CPS features we actually use downstream. It also starts carrying Treasury tipped occupation codes on the CPS side so the rules engine can consume a law-facing occupation input instead of a CPS-specific approximation.
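A minimal sketch of the donor-imputation idea, not the code in this PR: for each PUF clone, restrict to CPS donors in the same demographic/geographic cell, pick the donor with the closest income, and copy all categorical labels jointly from that one donor so they stay mutually consistent. The function name `donor_impute`, the cell keys `state` and `age_band`, and the `income` field are hypothetical; the PR's actual predictors and matching rule may differ.

```python
import random
from collections import defaultdict

# Target names come from the PR summary; everything else here is illustrative.
CATEGORICAL_TARGETS = (
    "is_male",
    "cps_race",
    "is_hispanic",
    "detailed_occupation_recode",
)


def donor_impute(recipients, donors, cell_keys, income_key,
                 targets=CATEGORICAL_TARGETS, seed=0):
    """Copy categorical targets from the closest-income donor in the same cell."""
    rng = random.Random(seed)

    # Group donors by their cell (e.g. state x age band) for fast lookup.
    cells = defaultdict(list)
    for donor in donors:
        cells[tuple(donor[k] for k in cell_keys)].append(donor)

    imputed = []
    for rec in recipients:
        # Assumption: fall back to all donors when the cell has none.
        pool = cells.get(tuple(rec[k] for k in cell_keys)) or donors
        best = min(abs(d[income_key] - rec[income_key]) for d in pool)
        ties = [d for d in pool if abs(d[income_key] - rec[income_key]) == best]
        donor = rng.choice(ties)  # break exact ties at random, reproducibly
        # Take every target from the same donor so labels stay consistent.
        imputed.append({**rec, **{t: donor[t] for t in targets}})
    return imputed


donors = [
    {"state": "CA", "age_band": 3, "income": 20_000, "is_male": True,
     "cps_race": 1, "is_hispanic": False, "detailed_occupation_recode": 12},
    {"state": "CA", "age_band": 3, "income": 250_000, "is_male": False,
     "cps_race": 2, "is_hispanic": True, "detailed_occupation_recode": 3},
    {"state": "NY", "age_band": 3, "income": 240_000, "is_male": True,
     "cps_race": 1, "is_hispanic": False, "detailed_occupation_recode": 7},
]

# A clone whose income was replaced by PUF imputation but whose labels
# still come from the low-income record it was copied from.
clone = {"state": "CA", "age_band": 3, "income": 240_000, "is_male": True,
         "cps_race": 1, "is_hispanic": False, "detailed_occupation_recode": 12}

imputed = donor_impute([clone], donors, ("state", "age_band"), "income")
```

Here the clone takes its labels from the high-income CA donor rather than the NY record with the exact income match, because matching is restricted to the clone's own cell first.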
Notes
- `PEIOOCC -> TTOC` mapping is an approximation layer in `policyengine-us-data`, where it belongs. `policyengine-us` now treats `treasury_tipped_occupation_code` as an input and does not embed CPS/SOC crosswalk logic.
- `policyengine-us` including `treasury_tipped_occupation_code`. That keeps this PR CI-compatible until the corresponding `policyengine-us` change merges.

Testing
- `uv run ruff check policyengine_us_data/datasets/cps/tipped_occupation.py policyengine_us_data/datasets/cps/extended_cps.py policyengine_us_data/tests/test_extended_cps.py`
- `uv run pytest -q policyengine_us_data/tests/test_extended_cps.py`
- `uv run pytest -q policyengine_us_data/tests/test_calibration/test_puf_impute.py`

Interaction notes
- #633 also touches `policyengine_us_data/datasets/cps/extended_cps.py`, but only for structural mortgage input support. I kept this PR based on `main` because the logic is otherwise independent; the overlap should be limited to import/`generate()` context when that PR merges.
- #631 is broader pipeline restructuring and does not change the clone-feature logic added here.
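For reviewers, a rough sketch of the two-step `PEIOOCC -> TTOC` mapping and the version gate described under Notes. This is an illustration under stated assumptions, not the contents of `tipped_occupation.py`: every code in the tables is a placeholder rather than a real Census, SOC, or TTOC value, the `NOT_TIPPED` sentinel of `0` is assumed, and the helper names and the `model_variables` argument are hypothetical.

```python
# Placeholder tables: none of these are real Census, SOC, or TTOC codes.
CENSUS_TO_SOC = {"0001": "00-0001", "0002": "00-0002"}  # Census occ code -> SOC
SOC_TO_TTOC = {"00-0001": 101}                          # related SOC -> TTOC
NOT_TIPPED = 0  # assumed sentinel for occupations with no TTOC match

TTOC_VARIABLE = "treasury_tipped_occupation_code"


def peioocc_to_ttoc(peioocc):
    """Map a raw CPS PEIOOCC value to a TTOC via the Census -> SOC crosswalk."""
    # Assumption: PEIOOCC is matched as a zero-padded four-digit string.
    soc = CENSUS_TO_SOC.get(str(peioocc).zfill(4))
    return SOC_TO_TTOC.get(soc, NOT_TIPPED)


def add_ttoc_if_supported(data, model_variables):
    """Add the TTOC column only when the rules engine defines the variable."""
    if TTOC_VARIABLE not in model_variables:
        return data  # older model version: leave the dataset untouched
    return {**data, TTOC_VARIABLE: [peioocc_to_ttoc(c) for c in data["PEIOOCC"]]}


cps = {"PEIOOCC": [1, 2, 9999]}
with_ttoc = add_ttoc_if_supported(cps, {"age", TTOC_VARIABLE})
without_ttoc = add_ttoc_if_supported(cps, {"age"})
```

Keeping both lookups in the data package, and passing only the resulting code downstream, matches the split described above: the rules engine sees a single input and never needs the crosswalk.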