2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@ Microimpute is a Python package for imputing variables from one survey dataset o
- **Statistical Matching**: distance-based matching to find similar donor observations
- **Ordinary Least Squares (OLS)**: linear regression imputation
- **Quantile Regression**: models conditional quantiles instead of the conditional mean
- **Quantile Random Forests (QRF)**: non-parametric, tree-based quantile estimation
- **Quantile Regression Forests (QRF)**: non-parametric, tree-based quantile estimation
- **Mixture Density Networks (MDN)**: neural network with a Gaussian mixture output

## Autoimpute
2 changes: 1 addition & 1 deletion docs/index.md
@@ -5,7 +5,7 @@ Microimpute is a Python package for imputing variables from one survey dataset o
The package currently supports:
- Hot Deck Matching
- Ordinary Least Squares (OLS) Linear Regression
- Quantile Random Forests (QRF)
- Quantile Regression Forests (QRF)
- Quantile Regression
- Mixture Density Networks (MDN)

4 changes: 2 additions & 2 deletions docs/models/qrf/index.md
@@ -1,4 +1,4 @@
# Quantile Random Forests
# Quantile Regression Forests

The `QRF` model uses an ensemble of decision trees to predict different quantiles of the target variable distribution. This allows it to model non-linear relationships while estimating uncertainty across the conditional distribution.

@@ -8,7 +8,7 @@ QRF handles both numerical and categorical variables. For numerical targets, it

## How it works

Quantile Random Forests build on standard random forests using the `quantile_forest` package. The method constructs an ensemble of decision trees, each trained on a bootstrapped sample of the data (bagging). At each split, only a random subset of features is considered, which introduces diversity among trees and reduces overfitting.
Quantile Regression Forests build on standard random forests using the `quantile_forest` package. The method constructs an ensemble of decision trees, each trained on a bootstrapped sample of the data (bagging). At each split, only a random subset of features is considered, which introduces diversity among trees and reduces overfitting.

Unlike standard random forests that aggregate predictions into averages, QRF retains the full predictive distribution from each tree and estimates quantiles directly from this empirical distribution.
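The pooling idea described above can be sketched with a toy example (hypothetical leaf contents, standard library only — not the `quantile_forest` internals): a plain random forest averages per-tree predictions, while QRF pools the raw training targets stored in the matched leaves and reads quantiles off that empirical distribution.

```python
import statistics

# Hypothetical leaf contents for one query point: the training targets
# stored in the leaf the point reaches in each of three trees.
leaf_targets = [
    [10.0, 12.0, 11.5],  # tree 1
    [9.0, 13.0],         # tree 2
    [11.0, 12.5, 10.5],  # tree 3
]

# A standard random forest aggregates per-tree means into one average.
mean_pred = statistics.mean(statistics.mean(t) for t in leaf_targets)

# QRF instead pools the raw targets and estimates quantiles directly
# from the pooled empirical distribution.
pooled = sorted(v for tree in leaf_targets for v in tree)
q50 = statistics.median(pooled)
deciles = statistics.quantiles(pooled, n=10)
q10, q90 = deciles[0], deciles[-1]

print(q10, q50, q90)
```

Retaining the pooled distribution is what lets the model report an interval such as Q10–Q90 for each record rather than a single point estimate.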

12 changes: 6 additions & 6 deletions docs/models/qrf/qrf-imputation.ipynb
@@ -4,13 +4,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quantile Random Forest (QRF) imputation\n",
"# Quantile Regression Forest (QRF) imputation\n",
"\n",
"This notebook demonstrates how to use Microimpute's `QRF` imputer to impute values using Quantile Random Forests. QRF extends traditional random forests to predict the entire conditional distribution of a target variable.\n",
"This notebook demonstrates how to use Microimpute's `QRF` imputer to impute values using Quantile Regression Forests. QRF extends traditional random forests to predict the entire conditional distribution of a target variable.\n",
"\n",
"## Variable type support\n",
"\n",
"The QRF model automatically handles both numerical and categorical variables. For numerical targets, it applies quantile random forests. For categorical targets (strings, booleans, or numerically-encoded categorical variables), it switches to using a random forest classifier. This automatic adaptation happens internally without requiring any manual configuration.\n",
"The QRF model automatically handles both numerical and categorical variables. For numerical targets, it applies quantile regression forests. For categorical targets (strings, booleans, or numerically-encoded categorical variables), it switches to using a random forest classifier. This automatic adaptation happens internally without requiring any manual configuration.\n",
"\n",
"The QRF model supports sequential imputation with a single object and workflow. Pass a list of `imputed_variables` with all variables you want to impute, and the model imputes them sequentially. This means that previously imputed variables will serve as predictors for subsequent variables, capturing complex dependencies between the imputed variables.\n",
"\n",
@@ -584,7 +584,7 @@
"qrf_imputer = QRF()\n",
"\n",
"# Fit the model with our training data\n",
"# This trains a quantile random forest model\n",
"# This trains a quantile regression forest model\n",
"fitted_qrf_imputer = qrf_imputer.fit(\n",
" X_train,\n",
" predictors,\n",
@@ -2023,7 +2023,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This scatter plot compares actual observed values with those imputed by a Quantile Random Forest (QRF) model, providing a visual assessment of imputation accuracy. Each point represents a data record, with the x-axis showing the true value and the y-axis showing the model’s predicted value. The red dashed line represents the ideal 1:1 relationship, where predictions perfectly match actual values. Most points cluster around this line, suggesting that the QRF model effectively captures the underlying structure of the data. Importantly, the model does not appear to systematically over- or under-predict across the range, and while performance at the extremes may be weaker, the overall pattern indicates that QRF provides a reasonably accurate and unbiased approach to imputing missing values. Additionally, it is important to consider the characteristics of the diabetes dataset, which seems to show a strong linear relationship between predictors and the imputed variable. QRF's behavior suggests strength in accurately imputing variables for datasets when such linearity assumptions do not hold."
"This scatter plot compares actual observed values with those imputed by a Quantile Regression Forest (QRF) model, providing a visual assessment of imputation accuracy. Each point represents a data record, with the x-axis showing the true value and the y-axis showing the model’s predicted value. The red dashed line represents the ideal 1:1 relationship, where predictions perfectly match actual values. Most points cluster around this line, suggesting that the QRF model effectively captures the underlying structure of the data. Importantly, the model does not appear to systematically over- or under-predict across the range, and while performance at the extremes may be weaker, the overall pattern indicates that QRF provides a reasonably accurate and unbiased approach to imputing missing values. Additionally, it is important to consider the characteristics of the diabetes dataset, which seems to show a strong linear relationship between predictors and the imputed variable. QRF's behavior suggests strength in accurately imputing variables for datasets when such linearity assumptions do not hold."
]
},
{
@@ -3636,7 +3636,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This plot visualizes the prediction intervals produced by the Quantile Random Forest (QRF) model for imputing total serum cholesterol values across ten data records. Each vertical bar represents an 80% (light gray) or 40% (dark gray) prediction interval, capturing the model's estimated range of plausible values based on the Q10–Q90 and Q30–Q70 quantiles, respectively. Red dots mark the model's median predictions (Q50), while black dots show the actual observed values. In most cases, the true values fall within the wider intervals, indicating that the QRF model is appropriately capturing uncertainty in its imputation. The fact that the intervals are sometimes asymmetrical around the median reflects the model’s flexibility in estimating skewed or heteroskedastic distributions. Overall, the plot demonstrates that the QRF model not only provides accurate point estimates but also yields informative prediction intervals that account for uncertainty in the imputed values."
"This plot visualizes the prediction intervals produced by the Quantile Regression Forest (QRF) model for imputing total serum cholesterol values across ten data records. Each vertical bar represents an 80% (light gray) or 40% (dark gray) prediction interval, capturing the model's estimated range of plausible values based on the Q10–Q90 and Q30–Q70 quantiles, respectively. Red dots mark the model's median predictions (Q50), while black dots show the actual observed values. In most cases, the true values fall within the wider intervals, indicating that the QRF model is appropriately capturing uncertainty in its imputation. The fact that the intervals are sometimes asymmetrical around the median reflects the model’s flexibility in estimating skewed or heteroskedastic distributions. Overall, the plot demonstrates that the QRF model not only provides accurate point estimates but also yields informative prediction intervals that account for uncertainty in the imputed values."
]
},
{
2 changes: 1 addition & 1 deletion microimpute/models/__init__.py
@@ -6,7 +6,7 @@

Available models:
- OLS: ordinary least squares regression with bootstrapped quantiles
- QRF: quantile random forest for non-parametric quantile regression
- QRF: quantile regression forest for non-parametric quantile regression
- QuantReg: linear quantile regression model
- Matching: statistical matching/hot-deck imputation (optional, requires rpy2)
- MDN: Mixture Density Network for probabilistic imputation
24 changes: 24 additions & 0 deletions microimpute/models/mdn.py
@@ -268,6 +268,12 @@ def fit(
col for col in X.columns.tolist() if col not in categorical_cols
]

# Cast continuous columns to float64 to avoid pandas 3.x
# LossySetitemError when pytorch_tabular's scaler writes
# normalized float values back into integer-typed columns.
for col in continuous_cols:
train_data[col] = train_data[col].astype("float64")

# Configure data
data_config = DataConfig(
target=[y.name],
@@ -351,6 +357,12 @@ def predict(self, X: pd.DataFrame, n_samples: int = 1) -> np.ndarray:
# Put model in eval mode
self.model.model.eval()

# Cast continuous columns to float64 for pandas 3.x compat
X = X.copy()
for col in self.model.config.continuous_cols:
if col in X.columns:
X[col] = X[col].astype("float64")

# Create inference dataloader
test_loader = self.model.datamodule.prepare_inference_dataloader(X)

@@ -466,6 +478,12 @@ def fit(
col for col in X.columns.tolist() if col not in categorical_cols
]

# Cast continuous columns to float64 to avoid pandas 3.x
# LossySetitemError when pytorch_tabular's scaler writes
# normalized float values back into integer-typed columns.
for col in continuous_cols:
train_data[col] = train_data[col].astype("float64")

# Configure data
data_config = DataConfig(
target=[y.name],
@@ -541,6 +559,12 @@ def predict(
Predicted values as Series, or dict with probabilities if
return_probs=True.
"""
# Cast continuous columns to float64 for pandas 3.x compat
X = X.copy()
for col in self.model.config.continuous_cols:
if col in X.columns:
X[col] = X[col].astype("float64")

# Get predictions with probabilities
preds_df = self.model.predict(X, ret_logits=False)

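The casts added in this file guard against pandas refusing lossy writes of normalized floats into integer-typed columns (per the `LossySetitemError` noted in the diff comments). The pattern can be sketched in isolation — a toy DataFrame with hypothetical column names, not microimpute's actual data:

```python
import pandas as pd

# Toy frame whose continuous column starts out integer-typed.
train_data = pd.DataFrame({"age": [25, 40, 31]})
continuous_cols = ["age"]

# Cast up front so a scaler writing normalized float values back into
# the column cannot hit a lossy integer-assignment error.
for col in continuous_cols:
    train_data[col] = train_data[col].astype("float64")

# Standardize in place; the column is already float64, so this is safe.
train_data["age"] = (
    train_data["age"] - train_data["age"].mean()
) / train_data["age"].std()
```

The same cast is applied again at predict time because inference data arrives as a fresh DataFrame that may carry the original integer dtypes.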
2 changes: 1 addition & 1 deletion microimpute/models/qrf.py
@@ -482,7 +482,7 @@ class QRF(Imputer):
"""
Quantile Regression Forest model for imputation.

This model uses a Quantile Random Forest to predict quantiles.
This model uses a Quantile Regression Forest to predict quantiles.
The underlying QRF implementation is from the quantile_forest package.
"""

16 changes: 8 additions & 8 deletions paper/bibliography/references.bib
@@ -17,14 +17,14 @@ @article{bishop1994mixture
year = {1994}
}

@incollection{bourguignon2006microsimulation,
title = {Microsimulation as a tool for evaluating redistribution policies},
author = {Bourguignon, Fran{\c{c}}ois and Spadaro, Amedeo},
booktitle = {Journal of Economic Inequality},
volume = {4},
number = {1},
pages = {77--106},
year = {2006},
@article{bourguignon2006microsimulation,
title = {Microsimulation as a tool for evaluating redistribution policies},
author = {Bourguignon, Fran{\c{c}}ois and Spadaro, Amedeo},
journal = {Journal of Economic Inequality},
volume = {4},
number = {1},
pages = {77--106},
year = {2006},
publisher = {Springer}
}

Binary file modified paper/figures/models_dist_comparison.png
Binary file modified paper/figures/models_ssi_reform_comparison.png