Package overview
ppi_aipw implements methods for semisupervised mean
inference with few labeled outcomes and many unlabeled
predictions. It combines AIPW, calibration, and uncertainty
quantification in one API.
Calibration puts the prediction score on the outcome scale before the semisupervised mean step.
Install
Use PyPI for the release, GitHub for the current source, or an editable install for local development. The native R package has a separate R page.
python -m pip install ppi-aipw
python -m pip install "git+https://github.com/Larsvanderlaan/ppi-aipw.git"
python -m pip install -e .
Quickstart
mean_inference(...) is the main entry point. It
returns the point estimate, standard error, confidence interval,
fitted calibrator, and diagnostics.
For a runnable example, the quickstart notebook opens directly in Colab and covers both the mean and causal APIs.
For data-adaptive selection, set method="auto" and
pass candidate_methods=("aipw", "linear", "monotone_spline", "isotonic").
Selection uses num_folds=100 by default, capped at
the labeled sample size.
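As a rough illustration of the fold cap described above, the sketch below computes the effective number of folds and a simple fold assignment. The helper names effective_num_folds and assign_folds are hypothetical, and the round-robin assignment is an assumption; the package may assign folds differently (for example, at random).

```python
def effective_num_folds(n_labeled, num_folds=100):
    """Cap the requested fold count at the labeled sample size."""
    return min(num_folds, n_labeled)


def assign_folds(n_labeled, num_folds=100):
    """Assign each labeled row to a fold in round-robin order (illustrative only)."""
    k = effective_num_folds(n_labeled, num_folds)
    return [i % k for i in range(n_labeled)]


# With 30 labeled rows the default of 100 folds is capped at 30 (leave-one-out).
folds_small = assign_folds(30)
# With 250 labeled rows the default of 100 folds is used as requested.
folds_large = assign_folds(250)
```

With fewer than 100 labeled rows, the cap makes the default equivalent to leave-one-out cross-validation.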
Use result.summary() for a compact Wald summary.
Use calibration_diagnostics(result, Y, Yhat) for
an optional out-of-fold calibration check.
The result object returned by mean_inference(...) exposes
pointestimate, se, ci, and
diagnostics, along with the result.summary() method.
Y: Observed outcomes for the labeled sample.
Yhat: Predictions on the same labeled rows.
Yhat_unlabeled: Predictions on the unlabeled sample.
method: Choose "aipw", "linear", "prognostic_linear", "sigmoid", "monotone_spline", "isotonic", or "auto".
candidate_methods: Candidate methods considered when method="auto" minimizes a cross-validated variance estimate. If "aipw" is included, the selector also compares a rescaled AIPW candidate.
num_folds: Number of folds used by method="auto". The default is 100, capped at the labeled sample size.
inference: Choose "wald" for a fast analytic interval, "jackknife" for a fold-resampling normal interval, or "bootstrap" for percentile bootstrap intervals.
efficiency_maximization: Optional rescaling to lambda m(X). For method="aipw", m(X) is the raw score; otherwise it is the calibrated score.
w, w_unlabeled: Optional observation weights for labeled and unlabeled samples. Uniform weights reproduce the unweighted estimator.
X, X_unlabeled: Optional extra covariates for method="prognostic_linear". The score and intercept are unpenalized; extra covariates use ridge tuning on the labeled sample.
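To make the efficiency_maximization option more concrete, here is a minimal sketch of one common way to choose lambda. It assumes the estimator averages lambda m(X) over the unlabeled sample and adds the mean labeled residual Y - lambda m(X), with independent labeled and unlabeled samples. Under those assumptions the plug-in variance is quadratic in lambda and has a closed-form minimizer. The function names are hypothetical, and the package's own rule may differ.

```python
import random
from statistics import fmean, pvariance


def optimal_lambda(y, score, n_unlabeled):
    """Closed-form minimizer of variance_estimate over lambda (sketch)."""
    n = len(y)
    y_bar, s_bar = fmean(y), fmean(score)
    cov_ys = fmean([(yi - y_bar) * (si - s_bar) for yi, si in zip(y, score)])
    return cov_ys / ((1.0 + n / n_unlabeled) * pvariance(score))


def variance_estimate(lam, y, score, n_unlabeled):
    """Plug-in variance of mean_unlabeled(lam * m) + mean_labeled(Y - lam * m).

    Assumption: Var(m) is estimated from the labeled scores only.
    """
    residuals = [yi - lam * si for yi, si in zip(y, score)]
    return lam ** 2 * pvariance(score) / n_unlabeled + pvariance(residuals) / len(y)


rng = random.Random(1)
score = [rng.random() for _ in range(300)]
y = [0.6 * s + rng.gauss(0.0, 0.3) for s in score]  # weakly predictive score

lam = optimal_lambda(y, score, n_unlabeled=3000)
v_opt = variance_estimate(lam, y, score, 3000)
v_full = variance_estimate(1.0, y, score, 3000)  # lambda = 1: unscaled AIPW
v_none = variance_estimate(0.0, y, score, 3000)  # lambda = 0: labeled mean only
```

By construction, v_opt is no larger than either v_full or v_none, which is why rescaling cannot hurt under this variance model.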
Calibration
Calibration is about getting the prediction scale right, not just the ranking.
A calibrated score has the right numeric scale: examples scored
near 0.8 have outcomes near 0.8 on
average. That scale correction can make a useful predictor more
accurate without retraining the original model.
AIPW averages the score and then corrects it on labeled rows. When calibration moves the score closer to the outcome regression, the correction term is smaller on average, which can improve precision.
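The effect described above can be sketched in a few lines. The example below is not the package's implementation: it uses simulated data, averages the score over the unlabeled sample only (an assumption), and uses ordinary least squares of Y on the score as a stand-in for method="linear". All function names are hypothetical.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)


def simulate(n):
    """Return (scores, outcomes); the score ranks well but is on the wrong scale."""
    scores = [rng.random() for _ in range(n)]
    outcomes = [2.0 + 3.0 * s + rng.gauss(0.0, 0.5) for s in scores]
    return scores, outcomes


yhat, y = simulate(200)              # labeled rows
yhat_unlabeled, _ = simulate(5000)   # unlabeled rows: outcomes are discarded


def aipw_mean(y, yhat, yhat_unlabeled):
    """Average the score on unlabeled rows, then add the mean labeled residual."""
    correction = fmean([yi - pi for yi, pi in zip(y, yhat)])
    return fmean(yhat_unlabeled) + correction


def fit_linear_calibrator(y, yhat):
    """Least-squares fit of Y on the score; returns a function score -> calibrated."""
    y_bar, p_bar = fmean(y), fmean(yhat)
    sxx = sum((p - p_bar) ** 2 for p in yhat)
    slope = sum((p - p_bar) * (yi - y_bar) for p, yi in zip(yhat, y)) / sxx
    intercept = y_bar - slope * p_bar
    return lambda p: intercept + slope * p


calibrate = fit_linear_calibrator(y, yhat)
yhat_cal = [calibrate(p) for p in yhat]
yhat_unlabeled_cal = [calibrate(p) for p in yhat_unlabeled]

raw_estimate = aipw_mean(y, yhat, yhat_unlabeled)
calibrated_estimate = aipw_mean(y, yhat_cal, yhat_unlabeled_cal)

# Residuals drive the variance of the correction term.
raw_residuals = [yi - pi for yi, pi in zip(y, yhat)]
calibrated_residuals = [yi - pi for yi, pi in zip(y, yhat_cal)]
```

Both estimates target the same mean (3.5 in this simulation), but the calibrated residuals have mean zero on the labeled rows and no larger variance than the raw residuals, so the labeled correction contributes less noise.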
Method Explorer
Method summaries, typical use cases, and main tradeoffs.
monotone_spline: Fits a smooth monotone spline calibration curve, then plugs the calibrated predictions into the AIPW estimator.
Schematic view of how the raw prediction score
m(X) is transformed before the semisupervised
mean step.
Intervals
Jackknife and bootstrap both refit the calibration step under resampling; jackknife uses delete-a-group folds, while bootstrap uses classical resampling with replacement.
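The two resampling schemes can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the calibrator is a simple least-squares line, folds are assigned round-robin, fold g is deleted from both the labeled and unlabeled samples, and the bootstrap resamples each sample independently. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, quantiles


def calibrated_aipw(y, score, score_unlabeled):
    """Refit a linear calibrator on the labeled rows, then form the AIPW mean."""
    y_bar, s_bar = fmean(y), fmean(score)
    sxx = sum((s - s_bar) ** 2 for s in score)
    slope = sum((s - s_bar) * (yi - y_bar) for s, yi in zip(score, y)) / sxx
    intercept = y_bar - slope * s_bar
    calibrated_unlabeled = [intercept + slope * s for s in score_unlabeled]
    residuals = [yi - (intercept + slope * s) for yi, s in zip(y, score)]
    return fmean(calibrated_unlabeled) + fmean(residuals)


def jackknife_se(y, score, score_unlabeled, num_folds=20):
    """Delete-a-group jackknife: drop fold g, refit, and pool the replicates."""
    reps = []
    for g in range(num_folds):
        keep_l = [i for i in range(len(y)) if i % num_folds != g]
        keep_u = [i for i in range(len(score_unlabeled)) if i % num_folds != g]
        reps.append(calibrated_aipw([y[i] for i in keep_l],
                                    [score[i] for i in keep_l],
                                    [score_unlabeled[i] for i in keep_u]))
    center = fmean(reps)
    return sqrt((num_folds - 1) / num_folds * sum((r - center) ** 2 for r in reps))


def bootstrap_ci(y, score, score_unlabeled, num_boot=200, seed=0):
    """Percentile bootstrap: resample rows with replacement and refit each time."""
    rng = random.Random(seed)
    n, n_unl = len(y), len(score_unlabeled)
    reps = []
    for _ in range(num_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        unl = [score_unlabeled[rng.randrange(n_unl)] for _ in range(n_unl)]
        reps.append(calibrated_aipw([y[i] for i in idx], [score[i] for i in idx], unl))
    cuts = quantiles(reps, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


data_rng = random.Random(2)
score = [data_rng.random() for _ in range(200)]
y = [2.0 + 3.0 * s + data_rng.gauss(0.0, 0.5) for s in score]
score_unlabeled = [data_rng.random() for _ in range(2000)]

estimate = calibrated_aipw(y, score, score_unlabeled)
se = jackknife_se(y, score, score_unlabeled)
lower, upper = bootstrap_ci(y, score, score_unlabeled)
```

The key point in both schemes is that the calibrator is refit inside every replicate, so the interval reflects the uncertainty from the calibration step as well as from the averaging step.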
Wald intervals (inference="wald") are fast analytic intervals that require no refitting.
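For comparison, an analytic interval of this kind can be sketched as below. The variance formula assumes independent labeled and unlabeled samples, a score averaged over the unlabeled sample only, and a score treated as fixed (so calibration-fit uncertainty is ignored). The function name wald_interval is hypothetical, and the package's analytic standard error may be computed differently.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def wald_interval(y, score, score_unlabeled, level=0.95):
    """Point estimate, standard error, and normal-theory CI for the AIPW mean."""
    residuals = [yi - si for yi, si in zip(y, score)]
    estimate = fmean(score_unlabeled) + fmean(residuals)
    # Variance of a sum of two independent sample means.
    se = sqrt(variance(score_unlabeled) / len(score_unlabeled)
              + variance(residuals) / len(residuals))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Tiny hand-checkable example.
y = [1.0, 2.0, 3.0, 4.0]
score = [1.0, 1.0, 3.0, 3.0]
score_unlabeled = [0.0, 2.0, 2.0, 4.0]

estimate, se, ci = wald_interval(y, score, score_unlabeled)
# estimate = 2.0 + 0.5 = 2.5; se = sqrt((8/3)/4 + (1/3)/4) = sqrt(0.75)
```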
References
The calibration methods here can be viewed as special cases of calibrated debiased machine learning and targeted minimum loss estimation.
The AIPW baseline cited above goes back to Robins, Rotnitzky, and Zhao (1994), "Estimation of regression coefficients when some regressors are not always observed", Journal of the American Statistical Association 89(427): 846-866.
Main paper themes reflected here:
Calibration references:
Semiparametric, debiased/targeted machine learning foundations:
Prognostic-score adjustment and efficiency maximization:
Semisupervised mean estimation: