Notation
Let the labeled sample be \((X_1, Y_1), \ldots, (X_n, Y_n)\) and the unlabeled sample be \(\widetilde X_1, \ldots, \widetilde X_N\). Here \(f(X)\) is the score we choose to plug into AIPW.
Theory
In the small-labeled, large-unlabeled setting, a good score can improve precision, but it should not be trusted blindly. AIPW averages that score and then adds a labeled-sample correction, so you get efficiency together with a built-in safeguard against score error.
Here, score means the model output you plug into the estimator. It can still be useful before calibration puts it on the right outcome scale.
Estimator
One part averages the score. The other corrects it on the rows where outcomes are observed.
estimate = pooled score average + labeled correction
The score uses all the covariate information you have. The correction term measures its average error on labeled rows and adds that error back.
A simple semisupervised AIPW estimator for the mean can be written as

\[
\widehat\theta = \frac{1}{n+N}\left\{\sum_{i=1}^{n} f(X_i) + \sum_{j=1}^{N} f(\widetilde X_j)\right\} + \frac{1}{n}\sum_{i=1}^{n}\{Y_i - f(X_i)\}.
\]
Many familiar semisupervised estimators are special cases of this same template.
The unlabeled sample contributes scale and precision. The labeled sample tells you how much to correct.
A good score helps the pooled plug-in term. The correction term keeps the estimate tied to observed outcomes.
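The template above can be sketched in a few lines of numpy (the function name and argument names are illustrative, not a package API):

```python
import numpy as np

def aipw_mean(y_lab, f_lab, f_unlab):
    """Semisupervised AIPW estimate of E[Y] (illustrative sketch).

    y_lab   : outcomes Y_i on the n labeled rows
    f_lab   : scores f(X_i) on the labeled rows
    f_unlab : scores f(X~_j) on the N unlabeled rows
    """
    # Pooled plug-in: average the score over all n + N rows.
    pooled = np.mean(np.concatenate([f_lab, f_unlab]))
    # Labeled correction: average score error where Y is observed.
    correction = np.mean(y_lab - f_lab)
    return pooled + correction
```

If the score were perfect on the labeled rows (\(f(X_i)=Y_i\)), the correction would be exactly zero and the estimate would just be the pooled score average.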
AIPW Class
The wrapper stays the same. What changes is the score inside it.
The score \(f(X)\) can be a raw score, a linear recalibration, a monotone calibration map, or another data-adaptive score. AIPW supplies the correction template around it.
If the score is closer to the true outcome regression \(\mu_0(X)=E_0[Y \mid X]\), then the residual correction is smaller and the final estimator is typically more efficient.
Efficiency Target
Up to a centering constant, the large-sample variance is minimized by choosing \(f\) to make \(E_0[(Y-f(X))^2]\) small.
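One way to see why, under i.i.d. sampling with independent labeled and unlabeled samples (a sketch; the paper has the precise statement): the estimator's large-sample variance decomposes as

\[
\mathrm{AVar}(\widehat\theta) \;=\; \frac{1}{n}\,\mathrm{Var}_0\{Y - f(X)\}
\;+\; \frac{1}{n+N}\Big[\mathrm{Var}_0\{f(X)\} + 2\,\mathrm{Cov}_0\{f(X),\,Y - f(X)\}\Big].
\]

When \(N \gg n\), the \(1/n\) residual term dominates, and minimizing \(\mathrm{Var}_0\{Y - f(X)\}\) is the same as minimizing \(E_0[(Y-f(X))^2]\) up to the centering constant \(E_0[Y - f(X)]\).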
PPI and PPI++
These are named choices inside the broader AIPW family.
With the raw score \(m(X)\), standard AIPW uses the pooled plug-in average

\[
\frac{1}{n+N}\left\{\sum_{i=1}^{n} m(X_i) + \sum_{j=1}^{N} m(\widetilde X_j)\right\}.
\]
Standard AIPW pairs this pooled plug-in with the same labeled correction term \(\frac{1}{n}\sum_{i=1}^{n}\{Y_i - m(X_i)\}\) as PPI; the difference is that the labeled covariates stay in the plug-in average.
Plain PPI uses the unlabeled-only plug-in form

\[
\widehat\theta_{\mathrm{PPI}} = \frac{1}{N}\sum_{j=1}^{N} m(\widetilde X_j) + \frac{1}{n}\sum_{i=1}^{n}\{Y_i - m(X_i)\}.
\]
Within the common AIPW family, this is equivalent to using the rescaled score \(f(X)=m(X)/(1-\rho)\), where \(\rho=n/(n+N)\) is the labeled fraction. That rescaling can push the score away from the right outcome scale.
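This equivalence can be checked numerically; here is a small sketch with synthetic numbers (any data would do, since the identity is algebraic):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 6
y_lab = rng.normal(size=n)      # labeled outcomes
m_lab = rng.normal(size=n)      # raw score on labeled rows
m_unlab = rng.normal(size=N)    # raw score on unlabeled rows

# Plain PPI: unlabeled-only plug-in plus the labeled correction.
ppi = m_unlab.mean() + (y_lab - m_lab).mean()

# The same number from the common AIPW form with the rescaled
# score f = m / (1 - rho), where rho = n / (n + N).
rho = n / (n + N)
f_lab, f_unlab = m_lab / (1 - rho), m_unlab / (1 - rho)
aipw = np.concatenate([f_lab, f_unlab]).mean() + (y_lab - f_lab).mean()

assert np.isclose(ppi, aipw)
```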
PPI++ is AIPW with empirical efficiency maximization over the one-score scaling class.
In this one-score setting, PPI++ is asymptotically equivalent to linear calibration, even though the finite-sample estimators are not identical.
This is the broader principle: choose the score inside AIPW to make the estimator as efficient as possible. PPI++ is the one-score special case that only searches over scaled versions of \(m(X)\).
Same template, different score choices.
Standard AIPW uses the raw score. PPI is the inefficient unlabeled-only variant. PPI++ searches over scaled versions of that score.
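The scaling search can be sketched as follows. The closed form below minimizes one convex empirical variance proxy for the AIPW estimator; it illustrates the empirical-efficiency idea, not the exact PPI++ tuning rule, and the function name is ours:

```python
import numpy as np

def tune_lambda(y_lab, m_lab, m_unlab):
    """Pick lam so that the scaled score lam * m(X) minimizes an
    empirical variance proxy:
        (1/n) Var(Y - lam*m)  +  (1/(n+N)) Var(lam*m).
    Illustrative sketch, not the exact PPI++ formula."""
    n, N = len(y_lab), len(m_unlab)
    cov = np.cov(y_lab, m_lab)[0, 1]                      # Cov(Y, m) on labeled rows
    v_lab = m_lab.var(ddof=1)                             # Var(m) on labeled rows
    v_all = np.concatenate([m_lab, m_unlab]).var(ddof=1)  # Var(m) on all rows
    # Closed-form minimizer of the convex quadratic proxy above.
    return cov / (v_lab + (n / (n + N)) * v_all)
```

The tuned scaling lands near 1 when the raw score is already on the right scale, and near 0 when the score is uninformative, which recovers the labeled-only estimator.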
Calibration
Ranking helps, but the scale matters too because we average the score.
A score can rank well and still be on the wrong scale. Calibration makes it more faithful to the outcome scale.
If the calibrated score has the right average on labeled rows, then averaging it targets the right mean.
Calibration improves the score, and the labeled residual correction guards against remaining error.
Why Mean Calibration Is Enough
Write the target mean as \(\theta_0 = E_0[Y]\). If the calibrated score satisfies

\[
E_0[f(X)] = E_0[Y] = \theta_0,
\]

then

\[
E_0\!\left[\frac{1}{n+N}\left\{\sum_{i=1}^{n} f(X_i) + \sum_{j=1}^{N} f(\widetilde X_j)\right\}\right] = \theta_0.
\]
So the pooled plug-in mean targets the right estimand. For many calibrated methods, the extra correction term \(\frac{1}{n}\sum_{i=1}^{n}\{Y_i-f(X_i)\}\) is exactly zero, or numerically very close to zero, on the sample used to fit the calibration. In those cases the plug-in and AIPW forms are effectively the same. When calibration is only approximate or fitting and evaluation are separated, the correction need not vanish and remains a safety layer.
Calibration changes the score that the estimator averages.
Once the score is mean-calibrated, the plug-in and AIPW views agree.
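A small numpy sketch of the vanishing-correction point, using ordinary least squares as the linear recalibration (synthetic data; the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 30, 100
m_lab = rng.normal(size=n)                            # raw score on labeled rows
y_lab = 2.0 * m_lab + 1.0 + 0.1 * rng.normal(size=n)  # outcomes on a shifted scale
m_unlab = rng.normal(size=N)

# Linear recalibration: OLS of Y on the raw score, fit on labeled rows.
slope, intercept = np.polyfit(m_lab, y_lab, 1)
f_lab = intercept + slope * m_lab
f_unlab = intercept + slope * m_unlab

# OLS residuals have mean zero on the fitting sample, so the labeled
# correction vanishes and the plug-in and AIPW forms coincide here.
correction = np.mean(y_lab - f_lab)
plug_in = np.concatenate([f_lab, f_unlab]).mean()
aipw = plug_in + correction
```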
Flexible Learning
Better prediction can help, but small labeled samples still limit how much complexity is worth using.
The score \(f(X)\) can be learned with modern machine learning if that gives a better approximation to \(\mu_0(X)\). Better approximation usually means better efficiency.
In small labeled samples, aggressive machine learning can reduce bias but increase variance. That can lead to worse finite-sample performance even when the method is valid.
If the labeled-sample fitting is very aggressive, cross-fitting avoids overusing the same labels and is the usual safeguard for inference.
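A minimal cross-fitting sketch: each labeled row is scored by a model that never saw its own label. Here `fit` is a hypothetical trainer returning a predict function; it stands in for whatever learner you use.

```python
import numpy as np

def crossfit_scores(x_lab, y_lab, x_unlab, fit, K=5):
    """Cross-fitted scores for the labeled rows (illustrative sketch).

    fit : hypothetical callable, fit(x, y) -> predict function
    """
    n = len(y_lab)
    folds = np.arange(n) % K          # simple deterministic fold split
    f_lab = np.empty(n)
    for k in range(K):
        # Score fold k with a model trained on the other folds only.
        model = fit(x_lab[folds != k], y_lab[folds != k])
        f_lab[folds == k] = model(x_lab[folds == k])
    # Unlabeled rows contribute no labels, so a model trained on all
    # labeled data can score them.
    f_unlab = fit(x_lab, y_lab)(x_unlab)
    return f_lab, f_unlab
```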
Takeaway
Think of the estimator as a safe wrapper around a score you are trying to improve.
If you want the API details, head back to the Python quickstart or the R package page. If you want the formal development, open the paper PDF.
References
This page is a simplified overview. The formal theory and full citations are in the paper, but the references below are the key landmarks for the ideas summarized here.