Notation
Let the labeled sample be \((X_1, Y_1), \ldots, (X_n, Y_n)\) and the unlabeled sample be \(\widetilde X_1, \ldots, \widetilde X_N\). Here \(f(X)\) is the score we choose to plug into AIPW.
Theory
Prediction scores can improve precision in semisupervised mean inference when their scale is handled carefully. AIPW averages the score over labeled and unlabeled covariates, then adds a residual correction from the labeled outcomes. Calibration improves the score before this step by putting it closer to the outcome scale.
Here "score" means the model output used as \(f(X)\). It may be raw, calibrated, or otherwise learned from the labeled data.
Estimator
One part averages the score. The other corrects it on the rows where outcomes are observed.
estimate = pooled score average + labeled correction
The score uses all the covariate information you have. The correction term measures its average error on labeled rows and adds that error back.
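A simple semisupervised AIPW estimator for the mean can be written as:
\[
\widehat\theta_{\mathrm{AIPW}}(f) = \frac{1}{n+N}\left(\sum_{i=1}^{n} f(X_i) + \sum_{j=1}^{N} f(\widetilde X_j)\right) + \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - f(X_i)\bigr).
\]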
Many familiar semisupervised estimators are special cases of this same template.
The unlabeled sample contributes covariate information. The labeled sample determines the residual correction.
The pooled score average uses all covariates; the correction term uses labeled residuals.
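The template can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the score has already been evaluated on both samples; the function name `aipw_mean` and the toy arrays are hypothetical and not taken from any library.

```python
import numpy as np

def aipw_mean(y_lab, f_lab, f_unlab):
    """Pooled score average plus the mean labeled residual."""
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    # Average the score over labeled and unlabeled covariates together.
    pooled = np.concatenate([f_lab, f_unlab]).mean()
    # Average error of the score on the rows where outcomes are observed.
    correction = (y_lab - f_lab).mean()
    return pooled + correction

# Toy inputs (made up for illustration only).
y_lab = np.array([1.0, 2.0, 3.0])
f_lab = np.array([1.5, 1.5, 2.5])
f_unlab = np.array([2.0, 3.0])
estimate = aipw_mean(y_lab, f_lab, f_unlab)

# With an all-zero score the estimator reduces to the labeled sample mean.
zero_score_estimate = aipw_mean(y_lab, np.zeros(3), np.zeros(2))
```

The second call illustrates why the correction term matters: with an uninformative score, the estimate falls back to the labeled mean of \(Y\).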
AIPW Class
The wrapper stays the same. What changes is the score inside it.
The score \(f(X)\) can be a raw score, a linear calibration map, a monotone calibration map, or another data-adaptive score. AIPW supplies the correction template around it.
Scores closer to the outcome regression \(\mu_0(X)=E_0[Y \mid X]\) make the residuals \(Y-f(X)\) smaller on average. The correction then adds less noise.
Efficiency Target
In large samples, smaller \(E_0[(Y-f(X))^2]\) means a lower-variance AIPW estimator.
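One way to see this is the following sketch, which assumes that \(f\) is a fixed function (not fitted on the same labels) and that the labeled and unlabeled samples are independent i.i.d. draws. Writing \(\rho = n/(n+N)\) for the labeled fraction, the estimator that adds the mean labeled residual to the pooled score average has variance
\[
\frac{1}{n}\operatorname{Var}_0\bigl(Y - (1-\rho) f(X)\bigr) + \frac{N}{(n+N)^2}\operatorname{Var}_0\bigl(f(X)\bigr).
\]
When \(N\) is much larger than \(n\), \(\rho\) is close to zero and the second term is of order \(1/N\), so the first term dominates. That term is then approximately \(\operatorname{Var}_0(Y - f(X))/n\), which is at most \(E_0[(Y-f(X))^2]/n\).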
PPI and PPI++
These are named choices inside the broader AIPW family.
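With the raw score \(m(X)\), standard AIPW uses the pooled plug-in average:
\[
\widehat\theta_{\mathrm{AIPW}} = \frac{1}{n+N}\left(\sum_{i=1}^{n} m(X_i) + \sum_{j=1}^{N} m(\widetilde X_j)\right) + \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - m(X_i)\bigr).
\]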
It uses the same correction term as PPI, but keeps the labeled covariates in the plug-in average.
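Plain PPI uses the unlabeled-only plug-in form:
\[
\widehat\theta_{\mathrm{PPI}} = \frac{1}{N}\sum_{j=1}^{N} m(\widetilde X_j) + \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - m(X_i)\bigr).
\]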
Within the common AIPW family, this is equivalent to using the rescaled score \(f(X)=m(X)/(1-\rho)\), where \(\rho=n/(n+N)\) is the labeled fraction. Because \(1/(1-\rho) > 1\) whenever there are labeled rows, the rescaling inflates the score and can push it away from the outcome scale.
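The equivalence can be checked numerically. The sketch below uses simulated data and a hypothetical raw score `m`; the helper names `aipw_mean` and `ppi_mean` are illustrative, not library functions.

```python
import numpy as np

def aipw_mean(y_lab, f_lab, f_unlab):
    # Pooled score average plus mean labeled residual.
    return np.concatenate([f_lab, f_unlab]).mean() + (y_lab - f_lab).mean()

def ppi_mean(y_lab, m_lab, m_unlab):
    # Unlabeled-only plug-in average plus mean labeled residual.
    return m_unlab.mean() + (y_lab - m_lab).mean()

def m(x):
    # Hypothetical raw score, deliberately not equal to the true regression.
    return 1.5 * x + 0.3

rng = np.random.default_rng(0)
n, N = 50, 400
x_lab = rng.normal(size=n)
x_unlab = rng.normal(size=N)
y_lab = 2.0 * x_lab + rng.normal(size=n)

rho = n / (n + N)  # labeled fraction
ppi = ppi_mean(y_lab, m(x_lab), m(x_unlab))
aipw_rescaled = aipw_mean(y_lab, m(x_lab) / (1 - rho), m(x_unlab) / (1 - rho))
```

The two numbers agree up to floating-point error, because the labeled score terms combine to the same coefficient \(-1/n\) in both forms.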
PPI++ is AIPW with empirical efficiency maximization over the one-score scaling class.
In this one-score setting, PPI++ is asymptotically equivalent to linear calibration, even though the finite-sample estimators are not identical.
Empirical efficiency maximization chooses \(f\) from a candidate class by minimizing an estimated variance for the final estimator. PPI++ is the one-dimensional case \(f(X)=\lambda m(X)\).
Same template, different score choices.
Standard AIPW uses the raw score. PPI uses the unlabeled-only plug-in form. PPI++ searches over scaled versions of the score.
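The one-dimensional search can be illustrated with a small sketch. It is not the PPI++ reference implementation: the variance estimate below is an assumption of this example, derived for a fixed raw score and independent i.i.d. samples, and the data and score are simulated. The names `estimated_variance`, `lam_grid`, and `lam_closed` are hypothetical.

```python
import numpy as np

def estimated_variance(y_lab, m_lab, n_unlab, lam):
    """Plug-in variance of the pooled-form estimator with score lam * m.

    Assumed form: Var(Y - (1 - rho) * lam * m) / n
                  + N * lam^2 * Var(m) / (n + N)^2,
    with both variances estimated on the labeled sample.
    """
    n = len(y_lab)
    rho = n / (n + n_unlab)
    labeled_term = np.var(y_lab - (1.0 - rho) * lam * m_lab, ddof=1) / n
    unlabeled_term = n_unlab * lam**2 * np.var(m_lab, ddof=1) / (n + n_unlab) ** 2
    return labeled_term + unlabeled_term

rng = np.random.default_rng(2)
n, N = 80, 800
x_lab = rng.normal(size=n)
y_lab = 1.0 + 2.0 * x_lab + rng.normal(size=n)
m_lab = 0.5 * x_lab + 3.0  # hypothetical raw score on the wrong scale

# Search the scaling class f = lam * m on a grid.
grid = np.linspace(0.0, 8.0, 801)
variances = np.array([estimated_variance(y_lab, m_lab, N, lam) for lam in grid])
lam_grid = grid[np.argmin(variances)]

# For this particular variance estimate the minimizer is the labeled
# regression slope of Y on m, Cov(Y, m) / Var(m).
cov = np.cov(y_lab, m_lab)
lam_closed = cov[0, 1] / cov[1, 1]
```

Under these assumptions the grid minimizer matches the regression slope of \(Y\) on \(m(X)\), which is one way to see the connection to linear calibration.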
Calibration
Ranking helps, but the scale matters too because we average the score.
A score can rank units well while remaining miscalibrated as a numerical predictor. Calibration estimates a map from score values to outcome values.
If the calibrated score satisfies \(E_0[Y-f(X)]=0\), then its population average equals the target mean.
When calibration is approximate or evaluated out of sample, the residual correction accounts for remaining average error.
Mean Calibration and the Plug-in Mean
Write the target mean as \(\theta_0 = E_0[Y]\). If the calibrated score satisfies
\[
E_0[Y - f(X)] = 0,
\]
then
\[
E_0[f(X)] = E_0[Y] = \theta_0.
\]
Thus the pooled plug-in mean has the correct population target. If the sample calibration is exact, the residual correction is zero on that sample; otherwise AIPW keeps the correction term.
Calibration changes the score that the estimator averages.
Mean calibration makes the plug-in and AIPW views agree on the target.
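As a small illustration, linear calibration fitted by least squares with an intercept is exactly mean-calibrated on the labeled sample, so the residual correction vanishes there. The sketch below uses simulated data and a hypothetical miscalibrated raw score; it is one simple calibration map, not the only option.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 100, 1000
x_lab = rng.normal(size=n)
x_unlab = rng.normal(size=N)
y_lab = 1.0 + 2.0 * x_lab + rng.normal(size=n)

# Hypothetical raw score: ranks units correctly but sits on the wrong scale.
m_lab = 0.5 * x_lab + 3.0
m_unlab = 0.5 * x_unlab + 3.0

# Linear calibration: regress Y on the raw score using the labeled rows.
slope, intercept = np.polyfit(m_lab, y_lab, 1)
f_lab = intercept + slope * m_lab
f_unlab = intercept + slope * m_unlab

# In-sample mean residual is zero up to floating-point error,
# so the AIPW estimate coincides with the pooled plug-in mean.
mean_residual = (y_lab - f_lab).mean()
pooled_plug_in = np.concatenate([f_lab, f_unlab]).mean()
aipw = pooled_plug_in + mean_residual
```

If the calibration map were instead fitted on one split and evaluated on another, `mean_residual` would generally be nonzero and the correction term would do real work.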
Flexible Learning
More flexible scores can help, but the labeled sample limits how much complexity is useful.
Flexible methods can estimate \(f(X)\) when the labeled sample supports them. The gain comes from fitting the outcome regression more closely.
In small labeled samples, a highly flexible fit can track the regression more closely but adds variance. That tradeoff can worsen finite-sample performance even when the method is valid.
With highly flexible labeled-sample fitting, cross-fitting helps avoid using the same labels to learn and evaluate the score.
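A minimal cross-fitting sketch follows. It makes several choices that are assumptions of this example rather than the source's procedure: linear calibration stands in for the flexible learner, labeled rows are scored by the model that did not see them, and unlabeled rows are scored by the average of the fold models. The names `fit_linear` and `cross_fit_aipw` are hypothetical.

```python
import numpy as np

def fit_linear(m, y):
    # Stand-in learner: least-squares line mapping raw score to outcome.
    slope, intercept = np.polyfit(m, y, 1)
    return lambda s: intercept + slope * s

def cross_fit_aipw(y_lab, m_lab, m_unlab, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_lab)
    folds = np.array_split(rng.permutation(n), n_folds)
    f_lab = np.empty(n)
    f_unlab = np.zeros(len(m_unlab))
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        f_k = fit_linear(m_lab[train], y_lab[train])
        # Out-of-fold predictions: these labels were not used to fit f_k.
        f_lab[held_out] = f_k(m_lab[held_out])
        # Unlabeled rows get the average prediction across fold models.
        f_unlab += f_k(m_unlab) / n_folds
    pooled = np.concatenate([f_lab, f_unlab]).mean()
    correction = (y_lab - f_lab).mean()
    return pooled + correction

rng = np.random.default_rng(3)
n, N = 200, 2000
x_lab = rng.normal(size=n)
x_unlab = rng.normal(size=N)
y_lab = 1.0 + 2.0 * x_lab + rng.normal(size=n)  # true mean of Y is 1.0
estimate = cross_fit_aipw(y_lab, 0.5 * x_lab + 3.0, 0.5 * x_unlab + 3.0)
```

Because the residuals are computed out of fold, the correction term is generally nonzero here, which is the case the AIPW wrapper is meant to handle.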
References
This page is a simplified overview. The formal theory and full citations are in the paper, but the references below are the key landmarks for the ideas summarized here.