Theory

How the estimator works in simple terms.

In the small-labeled, large-unlabeled setting, a good score can improve precision, but it should not be trusted blindly. AIPW averages that score and then adds a labeled-sample correction, so you get efficiency together with a built-in safeguard against score error.

Here, score means the model output you plug into the estimator. A score can be useful even before calibration puts it on the right outcome scale.

Estimator

The estimator in one line

One part averages the score. The other corrects it on the rows where outcomes are observed.

estimate = pooled score average + labeled correction

The score uses all the covariate information you have. The correction term measures its average error on labeled rows and adds that error back.

Notation

Let the labeled sample be \((X_1, Y_1), \ldots, (X_n, Y_n)\) and the unlabeled sample be \(\widetilde X_1, \ldots, \widetilde X_N\). Here \(f(X)\) is the score we choose to plug into AIPW.

AIPW for a chosen score

A simple semisupervised AIPW estimator for the mean can be written as:

\[ \hat\theta(f) = \frac{1}{n+N} \left\{ \sum_{i=1}^{n} f(X_i) + \sum_{j=1}^{N} f(\widetilde X_j) \right\} + \frac{1}{n}\sum_{i=1}^{n}\{Y_i-f(X_i)\}. \]
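This formula translates directly into code. A minimal NumPy sketch (the function name and toy data are illustrative, not from the paper):

```python
import numpy as np

def aipw_mean(f, X_lab, Y_lab, X_unlab):
    """AIPW estimate of E[Y]: pooled plug-in average of the score f
    over all covariates, plus the labeled residual correction."""
    n, N = len(X_lab), len(X_unlab)
    pooled = (np.sum(f(X_lab)) + np.sum(f(X_unlab))) / (n + N)
    correction = np.mean(Y_lab - f(X_lab))
    return pooled + correction

# Toy check: a score that matches the outcome exactly makes the
# correction vanish, leaving just the pooled score average.
X_lab = np.array([1.0, 2.0, 3.0])
Y_lab = X_lab.copy()                  # Y = X, so f(x) = x is a perfect score
X_unlab = np.array([4.0, 5.0])
est = aipw_mean(lambda x: x, X_lab, Y_lab, X_unlab)  # pooled mean of 1..5
```

With a perfect score the correction term is exactly zero; with a useless score (say `f = 0`) the estimate falls back to the labeled-sample mean of \(Y\), which is the built-in safeguard at work.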

Why this form matters

Many familiar semisupervised estimators are special cases of this same template.

Schematic 1

Pooled score average plus labeled residual correction

The unlabeled sample contributes scale and precision. The labeled sample tells you how much to correct.

  • Large unlabeled sample: contributes to the pooled average of f(X).
  • Small labeled sample: contributes to the pooled average and to the residual correction.
  • Final estimate: pooled score average plus labeled correction. It uses covariate information broadly and repairs systematic score error.

A good score helps the pooled plug-in term. The correction term keeps the estimate tied to observed outcomes.

AIPW Class

AIPW is a class, not one estimator

The wrapper stays the same. What changes is the score inside it.

You choose the score

The score \(f(X)\) can be a raw score, a linear recalibration, a monotone calibration map, or another data-adaptive score. AIPW supplies the correction template around it.

Better scores improve efficiency

If the score is closer to the true outcome regression \(\mu_0(X)=E_0[Y \mid X]\), then the residual correction is smaller and the final estimator is typically more efficient.

Efficiency Target

Up to a centering constant, the large-sample variance is minimized by choosing \(f\) to make \(E_0[(Y-f(X))^2]\) small.
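A small Monte Carlo sketch of this point, under a toy model \(Y = 2X + \varepsilon\) (all simulation settings here are illustrative assumptions): a score with smaller mean-squared residual should give an estimator with visibly smaller sampling spread.

```python
import numpy as np

def aipw_mean(f, X_lab, Y_lab, X_unlab):
    n, N = len(X_lab), len(X_unlab)
    pooled = (np.sum(f(X_lab)) + np.sum(f(X_unlab))) / (n + N)
    return pooled + np.mean(Y_lab - f(X_lab))

def mc_sd(f, reps=1000, n=50, N=2000, seed=0):
    """Monte Carlo standard deviation of the AIPW estimate under the
    toy model Y = 2X + noise (illustrative settings)."""
    rng = np.random.default_rng(seed)
    ests = []
    for _ in range(reps):
        X = rng.normal(size=n)
        Y = 2.0 * X + rng.normal(size=n)
        Xu = rng.normal(size=N)
        ests.append(aipw_mean(f, X, Y, Xu))
    return float(np.std(ests))

sd_crude = mc_sd(lambda x: 0.5 * x)  # large mean-squared residual
sd_sharp = mc_sd(lambda x: 2.0 * x)  # close to mu_0(X) = 2x
# sd_sharp should come out clearly smaller than sd_crude
```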

PPI and PPI++

Where PPI and PPI++ fit

These are named choices inside the broader AIPW family.

Standard AIPW

With the raw score \(m(X)\), standard AIPW uses the pooled plug-in average:

\[ \hat\theta_{\mathrm{AIPW}} = \frac{1}{n+N} \left\{ \sum_{i=1}^{n} m(X_i) + \sum_{j=1}^{N} m(\widetilde X_j) \right\} + \frac{1}{n}\sum_{i=1}^{n}\{Y_i-m(X_i)\}. \]

It uses the same correction term as PPI, but keeps the labeled covariates in the plug-in average.

PPI

Plain PPI uses the unlabeled-only plug-in form:

\[ \hat\theta_{\mathrm{PPI}} = \frac{1}{N}\sum_{j=1}^{N} m(\widetilde X_j) + \frac{1}{n}\sum_{i=1}^{n}\{Y_i-m(X_i)\}. \]

Within the common AIPW family, this is equivalent to using the rescaled score \(f(X)=m(X)/(1-\rho)\), where \(\rho=n/(n+N)\) is the labeled fraction. That rescaling can push the score away from the right outcome scale.
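This identity is easy to verify numerically. A sketch (the toy data and score are illustrative): AIPW applied to the rescaled score \(m(X)/(1-\rho)\) reproduces the PPI estimate exactly.

```python
import numpy as np

def aipw_mean(f, X_lab, Y_lab, X_unlab):
    n, N = len(X_lab), len(X_unlab)
    pooled = (np.sum(f(X_lab)) + np.sum(f(X_unlab))) / (n + N)
    return pooled + np.mean(Y_lab - f(X_lab))

def ppi_mean(m, X_lab, Y_lab, X_unlab):
    """Plain PPI: unlabeled-only plug-in plus the labeled correction."""
    return np.mean(m(X_unlab)) + np.mean(Y_lab - m(X_lab))

rng = np.random.default_rng(0)
X_lab, X_unlab = rng.normal(size=40), rng.normal(size=400)
Y_lab = 2.0 * X_lab + rng.normal(size=40)
m = lambda x: 2.0 * x

rho = len(X_lab) / (len(X_lab) + len(X_unlab))  # labeled fraction
rescaled = lambda x: m(x) / (1.0 - rho)         # f(X) = m(X) / (1 - rho)
ppi = ppi_mean(m, X_lab, Y_lab, X_unlab)
aipw_rescaled = aipw_mean(rescaled, X_lab, Y_lab, X_unlab)
# ppi and aipw_rescaled agree up to floating-point error
```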

PPI++

PPI++ is AIPW with empirical efficiency maximization over the one-score scaling class.

\[ \hat\theta_{\mathrm{PPI++}} = \hat\theta(\hat\lambda\,m). \]

In this one-score setting, PPI++ is asymptotically equivalent to linear calibration, even though the finite-sample estimators are not identical.
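For the mean, the variance-minimizing scaling has a simple closed form, which gives a concrete sketch of the idea. Assumptions to flag: the formula for \(\hat\lambda\) below is the standard one-score mean case, minimizing \((1/n)\mathrm{Var}(Y-\lambda m)+(1/N)\lambda^2\mathrm{Var}(m)\), and the function name and simulated data are illustrative.

```python
import numpy as np

def ppi_pp_mean(m, X_lab, Y_lab, X_unlab):
    """Sketch of PPI++ for the mean: estimate the scaling lambda that
    minimizes (1/n)Var(Y - lam*m) + (1/N)lam^2 Var(m), then apply the
    AIPW template with the scaled score f = lam * m."""
    n, N = len(X_lab), len(X_unlab)
    m_lab, m_unlab = m(X_lab), m(X_unlab)
    lam = (np.cov(Y_lab, m_lab, ddof=1)[0, 1]
           / (np.var(m_lab, ddof=1) * (1 + n / N)))
    pooled = lam * (m_lab.sum() + m_unlab.sum()) / (n + N)
    return pooled + np.mean(Y_lab - lam * m_lab)

# Illustrative simulation: Y = X + noise, so the true mean is 0.
rng = np.random.default_rng(1)
X_lab = rng.normal(size=2000)
Y_lab = X_lab + rng.normal(size=2000)
X_unlab = rng.normal(size=20000)
est = ppi_pp_mean(lambda x: x, X_lab, Y_lab, X_unlab)
```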

Empirical efficiency maximization

This is the broader principle: choose the score inside AIPW to make the estimator as efficient as possible. PPI++ is the one-score special case that only searches over scaled versions of \(m(X)\).

Schematic 2

One AIPW template, several score choices

Same template, different score choices.

AIPW template: choose a score, then add the correction.

  • Raw score f(X) = m(X): standard AIPW baseline.
  • Scaled score f(X) = λ m(X): PPI++ special case.
  • Linear calibration f(X) = a + b m(X): affine recalibration.
  • Monotone calibration f(X) = g(m(X)): flexible recalibration.

Standard AIPW uses the raw score. PPI is the inefficient unlabeled-only variant. PPI++ searches over scaled versions of that score.

Calibration

Why calibration helps

Ranking helps, but the scale matters too because we average the score.

Calibration improves the score

A score can rank well and still be on the wrong scale. Calibration makes it more faithful to the outcome scale.

Mean calibration validates the plug-in view

If the calibrated score has the right average on labeled rows, then averaging it targets the right mean.

AIPW keeps the safety layer

Calibration improves the score, and the labeled residual correction guards against remaining error.

Why Mean Calibration Is Enough

Write the target mean as \(\theta_0 = E_0[Y]\). If the calibrated score satisfies

\[ E_0[Y-f(X)] = 0, \]

then

\[ E_0[f(X)] = E_0[Y] = \theta_0. \]

So the pooled plug-in mean targets the right estimand. For many calibrated methods, the extra correction term \(\frac{1}{n}\sum_{i=1}^{n}\{Y_i-f(X_i)\}\) is exactly zero, or numerically very close to zero, on the sample used to fit the calibration. In those cases the plug-in and AIPW forms are effectively the same. When calibration is only approximate or fitting and evaluation are separated, the correction need not vanish and remains a safety layer.
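A small sketch of this point, using an additive mean calibration fitted on the labeled rows (the raw score and toy data are illustrative): by construction the correction term is exactly zero, so the plug-in and AIPW forms coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
X_lab = rng.normal(size=50)
Y_lab = 1.5 * X_lab + 2.0 + rng.normal(size=50)
X_unlab = rng.normal(size=5000)

m = lambda x: 1.5 * x                    # good ranking, misses the intercept
shift = Y_lab.mean() - m(X_lab).mean()   # additive mean calibration on labeled rows
f = lambda x: m(x) + shift               # mean-calibrated score

correction = np.mean(Y_lab - f(X_lab))   # zero by construction of shift
n, N = len(X_lab), len(X_unlab)
plug_in = (np.sum(f(X_lab)) + np.sum(f(X_unlab))) / (n + N)
aipw = plug_in + correction              # identical to the plug-in here
```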

Schematic 3

From a mis-scaled score to a better score

Calibration changes the score that the estimator averages.

  • Before calibration: good ranking but the wrong scale (raw score vs. the ideal fit).
  • After calibration: closer to the outcome scale (calibrated score vs. the ideal fit).
  • Result: better efficiency from the mean-calibrated score.

Once the score is mean-calibrated, the plug-in and AIPW views agree.

Flexible Learning

Flexible learning is allowed, but not free

Better prediction can help, but small labeled samples still limit how much complexity is worth using.

Use flexibility when it helps

The score \(f(X)\) can be learned with modern machine learning if that gives a better approximation to \(\mu_0(X)\). Better approximation usually means better efficiency.

There is still a bias-variance tradeoff

In small labeled samples, aggressive machine learning can reduce bias but increase variance. That can lead to worse finite-sample performance even when the method is valid.

Cross-fitting helps with validity

If the labeled-sample fitting is very aggressive, cross-fitting avoids overusing the same labels and is the usual safeguard for inference.

Practical rule: start simple when the labeled sample is small, and add flexibility only when it clearly improves prediction.
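The cross-fitting idea can be sketched in a few lines: each labeled row's residual is computed with a score fitted on the other folds, so no label is used to evaluate its own fit. The learner `fit_ols` below is a hypothetical stand-in (a least-squares line), and the simulated data are illustrative.

```python
import numpy as np

def fit_ols(X, Y):
    """Hypothetical learner: least-squares line, returned as a score."""
    b, a = np.polyfit(X, Y, 1)           # slope, intercept
    return lambda x: a + b * x

def crossfit_aipw_mean(X_lab, Y_lab, X_unlab, fit=fit_ols, K=2, seed=0):
    """K-fold cross-fitting: each labeled row is scored by a fit that
    never saw its label; fold fits are averaged on the unlabeled rows."""
    n, N = len(X_lab), len(X_unlab)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    scores_lab = np.empty(n)
    scores_unlab = np.zeros(N)
    for k, held in enumerate(folds):
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        f = fit(X_lab[train], Y_lab[train])   # fit without the held-out labels
        scores_lab[held] = f(X_lab[held])
        scores_unlab += f(X_unlab) / K
    pooled = (scores_lab.sum() + scores_unlab.sum()) / (n + N)
    return pooled + np.mean(Y_lab - scores_lab)

# Illustrative simulation: Y = 1 + 2X + noise, so the target mean is 1.
rng = np.random.default_rng(1)
X_lab = rng.normal(size=200)
Y_lab = 1.0 + 2.0 * X_lab + rng.normal(size=200)
X_unlab = rng.normal(size=5000)
est = crossfit_aipw_mean(X_lab, Y_lab, X_unlab)
```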

Takeaway

What this means in practice

Think of the estimator as a safe wrapper around a score you are trying to improve.

Practical takeaway

  • AIPW is the safe wrapper.
  • \(f(X)\) is the efficiency lever.
  • Calibration is one way to improve that lever.
  • PPI and PPI++ are specific choices inside this broader workflow.

References

Selected references

This page is a simplified overview. The formal theory and full citations are in the paper, but the references below are the key landmarks for the ideas summarized here.

  • Robins, Rotnitzky, and Zhao (1994). Augmented inverse-probability weighting for missing-data regression problems.
  • Rubin and van der Laan (2008). Empirical efficiency maximization for locally efficient covariate adjustment.
  • van der Laan and Robins (2003). Unified Methods for Censored Longitudinal Data and Causality.
  • van der Laan and Rubin (2006). Targeted maximum likelihood and the broader debiased / semiparametric viewpoint.
  • van der Laan and Rose (2011). Targeted Learning: Causal Inference for Observational and Experimental Data.
  • Zheng and van der Laan (2010). Cross-fitted targeted learning as a route to valid inference with flexible nuisance fitting.
  • Angelopoulos, Bates, Fannjiang, Jordan, and Zrnic (2023). Prediction-powered inference.
  • Angelopoulos, Duchi, and Zrnic (2023). PPI++.