Choosing Losses and Methods

What population is being calibrated?

loss="dr"

loss="dr" calibrates to the original study population.

However, the DR loss relies on inverse-propensity weighting, so its pseudo-outcomes become high-variance when overlap is weak, that is, when estimated propensities approach 0 or 1.

  • target: original / observed population
  • strength: direct population-level interpretation
  • caveat: inverse-propensity weighting can become unstable under poor overlap

This aligns with the doubly robust treatment-effect construction used in the causal calibration paper and DR-learner literature, including van der Laan et al. (2023) and Kennedy (2020).
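
A minimal sketch of why extreme propensities matter for the DR loss, assuming the standard AIPW pseudo-outcome from the DR-learner literature. The function name and toy numbers are illustrative and are not part of the causalCalibration API.

```python
def dr_pseudo_outcome(y, a, e, mu0, mu1):
    """AIPW / DR-learner pseudo-outcome for a single unit.

    y: observed outcome, a: treatment in {0, 1}, e: propensity P(A=1 | W),
    mu0, mu1: outcome-regression predictions under control and treatment.
    """
    mu_a = mu1 if a == 1 else mu0
    ipw = (a - e) / (e * (1.0 - e))  # 1/e if treated, -1/(1-e) if control
    return (mu1 - mu0) + ipw * (y - mu_a)


# Same outcome residual (y - mu1 = 1.0) for a treated unit, shrinking propensity.
for e in (0.5, 0.1, 0.01):
    phi = dr_pseudo_outcome(y=3.0, a=1, e=e, mu0=1.0, mu1=2.0)
    print(f"e={e:<5} pseudo-outcome={phi:.1f}")
```

For a treated unit the inverse-propensity factor is 1/e, so the same residual of 1.0 adds roughly 2, 10, and 100 to the pseudo-outcome as e falls from 0.5 to 0.01.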

loss="r"

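loss="r" calibrates using the R-loss weighting induced by (A - e(W))^2, where A is the treatment indicator and e(W) is the propensity score. Because this weight has conditional mean e(W)(1 - e(W)), the calibration problem places more emphasis on observations in the overlap or equipoise region, where e(W) is near 0.5.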

This is often more robust when overlap is weak, but it changes the target: the calibrated scores apply to an overlap-weighted population rather than the original study population.

  • target: overlap-weighted population
  • strength: often more robust under weak overlap
  • caveat: the target population is no longer the original observed population

This interpretation follows the R-learner weighting structure of Nie and Wager (2021). The change of target is a consequence of the weighting scheme itself, not a package-specific redefinition of calibration.
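
The weighting can be made concrete by rewriting the R-loss as a weighted least-squares problem. This is a sketch under the usual R-learner setup, where m(W) = E[Y | W]; the function names are hypothetical and do not come from the package.

```python
def r_loss_terms(y, a, e, m):
    """Return (pseudo_outcome, weight) for one unit.

    The R-loss term (y - m - (a - e) * tau)^2 equals
    weight * (pseudo_outcome - tau)^2 with the values returned here.
    """
    resid_a = a - e
    return (y - m) / resid_a, resid_a ** 2


def expected_r_weight(e):
    """Conditional mean of (A - e)^2 given W, for binary A with P(A=1 | W) = e."""
    return e * (1.0 - e)


for e in (0.5, 0.1, 0.01):
    print(f"e={e:<5} expected weight={expected_r_weight(e):.4f}")
```

The expected weight is largest at e = 0.5 and shrinks toward zero at the extremes, which is what shifts the target toward the overlap-weighted population.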

Short recommendation

Default package recommendation: start with loss="dr" when original-population targeting is the goal and overlap looks adequate. Switch to loss="r" when the package’s overlap screen flags weak overlap and an overlap-weighted target is acceptable.

Choosing a calibration method

isotonic

Best default when you want a monotone nonparametric calibration map with minimal assumptions.

In causalCalibration, isotonic is implemented with a single monotone LightGBM regression tree. That gives you:

  • monotone weighted fitting,
  • a min_child_samples control on the minimum number of observations per leaf, and therefore per flat segment of the map,
  • and flat extrapolation beyond the observed score range.
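
To illustrate what a weighted monotone step map with flat extrapolation looks like, here is a small stand-in that uses the classic pool-adjacent-violators algorithm instead of the package's LightGBM tree. It is a sketch only: it assumes distinct scores and positive weights, has no analogue of min_child_samples, and all names are hypothetical.

```python
from bisect import bisect_right


def fit_weighted_isotonic(scores, targets, weights):
    """Weighted pool-adjacent-violators fit.

    Returns (thresholds, values): the map equals values[j] for scores in
    [thresholds[j], thresholds[j + 1]).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block is [left_score, weighted_mean, total_weight]
    for i in order:
        blocks.append([scores[i], targets[i], weights[i]])
        # Pool with the block on the left while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][1] > blocks[-1][1]:
            _, value, weight = blocks.pop()
            left = blocks[-1]
            total = left[2] + weight
            left[1] = (left[1] * left[2] + value * weight) / total
            left[2] = total
    return [b[0] for b in blocks], [b[1] for b in blocks]


def predict_step(thresholds, values, score):
    """Evaluate the step map; scores outside the fitted range get the end values."""
    idx = bisect_right(thresholds, score) - 1
    return values[max(idx, 0)]


thresholds, values = fit_weighted_isotonic(
    scores=[0.1, 0.2, 0.3, 0.4],
    targets=[1.0, 3.0, 2.0, 4.0],
    weights=[1.0, 1.0, 3.0, 1.0],
)
print(thresholds, values)
print(predict_step(thresholds, values, -5.0), predict_step(thresholds, values, 5.0))
```

The second and third points violate monotonicity, so they are pooled into one flat segment at their weighted mean. Scores below 0.1 or above 0.4 receive the first or last fitted value, which is the flat-extrapolation behavior described above.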

monotone_spline

Use when you want a smooth monotone map rather than a piecewise-constant one.

This method fits a penalized spline whose derivative coefficients are constrained to be nonnegative, which enforces monotonicity, with a smoothness penalty to control roughness.

linear

Use when you want a simple parametric recalibration step and easy interpretation.
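
As a rough illustration of a parametric recalibration step, the sketch below fits an intercept and slope by weighted least squares. It assumes the linear map regresses a pseudo-outcome on the raw score; whether the package fits it exactly this way is not stated here, and the function name is hypothetical.

```python
def fit_linear_recalibration(scores, targets, weights):
    """Weighted least-squares fit of target ~ intercept + slope * score."""
    w_sum = sum(weights)
    s_bar = sum(w * s for w, s in zip(weights, scores)) / w_sum
    t_bar = sum(w * t for w, t in zip(weights, targets)) / w_sum
    cov = sum(w * (s - s_bar) * (t - t_bar)
              for w, s, t in zip(weights, scores, targets))
    var = sum(w * (s - s_bar) ** 2 for w, s in zip(weights, scores))
    slope = cov / var
    return t_bar - slope * s_bar, slope


intercept, slope = fit_linear_recalibration(
    scores=[0.0, 1.0, 2.0, 3.0],
    targets=[0.5, 1.0, 1.5, 2.0],
    weights=[1.0, 2.0, 1.0, 2.0],
)
print(intercept, slope)
```

A slope below 1 indicates that the raw scores vary more than the pseudo-outcomes support, and the intercept captures a constant shift; that two-number summary is the interpretability benefit.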

histogram

Use when you want a coarse, transparent piecewise-constant map or a simple diagnostic baseline.

Choosing standard calibration vs cross-calibration

Prefer standard calibration when

  • you already have a clean calibration sample,
  • or you only have one prediction per unit.

Prefer cross-calibration when

  • your learner was trained with cross-fitting,
  • you have both pooled out-of-fold (OOF) predictions and fold-specific predictions,
  • and you want to fit and calibrate on the same sample, without a separate holdout calibration split.
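
The two kinds of predictions in the list above can be sketched as follows. The data layout, the toy calibrator, and the median aggregation across fold models are assumptions made for illustration, in the spirit of the cross-calibration procedure in the causal calibration paper; the package's own interface and aggregation rule may differ.

```python
from statistics import median

# Hypothetical layout: 3 units, 3 folds, unit i held out in fold i.
# fold_predictions[k][i] is the score for unit i from the model trained without fold k.
fold_predictions = [
    [0.10, 0.40, 0.70],
    [0.20, 0.50, 0.60],
    [0.30, 0.30, 0.90],
]
fold_of_unit = [0, 1, 2]

# Pooled OOF scores: each unit is scored by the model that never saw it.
oof_scores = [fold_predictions[fold_of_unit[i]][i] for i in range(3)]


def toy_calibrator(score):
    """Stand-in for a map fit on (oof_scores, pseudo-outcomes); here a fixed shrinkage."""
    return 0.5 * score


def cross_calibrated(calibrator, scores_from_each_fold):
    """Calibrate every fold model's score with the single pooled map, then take the median."""
    return median(calibrator(s) for s in scores_from_each_fold)


# A new unit scored by all three fold models.
print(oof_scores)
print(cross_calibrated(toy_calibrator, [0.2, 0.6, 0.4]))
```

The calibration map is learned once from the pooled OOF scores, so no unit's calibration target is paired with a score from a model that trained on it. The fold-specific scores are needed only at prediction time.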

Diagnostics and overlap

If overlap is weak, diagnostics should be interpreted together with the loss choice:

  • a dr diagnostic targets the original population but may be noisier, because large inverse-propensity weights inflate its variance,
  • an r diagnostic is usually more stable under weak overlap, because it downweights units with extreme propensities, but its calibrated scores answer a question about the overlap-weighted population,
  • assess_overlap() gives a quick severity label plus a default loss recommendation before you fit anything.

The assess_overlap() severity labels use package-default screening rules based on propensity tails, clipping, and IPW effective sample size. Treat them as workflow defaults rather than universal cutoffs.
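
For intuition about the ingredients named above, here is a hand-rolled sketch of propensity-tail and IPW effective-sample-size summaries. It does not reproduce assess_overlap(): the function name, the returned keys, and the 0.05 tail threshold are illustrative assumptions, not package defaults.

```python
def overlap_summary(propensities, treatments, tail=0.05):
    """Simple overlap diagnostics from estimated propensities.

    `tail` is an illustrative cutoff for "extreme" propensities.
    """
    n = len(propensities)
    # IPW weight per unit: 1/e if treated, 1/(1 - e) if control.
    w = [1.0 / e if a == 1 else 1.0 / (1.0 - e)
         for e, a in zip(propensities, treatments)]
    # Kish effective sample size of the IPW weights.
    ess = sum(w) ** 2 / sum(x * x for x in w)
    n_tail = sum(1 for e in propensities if e < tail or e > 1.0 - tail)
    return {
        "ess_fraction": ess / n,
        "tail_fraction": n_tail / n,
        "min_e": min(propensities),
        "max_e": max(propensities),
    }


good = overlap_summary([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
poor = overlap_summary([0.5, 0.5, 0.5, 0.01], [1, 0, 1, 1])
print(good)
print(poor)
```

With balanced propensities the effective sample size equals the nominal one. A single treated unit with e = 0.01 receives weight 100 and pulls the effective sample size down to roughly a quarter of n, the kind of pattern that would favor an overlap-weighted target.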