Choosing Losses and Methods

What population is being calibrated?

`loss="dr"`

loss="dr" calibrates to the original study population.

However, the DR loss relies on inverse-propensity weighting, which makes it sensitive when overlap is weak, that is, when estimated propensities approach 0 or 1.

target: original / observed population
strength: direct population-level interpretation
caveat: inverse-propensity weighting can become unstable under poor overlap

This aligns with the doubly robust treatment-effect construction used in the causal calibration paper and DR-learner literature, including van der Laan et al. (2023) and Kennedy (2020).
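
To make the inverse-propensity sensitivity concrete, here is a minimal sketch of the standard doubly robust (AIPW) pseudo-outcome in plain Python. The function name `dr_pseudo_outcome` and its arguments are illustrative assumptions, not the causalCalibration API: `mu0` and `mu1` stand for estimated outcome regressions and `e` for the estimated propensity.

```python
def dr_pseudo_outcome(y, a, e, mu0, mu1):
    """Doubly robust (AIPW) pseudo-outcome for one unit.

    y: observed outcome, a: treatment indicator (0 or 1),
    e: estimated propensity P(A=1 | W),
    mu0, mu1: estimated outcome regressions under control / treatment.
    All names are hypothetical; this is not the package's API.
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity must lie strictly between 0 and 1")
    mu_a = mu1 if a == 1 else mu0
    # Inverse-propensity factor: 1/e for treated units, -1/(1-e) for controls.
    ipw = (a - e) / (e * (1.0 - e))
    return (mu1 - mu0) + ipw * (y - mu_a)


# Same outcome residual (y - mu1 = 1.0), two different propensities.
moderate = dr_pseudo_outcome(y=2.0, a=1, e=0.5, mu0=0.0, mu1=1.0)
extreme = dr_pseudo_outcome(y=2.0, a=1, e=0.01, mu0=0.0, mu1=1.0)
```

With an identical residual, the correction term is 2 at a propensity of 0.5 and roughly 100 at a propensity of 0.01, so a handful of such units can dominate a DR-based fit or diagnostic.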

`loss="r"`

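loss="r" calibrates using the R-loss weighting induced by (A - e(W))^2, where A is the treatment indicator and e(W) is the propensity score. Given W, this weight has expectation e(W)(1 - e(W)), which is largest when e(W) is near 0.5, so the calibration problem places more emphasis on observations in the overlap or equipoise region.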

This is often more robust when overlap is weak, but it changes the target: the calibrated scores apply to an overlap-weighted population rather than the original study population.

target: overlap-weighted population
strength: often more robust under weak overlap
caveat: the target population is no longer the original observed population

This interpretation follows the R-learner weighting structure of Nie and Wager (2021). Treat the change of target as a consequence of the weighting scheme, not as a package-specific redefinition of calibration.
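
A minimal sketch of that weighting, assuming the standard R-learner decomposition: the squared residualized loss ((Y - m(W)) - (A - e(W)) * tau)^2 equals a weighted squared error with weight (A - e(W))^2 and pseudo-outcome (Y - m(W)) / (A - e(W)). The helper names and the example propensities are illustrative and not taken from the package.

```python
def r_loss_terms(y, a, e, m):
    """Return (weight, pseudo_outcome) for one unit under the R-loss.

    m is the estimated conditional mean outcome E[Y | W] and e is the
    estimated propensity. Hypothetical helper, not the package's API.
    """
    resid_a = a - e                 # treatment residual, nonzero when 0 < e < 1
    weight = resid_a ** 2           # R-loss weight (A - e(W))^2
    pseudo = (y - m) / resid_a      # residual-on-residual pseudo-outcome
    return weight, pseudo


def expected_weight(e):
    """E[(A - e)^2 | W] = e * (1 - e), which peaks at e = 0.5."""
    return e * (1.0 - e)


# Average weight a unit receives at several propensity values.
weights = {e: expected_weight(e) for e in (0.02, 0.2, 0.5, 0.8, 0.98)}
```

Units with propensities near 0 or 1 receive little weight on average, which is why the fit tends to be more stable under weak overlap and why the target shifts toward the overlap region.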

Short recommendation

Default package recommendation: start with loss="dr" when original-population targeting is the goal and overlap looks adequate. Switch to loss="r" when the package’s overlap screen flags weak overlap and an overlap-weighted target is acceptable.

Choosing a calibration method

`isotonic`

Best default when you want a monotone nonparametric calibration map with minimal assumptions.

In causalCalibration, isotonic is implemented with a single monotone LightGBM regression tree. That gives you:

monotone weighted fitting,
a practical min_child_samples control,
and flat extrapolation beyond the observed score range.
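
To illustrate what a monotone weighted fit with flat extrapolation does, here is a minimal sketch that uses the classic pool-adjacent-violators algorithm (PAVA) in plain Python. This is a stand-in technique chosen for readability, not the package's LightGBM-based implementation, and it has no counterpart to min_child_samples. The function names are hypothetical, and the sketch assumes distinct scores and positive weights.

```python
from bisect import bisect_right


def fit_isotonic(scores, targets, weights):
    """Weighted isotonic fit via pool-adjacent-violators (PAVA).

    Returns (thresholds, values): the fitted step function takes
    values[i] from thresholds[i] up to the next threshold.
    Assumes distinct scores and strictly positive weights.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block: [smallest score, weighted target sum, weight sum]
    for i in order:
        blocks.append([scores[i], weights[i] * targets[i], weights[i]])
        # Pool backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]
        ):
            _, s, w = blocks.pop()
            blocks[-1][1] += s
            blocks[-1][2] += w
    thresholds = [b[0] for b in blocks]
    values = [b[1] / b[2] for b in blocks]
    return thresholds, values


def predict_isotonic(thresholds, values, score):
    """Evaluate the step function; it stays flat outside the fitted range."""
    idx = bisect_right(thresholds, score) - 1
    return values[max(idx, 0)]


thresholds, values = fit_isotonic(
    scores=[0.1, 0.2, 0.3, 0.4],
    targets=[0.0, 1.0, 0.5, 2.0],
    weights=[1.0, 1.0, 1.0, 1.0],
)
```

In this toy input the second and third targets violate monotonicity, so they are pooled into one level at their weighted mean. Scores below 0.1 or above 0.4 receive the first or last fitted value, respectively.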

`monotone_spline`

Use when you want a smooth monotone map rather than a piecewise-constant one.

This method uses a monotone spline fit with nonnegative derivative coefficients and a smoothness penalty.

`linear`

Use when you want a simple parametric recalibration step and easy interpretation.
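
As a rough illustration, a linear recalibration step can be read as a weighted least-squares regression of a pseudo-outcome on the uncalibrated score. The sketch below assumes that intercept-plus-slope form. The function name is hypothetical and the package's own `linear` method may differ in detail.

```python
def fit_linear(scores, targets, weights):
    """Weighted least-squares fit of targets on scores.

    Returns (intercept, slope). Hypothetical helper, not the package's API.
    """
    w_sum = sum(weights)
    s_bar = sum(w * s for w, s in zip(weights, scores)) / w_sum
    t_bar = sum(w * t for w, t in zip(weights, targets)) / w_sum
    cov = sum(
        w * (s - s_bar) * (t - t_bar)
        for w, s, t in zip(weights, scores, targets)
    )
    var = sum(w * (s - s_bar) ** 2 for w, s in zip(weights, scores))
    if var == 0.0:
        raise ValueError("scores are constant; the slope is not identified")
    slope = cov / var
    return t_bar - slope * s_bar, slope


# Toy data in which the pseudo-outcome is exactly 0.5 + 2 * score.
intercept, slope = fit_linear(
    scores=[0.0, 1.0, 2.0, 3.0],
    targets=[0.5, 2.5, 4.5, 6.5],
    weights=[1.0, 2.0, 1.0, 2.0],
)
```

The two fitted numbers are easy to read: an intercept near 0 and a slope near 1 suggest the scores need little adjustment, while a slope well below 1 suggests the original scores are too spread out.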

`histogram`

Use when you want a coarse, transparent piecewise-constant map or a simple diagnostic baseline.
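
A minimal sketch of histogram calibration, assuming equal-count bins over the sorted scores and a weighted mean of the pseudo-outcome within each bin. The binning rule, the `n_bins` argument, and the function names are illustrative assumptions rather than the package's defaults.

```python
from bisect import bisect_left


def fit_histogram(scores, targets, weights, n_bins=3):
    """Equal-count binning: returns (upper_edges, bin_means)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    edges, means = [], []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue  # skip empty bins when n < n_bins
        w = sum(weights[i] for i in idx)
        means.append(sum(weights[i] * targets[i] for i in idx) / w)
        edges.append(scores[idx[-1]])  # largest score assigned to this bin
    return edges, means


def predict_histogram(edges, means, score):
    """Return the mean of the bin containing score (last bin if above all edges)."""
    idx = bisect_left(edges, score)
    return means[min(idx, len(means) - 1)]


edges, means = fit_histogram(
    scores=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    targets=[0.0, 1.0, 1.0, 3.0, 2.0, 4.0],
    weights=[1.0] * 6,
)
```

In this sketch the bin means are not constrained to be monotone, which is part of what makes the map useful as a diagnostic baseline: a non-monotone pattern across bins is visible rather than smoothed away.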

Choosing standard calibration vs cross-calibration

Prefer standard calibration when

you already have a clean calibration sample,
or you only have one prediction per unit.

Prefer cross-calibration when

your learner was trained with cross-fitting,
you have pooled out-of-fold (OOF) predictions and fold-specific predictions,
and you want to fit and calibrate in-sample without a separate holdout calibration split.

Diagnostics and overlap

If overlap is weak, diagnostics should be interpreted together with the loss choice:

a dr diagnostic targets the original population but may be noisier because of inverse-propensity weighting,
an r fit may be more stable in overlap-poor regions, but its calibrated scores answer a different target-population question,
assess_overlap() gives a quick severity label plus a default loss recommendation before you fit anything.

The assess_overlap() severity labels use package-default screening rules based on propensity tails, clipping, and IPW effective sample size. Treat them as workflow defaults rather than universal cutoffs.
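
For intuition about two of the ingredients named above, here is a minimal sketch that computes the share of propensities in the tails and the Kish effective sample size of the IPW weights. It does not reproduce assess_overlap(): the function name, the 0.05 tail cutoff, and the returned keys are illustrative assumptions, and clipping is not modeled.

```python
def overlap_summary(propensities, treatments, tail=0.05):
    """Propensity-tail share and Kish effective sample size of IPW weights.

    The tail cutoff is an illustrative assumption, not a package default.
    """
    n = len(propensities)
    in_tail = sum(1 for e in propensities if e < tail or e > 1.0 - tail)
    # IPW weight: 1/e for treated units, 1/(1-e) for control units.
    w = [
        1.0 / e if a == 1 else 1.0 / (1.0 - e)
        for e, a in zip(propensities, treatments)
    ]
    ess = sum(w) ** 2 / sum(x * x for x in w)  # Kish effective sample size
    return {"tail_share": in_tail / n, "ess": ess, "ess_ratio": ess / n}


good = overlap_summary([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
poor = overlap_summary([0.5, 0.5, 0.5, 0.01], [1, 0, 1, 1])
```

In the second toy sample a single treated unit with propensity 0.01 carries a weight of 100, and the effective sample size drops from 4 to slightly above 1. A collapse of this kind is the sort of signal that would favor an overlap-weighted target over the original-population one.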