Choosing Losses and Methods
What population is being calibrated?
loss="dr"
loss="dr" calibrates to the original study population.
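However, the DR-loss relies on inverse-propensity weighting, which makes it sensitive to weak overlap: when the estimated propensity e(W) gets close to 0 or 1, the weights 1/e(W) or 1/(1 - e(W)) become very large and a few observations can dominate the fit.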
- target: original / observed population
- strength: direct population-level interpretation
- caveat: inverse-propensity weighting can become unstable under poor overlap
This aligns with the doubly robust treatment-effect construction used in the causal calibration paper and DR-learner literature, including van der Laan et al. (2023) and Kennedy (2020).
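The sensitivity to extreme propensities can be seen directly in the standard doubly robust (AIPW) pseudo-outcome from the DR-learner literature. The sketch below is a minimal illustration and not the package's internal code; the function name dr_pseudo_outcome and its argument names are assumptions introduced here.

```python
def dr_pseudo_outcome(y, a, e, mu0, mu1):
    """Standard AIPW / DR-learner pseudo-outcome for a single unit.

    y: observed outcome, a: treatment in {0, 1}, e: propensity P(A=1 | W),
    mu0, mu1: outcome-model predictions under control and treatment.
    (Hypothetical helper for illustration only.)
    """
    mu_a = mu1 if a == 1 else mu0
    # Equals 1/e for treated units and -1/(1 - e) for control units.
    ipw = (a - e) / (e * (1.0 - e))
    return (mu1 - mu0) + ipw * (y - mu_a)


# Same outcome residual (y - mu1 = 1.0), two different propensities.
moderate = dr_pseudo_outcome(y=3.0, a=1, e=0.5, mu0=1.0, mu1=2.0)
extreme = dr_pseudo_outcome(y=3.0, a=1, e=0.02, mu0=1.0, mu1=2.0)
print(moderate, extreme)  # the residual is scaled by 2 vs. by 50
```

With identical residuals, the treated unit with propensity 0.02 contributes a pseudo-outcome roughly 17 times larger than the unit with propensity 0.5, which is the instability the caveat refers to.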
loss="r"
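loss="r" calibrates using the R-loss weighting induced by (A - e(W))^2, where A is the treatment indicator and e(W) is the propensity score. Given W, this weight averages e(W)(1 - e(W)), which is largest near e(W) = 0.5, so the calibration problem places more emphasis on observations in the overlap (equipoise) region.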
This is often more robust when overlap is weak, but it changes the target: the calibrated scores apply to an overlap-weighted population rather than the original study population.
- target: overlap-weighted population
- strength: often more robust under weak overlap
- caveat: the target population is no longer the original observed population
This interpretation follows the R-learner weighting structure of Nie and Wager (2021). In the documentation, treat this as a consequence of the weighting scheme, not as a package-specific redefinition of calibration.
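To make the weighting concrete, the sketch below shows that the residual-on-residual form of the R-loss equals a squared error on a pseudo-outcome weighted by (A - e(W))^2, and that the expected weight e(W)(1 - e(W)) peaks at 0.5. The function names and the symbols m (outcome mean given W) and tau (candidate effect) are assumptions used for illustration, not package API.

```python
def r_loss_residual_form(y, a, e, m, tau):
    """Residual-on-residual R-loss for one unit: ((y - m) - (a - e) * tau)^2."""
    return ((y - m) - (a - e) * tau) ** 2


def r_loss_weighted_form(y, a, e, m, tau):
    """Same loss written as weight * (pseudo-outcome - tau)^2."""
    weight = (a - e) ** 2
    pseudo = (y - m) / (a - e)
    return weight * (pseudo - tau) ** 2


def expected_r_weight(e):
    """E[(A - e)^2 | W] when A | W is Bernoulli(e)."""
    return e * (1.0 - e)


residual = r_loss_residual_form(y=2.0, a=1, e=0.3, m=1.5, tau=0.4)
weighted = r_loss_weighted_form(y=2.0, a=1, e=0.3, m=1.5, tau=0.4)

propensities = [0.02, 0.2, 0.5, 0.8, 0.98]
weights = [expected_r_weight(e) for e in propensities]
print(residual, weighted)
print(weights)  # largest at e = 0.5, near zero at the extremes
```

Units with propensities near 0 or 1 receive almost no weight, which is why the fit is more stable under weak overlap and also why the target shifts to an overlap-weighted population.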
Short recommendation
Default package recommendation: start with loss="dr" when original-population targeting is the goal and overlap looks adequate. Move to loss="r" when the package’s overlap screen flags weak overlap and an overlap-weighted target is acceptable.
Choosing a calibration method
isotonic
Best default when you want a monotone nonparametric calibration map with minimal assumptions.
In causalCalibration, isotonic is implemented with a single monotone LightGBM regression tree. That gives you:
- monotone weighted fitting,
- a practical min_child_samples control,
- and flat extrapolation beyond the observed score range.
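The properties listed above (monotone weighted fitting and flat extrapolation) can be illustrated with a small stand-in. The sketch below uses the classical weighted pool-adjacent-violators algorithm rather than a LightGBM tree, so it is not the package's implementation; the function names are hypothetical and the example assumes distinct scores.

```python
from bisect import bisect_left


def fit_weighted_isotonic(scores, targets, weights):
    """Weighted pool-adjacent-violators fit.

    Returns (knots, values): each fitted block's right-end score and its
    weighted mean target. Assumes distinct scores for simplicity.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block: [weighted mean, total weight, right-end score]
    for i in order:
        blocks.append([targets[i], weights[i], scores[i]])
        # Merge adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, s2 = blocks.pop()
            v1, w1, _ = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, s2])
    return [b[2] for b in blocks], [b[0] for b in blocks]


def predict_isotonic(knots, values, score):
    """Step-function prediction that stays flat outside the fitted range."""
    idx = bisect_left(knots, score)
    return values[min(idx, len(values) - 1)]


knots, values = fit_weighted_isotonic(
    scores=[0.1, 0.2, 0.3, 0.4],
    targets=[0.0, 1.0, 0.5, 2.0],
    weights=[1.0, 1.0, 1.0, 1.0],
)
print(knots, values)
print(predict_isotonic(knots, values, -5.0), predict_isotonic(knots, values, 9.0))
```

The two middle observations violate monotonicity and are pooled into one block, and scores below or above the observed range receive the first or last fitted value.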
monotone_spline
Use when you want a smooth monotone map rather than a piecewise-constant one.
This method uses a monotone spline fit with nonnegative derivative coefficients and a smoothness penalty.
linear
Use when you want a simple parametric recalibration step and easy interpretation.
histogram
Use when you want a coarse, transparent piecewise-constant map or a simple diagnostic baseline.
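As a rough picture of what a histogram calibration map does, the sketch below bins scores into equal-width intervals and predicts the weighted mean target in each bin. The equal-width binning, the fallback for empty bins, and the name fit_histogram are assumptions made for this illustration; the package's own binning rule is not described here.

```python
def fit_histogram(scores, targets, weights, n_bins):
    """Return a predict(score) function based on weighted bin means."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins

    def bin_of(score):
        if width == 0:
            return 0
        # Clip so scores outside [lo, hi] fall into the edge bins.
        return min(max(int((score - lo) / width), 0), n_bins - 1)

    num = [0.0] * n_bins
    den = [0.0] * n_bins
    for s, t, w in zip(scores, targets, weights):
        b = bin_of(s)
        num[b] += w * t
        den[b] += w

    overall = sum(num) / sum(den)
    # Empty bins fall back to the overall weighted mean.
    means = [n / d if d > 0 else overall for n, d in zip(num, den)]
    return lambda score: means[bin_of(score)]


predict = fit_histogram(
    scores=[0.0, 0.2, 0.6, 1.0],
    targets=[1.0, 3.0, 5.0, 7.0],
    weights=[1.0, 1.0, 1.0, 1.0],
    n_bins=2,
)
print(predict(0.1), predict(0.9), predict(2.0))
```

Every score in a bin receives the same calibrated value, which is what makes the map coarse but easy to inspect.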
Choosing standard calibration vs cross-calibration
Prefer standard calibration when
- you already have a clean calibration sample,
- or you only have one prediction per unit.
Prefer cross-calibration when
- your learner was trained with cross-fitting,
- you have pooled OOF predictions and fold-specific predictions,
- and you want to fit and calibrate on the same sample, without a separate holdout calibration split.
Diagnostics and overlap
If overlap is weak, diagnostics should be interpreted together with the loss choice:
- a dr diagnostic targets the original population but may be noisier because of inverse-propensity behavior,
- an r fit may be more stable in overlap-poor regions, but its calibrated scores answer a different target-population question,
- assess_overlap() gives a quick severity label plus a default loss recommendation before you fit anything.
The assess_overlap() severity labels use package-default screening rules based on propensity tails, clipping, and IPW effective sample size. Treat them as workflow defaults rather than universal cutoffs.
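Two of the ingredients named above, propensity tails and IPW effective sample size, are easy to compute by hand as a sanity check. The sketch below uses the standard Kish effective sample size; the helper names and the 0.05 tail threshold are assumptions chosen for illustration and are not the rules used by assess_overlap().

```python
def ipw_effective_sample_size(a, e):
    """Kish effective sample size of the inverse-propensity weights."""
    w = [1.0 / ei if ai == 1 else 1.0 / (1.0 - ei) for ai, ei in zip(a, e)]
    return sum(w) ** 2 / sum(x * x for x in w)


def tail_fraction(e, eps=0.05):
    """Share of units with propensity below eps or above 1 - eps.

    eps=0.05 is an illustrative threshold, not a package default.
    """
    return sum(1 for ei in e if ei < eps or ei > 1.0 - eps) / len(e)


treatment = [1, 0, 1, 0]
balanced = [0.5, 0.5, 0.5, 0.5]
one_extreme = [0.5, 0.5, 0.02, 0.5]

ess_balanced = ipw_effective_sample_size(treatment, balanced)
ess_extreme = ipw_effective_sample_size(treatment, one_extreme)
print(ess_balanced, ess_extreme)
print(tail_fraction(balanced), tail_fraction(one_extreme))
```

A single treated unit with propensity 0.02 is enough to pull the effective sample size from 4 down to about 1.25, which is the kind of signal that would favor an overlap-weighted loss.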