sl3 Guide

How hte3 uses the sl3 learner framework

sl3 is the learner framework underneath hte3. It supplies the nuisance models, learner libraries, and cross-validation tools used by the package. This page explains the parts of sl3 that matter when you want to understand or customize those modeling components.

Start with the R guide for the main wrappers and their arguments. Use this page for learner customization and lower-level control settings.

  • Which learners you pass where
  • What get_autoML() already does
  • How Stack, Lrnr_sl, and CV fit together

Learner Slots

Where do learners plug into hte3?

propensity_learner

Estimates the treatment assignment mechanism. This matters especially for DR-, R-, EP-, and IPW-style workflows.

outcome_learner

Estimates the outcome regression. This is often one of the main quality drivers for CATE and CRR performance.

mean_learner

Provides the marginal mean nuisance when the workflow requires it. In many analyses, the package default is sufficient.

base_learner

Used inside the chosen meta-learner to learn the final heterogeneity surface after the pseudo-outcome step.
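The four slots above enter the pipeline at two different points. As a sketch, using the wrapper signatures shown later in this guide (`df`, `mods`, and `confs` are assumed to be defined elsewhere):

```r
library(hte3)
library(sl3)

# The three nuisance slots are supplied when building the task:
task <- hte_task(
  data = df,
  modifiers = mods,
  confounders = confs,
  treatment = "A",
  outcome = "Y",
  propensity_learner = Lrnr_glmnet$new(),  # treatment assignment model
  outcome_learner = Lrnr_ranger$new(),     # outcome regression
  mean_learner = Lrnr_mean$new()           # marginal mean nuisance
)

# base_learner enters later, inside the chosen meta-learner:
fit <- fit_cate(task, method = "dr", base_learner = Lrnr_gam$new())
```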

Default Stack

What does get_autoML() use?

Stack$new(
  Lrnr_glmnet$new(),
  Lrnr_gam$new(),
  # Added when optional runtime packages are installed:
  Lrnr_earth$new(degree = 2),
  Lrnr_ranger$new(max.depth = 10),
  Lrnr_xgboost_early_stopping$new(
    min_child_weight = 15, max_depth = 2, eta = 0.2,
    subsample = 0.8, colsample_bytree = 0.8
  ),
  Lrnr_xgboost_early_stopping$new(min_child_weight = 15, max_depth = 3, eta = 0.15, subsample = 0.9),
  Lrnr_xgboost_early_stopping$new(min_child_weight = 15, max_depth = 4, eta = 0.15, subsample = 0.9),
  Lrnr_xgboost_early_stopping$new(min_child_weight = 15, max_depth = 5, eta = 0.15, subsample = 0.9),
  Lrnr_xgboost_early_stopping$new(
    min_child_weight = 15, max_depth = 4, eta = 0.08,
    subsample = 0.8, colsample_bytree = 0.8
  )
)

Why this is useful

It mixes linear, additive, spline, tree, and boosting-style learners, which gives you a reasonable first-pass library without designing one from scratch.

When to override it

Override the default when you need shorter iteration cycles, stronger interpretability, or a task-specific learner library.

One caveat

Some sl3 learners depend on supporting packages being available in your R environment, so actual availability can vary by setup.
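One way to mirror what the default does is to guard optional learners behind an availability check. A minimal sketch; the specific packages guarded here are illustrative, not the exact list used by get_autoML():

```r
library(sl3)

# Start with learners whose dependencies ship with a typical setup:
learners <- list(Lrnr_glmnet$new(), Lrnr_gam$new())

# Add learners only when their supporting package is installed:
if (requireNamespace("ranger", quietly = TRUE)) {
  learners <- c(learners, list(Lrnr_ranger$new()))
}
if (requireNamespace("xgboost", quietly = TRUE)) {
  learners <- c(learners, list(Lrnr_xgboost$new()))
}

# Assemble whatever survived the checks into a Stack:
learner_library <- do.call(Stack$new, learners)
```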

Stacking

How Stack and Lrnr_sl work in practice

In sl3, a Stack is just a library of candidate learners. A Super Learner is what you get when you wrap that library in Lrnr_sl with a metalearner that combines or selects among the candidates.

Step 1

Build a learner library

Use Stack$new(...) to enumerate the models you want to compare. This does not fit an ensemble by itself. It just defines the candidate set.

library(sl3)

learner_library <- Stack$new(
  Lrnr_glm_fast$new(),
  Lrnr_ranger$new(),
  Lrnr_xgboost$new()
)
Step 2

Wrap it in a Super Learner

Use Lrnr_sl$new(...) when you want cross-validated selection or weighted combination across that library.

sl_fit <- Lrnr_sl$new(
  learners = learner_library,
  metalearner = Lrnr_nnls$new(),
  cv_control = list(V = 5)
)$train(task)

What Stack means in hte3

You can pass a stacked library anywhere hte3 expects a learner, such as propensity_learner, outcome_learner, or base_learner.

What Lrnr_sl adds

Lrnr_sl adds outer learner selection or weighted combination on top of a library, using a metalearner and fold structure defined through cv_control.

Common hte3 default

get_autoML() returns a Stack built from the safe default learners available in your environment, giving you a ready-made candidate library without listing learners manually.
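Because the return value is an ordinary sl3 Stack, it can be passed directly to any learner slot, or wrapped in Lrnr_sl when cross-validated combination is wanted. A sketch, assuming a `task` built as elsewhere in this guide:

```r
library(hte3)
library(sl3)

# Use the default library as-is for the final heterogeneity model:
default_library <- get_autoML()
fit <- fit_cate(task, method = "dr", base_learner = default_library)

# Or wrap it in a Super Learner for weighted combination:
sl_library <- Lrnr_sl$new(
  learners = get_autoML(),
  metalearner = Lrnr_nnls$new()
)
```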

Practical rule: use Stack when you want to define the candidates, and use Lrnr_sl when you want sl3 to cross-validate and combine them.

Cross-Validation

There are three different CV layers to keep straight

Users often say “cross-validation” when they mean different parts of the pipeline. In hte3, it helps to separate nuisance cross-fitting, sl3 learner-library CV, and outer HTE learner selection.

1. Nuisance cross-fitting

Controlled by cross_fit = TRUE in hte_task() or cross_fit_and_cv = TRUE in the low-level task builder. This is about estimating nuisance functions more robustly, not selecting among HTE methods.

task <- hte_task(
  data = df,
  modifiers = mods,
  confounders = confs,
  treatment = "A",
  outcome = "Y",
  cross_fit = TRUE
)

2. sl3 library CV

Controlled inside sl3 with Lrnr_sl and cv_control = list(V = ...). This is where a Super Learner compares members of a Stack.

sl_fit <- Lrnr_sl$new(
  learners = learner_library,
  metalearner = Lrnr_nnls$new(),
  cv_control = list(V = 5)
)$train(task)

3. Outer HTE learner CV

Controlled by cross_validate = TRUE in fit_cate() or fit_crr(), or explicitly with cross_validate_cate() and cross_validate_crr(). This is where you compare DR, R, T, EP, or CRR families.

fit <- fit_cate(
  task,
  method = c("dr", "r", "ep"),
  base_learner = learner_library,
  cross_validate = TRUE,
  cv_control = list(V = 5)
)
Recommended mental model: cross_fit is for nuisance estimation, Lrnr_sl is for sl3 Super Learner selection inside a learner library, and cross_validate = TRUE in the wrappers is for selection among HTE learner families.
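The three layers compose in a single pipeline. A hedged sketch combining them, reusing the signatures shown above (`df`, `mods`, `confs` assumed defined):

```r
library(hte3)
library(sl3)

# Layer 1: nuisance cross-fitting at task construction
task <- hte_task(
  data = df, modifiers = mods, confounders = confs,
  treatment = "A", outcome = "Y",
  cross_fit = TRUE
)

# Layer 2: sl3 library CV inside a Super Learner used as base_learner
# Layer 3: outer CV across HTE learner families via cross_validate = TRUE
fit <- fit_cate(
  task,
  method = c("dr", "r", "ep"),
  base_learner = Lrnr_sl$new(
    learners = Stack$new(Lrnr_glm_fast$new(), Lrnr_ranger$new()),
    metalearner = Lrnr_nnls$new(),
    cv_control = list(V = 5)
  ),
  cross_validate = TRUE
)
```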

Starter Recipes

Common learner strategies for practitioners

Fast baseline

Simple generalized linear learners

Use learners like Lrnr_glm_fast or Lrnr_mean for rapid iteration, debugging, or an interpretable baseline.
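A minimal baseline library along these lines:

```r
library(sl3)

# Fast, interpretable candidates for iteration and debugging:
baseline_library <- Stack$new(
  Lrnr_mean$new(),     # intercept-only reference point
  Lrnr_glm_fast$new()  # fast main-effects GLM
)
```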

Balanced default

Keep get_autoML()

Use this when a broad learner library is needed without constructing one manually.

More nonlinear signal

Tree and boosting heavy

Favor learners such as Lrnr_ranger and Lrnr_xgboost when interactions and nonlinearities are likely to matter.
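For example, a tree- and boosting-heavy library might look like the following; the hyperparameter values here are illustrative starting points, not package defaults:

```r
library(sl3)

nonlinear_library <- Stack$new(
  Lrnr_ranger$new(num.trees = 500),
  Lrnr_xgboost$new(max_depth = 3, eta = 0.1, nrounds = 200),
  Lrnr_xgboost$new(max_depth = 5, eta = 0.05, nrounds = 400)
)
```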

Custom library

Hand-built Stack or Lrnr_sl

Use this path when a preferred learner library already exists or when comparing a curated set of candidate learners.
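A hand-built library follows the same Stack-then-Lrnr_sl pattern described earlier; the particular candidates here are just an example curated set:

```r
library(sl3)

custom_sl <- Lrnr_sl$new(
  learners = Stack$new(
    Lrnr_glm_fast$new(),
    Lrnr_earth$new(degree = 2),
    Lrnr_ranger$new()
  ),
  metalearner = Lrnr_nnls$new()
)
# Pass custom_sl anywhere hte3 expects a learner, e.g. outcome_learner
# in hte_task() or base_learner in fit_cate().
```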

Code Patterns

Small patterns you can copy into real analyses

Fast and explicit

hte_task(
  data = df,
  modifiers = mods,
  confounders = confs,
  treatment = "A",
  outcome = "Y",
  propensity_learner = Lrnr_glm_fast$new(),
  outcome_learner = Lrnr_glm_fast$new(),
  mean_learner = Lrnr_mean$new()
)

Custom base learner for CATE

fit_cate(
  task,
  method = "dr",
  base_learner = Lrnr_ranger$new(),
  cross_validate = FALSE
)
Continuous-treatment note: in the current package, the continuous-treatment CATE path is the R-learner. That implementation uses the partially linear effect-model view with an A * tau(X) term rather than a fully general treatment-response surface.
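In the standard residualized form of the R-learner (this is the generic formulation, not package-specific notation), the partially linear view above can be written as:

```latex
% Partially linear effect model assumed by the R-learner path:
%   Y = m(X) + (A - e(X))\,\tau(X) + \varepsilon,
% with m(X) = E[Y \mid X] and e(X) = E[A \mid X].
% \tau is estimated by the residual-on-residual least-squares criterion:
\hat{\tau} = \arg\min_{\tau}\;
  \frac{1}{n} \sum_{i=1}^{n}
  \Big[ \big(Y_i - \hat{m}(X_i)\big)
      - \big(A_i - \hat{e}(X_i)\big)\,\tau(X_i) \Big]^2
```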

Reduced-modifier-set note: suppose the target modifiers are V and the nuisance adjustment set is W, with V a strict subset of W. The target of interest is then E[Y(1) - Y(0) | V] = E[tau(W) | V]. In the supported binary/categorical-treatment setting, the DR- and EP-learners target that surface. The current R-learner instead targets the overlap-weighted projection f_R(V) = E[Var(A|W) tau(W) | V] / E[Var(A|W) | V], which for binary treatment becomes E[e(W)(1 - e(W)) tau(W) | V] / E[e(W)(1 - e(W)) | V].

Official sl3 Resources

References for further detail

These are the upstream references most likely to help an hte3 user make better modeling decisions.

sl3 package site

The main package homepage with reference docs and articles.

Intro to sl3

A practical article for understanding tasks, learners, training, and prediction.

tlverse handbook chapter

A longer-form chapter that is often easier to learn from than jumping straight into reference pages.