This guide is for researchers who want to prototype a new
hte3 learner and for contributors who want to add a new
learner to the package itself.
The key design idea is that hte3 separates:
- data and nuisance estimates, stored in an
hte3_Task - meta-learner logic, implemented as
Lrnr_*classes - cross-validation and selection, handled by
cross_validate_*()or the high-level wrappers
Framework Overview
Most custom learners in hte3 inherit from the internal
base class hte3:::Lrnr_hte. That class already handles:
- training a base
sl3learner on a pseudo-outcome task - prediction on new modifier data
- passing through observation weights
- basic treatment-type validation
Your learner supplies the parts that are specific to the estimand and the learning algorithm.
The minimal ingredients are:
-
initialize(): declare the learner parameters and pass the base learner toLrnr_hte. -
get_pseudo_data(): compute the pseudo-outcome and, optionally, pseudo-weights from anhte3_Task. - Treatment compatibility: set the private
.treatment_typefield so the learner rejects unsupported task types. - Pseudo-outcome type and family: choose the right regression target for the downstream base learner.
The Data Flow
At fit time, the typical flow is:
- Build an
hte3_Taskwith modifiersV, confoundersW, treatmentA, outcomeY, and nuisance estimates. - Inside
get_pseudo_data(), read the nuisance quantities you need from the task. - Construct a pseudo-outcome and optional pseudo-weights.
- Let
Lrnr_htebuild the derivedsl3_Taskand train the suppliedbase_learner. - Predict on the modifier set
Vthrough the inherited prediction path.
When you are adding a learner to the package, try to keep the pseudo-data step as simple and testable as possible. Small helper functions are easier to test than one large learner method.
Minimal Contract
Use this checklist when you create a new learner:
- The learner inherits from
hte3:::Lrnr_hte. -
initialize()stores every tuning knob inparams. -
get_pseudo_data()returns a list withpseudo_outcomeand, optionally,pseudo_weights. - The returned pseudo-outcome matches the declared
pseudo_outcome_type/pseudo_family. - The learner sets an appropriate
.treatment_type. - Predictions only depend on the modifier variables that define the target.
Research Prototype Skeleton
The example below is intentionally minimal and is shown with
eval = FALSE. It is meant as a template for a research
prototype, not as a recommended final estimator.
library(R6)
library(hte3)
library(sl3)
Lrnr_cate_prototype <- R6Class(
classname = "Lrnr_cate_prototype",
inherit = hte3:::Lrnr_hte,
portable = TRUE,
class = TRUE,
public = list(
initialize = function(base_learner, treatment_level = NULL, control_level = NULL, ...) {
params <- list(
base_learner = base_learner,
treatment_level = treatment_level,
control_level = control_level,
...
)
super$initialize(
params = params,
base_learner = base_learner,
pseudo_outcome_type = "continuous",
pseudo_family = stats::gaussian(),
...
)
},
get_pseudo_data = function(hte3_task, treatment_level = NULL, control_level = NULL, ...) {
pseudo_outcome <- hte3:::compute_cate_dr_pseudo_outcome(
hte3_task,
treatment_level = treatment_level,
control_level = control_level
)
list(pseudo_outcome = pseudo_outcome)
}
),
private = list(
.treatment_type = c("binomial", "categorical"),
.properties = c("cate", "prototype")
)
)That is the minimum viable pattern:
- parameters live in
self$params - the nuisance-aware pseudo-outcome is computed inside
get_pseudo_data() - the base class takes care of task construction and prediction
Choosing the Right Family
Use the regression family that matches the pseudo-outcome you are fitting:
- Gaussian pseudo-outcomes: CATE-style least-squares meta-learners such as DR, R, and EP.
- Quasibinomial pseudo-outcomes with weights: bounded ratio-style targets such as the current CRR EP learner.
- Binomial pseudo-outcomes: only when the pseudo-outcome itself is a genuine Bernoulli-style target.
If you are unsure, inspect the existing Lrnr_cate_DR,
Lrnr_cate_EP, and Lrnr_crr_EP implementations
and match the family to the pseudo-outcome scale, not to the original
observed outcome scale.
When to Expose a New Learner in the Package
Before adding a research prototype to the package API, lock down these decisions:
- What estimand does it target?
- Which treatment types does it support?
- Does it require contrast levels?
- What nuisance quantities does it assume are available?
- How should users tune it?
- Does it need its own selector loss for outer cross-validation?
If any of those are still unstable, keep the learner as a research-only prototype first.
Promotion Checklist
When a learner is ready to become part of hte3, do all
of the following:
- Add the learner class and roxygen docs under
R/. - Add tests for fit/predict behavior, treatment validation, and any special numerical guards.
- Add at least one end-to-end wrapper or cross-validation test if the learner participates in a public workflow.
- Document the learner in the relevant workflow vignette and on the website.
- Decide whether it belongs in the high-level wrapper API or only in the advanced interface.
- Run the release checks described in the contribution guide.
Related Guides
- The
advanced-sl3vignette shows how low-level learner portfolios plug intocross_validate_cate()andcross_validate_crr(). - The
ep-learnervignette shows how a more complex learner uses theSieve-based basis construction insidehte3. - The
contributingvignette describes the repo workflow, site sync commands, and release checks for package contributors.