publications | Lars van der Laan

Below are selected methodological and theoretical contributions. For a complete list of publications, including applied work, see my Google Scholar profile.

Debiased machine learning and efficiency theory

2025

Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands

Lars van der Laan, Aurelien Bibaut, Nathan Kallus, and 1 more author

2025

Abs PDF

We propose a unified framework for automatic debiased machine learning (autoDML) to perform inference on smooth functionals of infinite-dimensional M-estimands, defined as population risk minimizers over Hilbert spaces. By automating debiased estimation and inference procedures in causal inference and semiparametric statistics, our framework enables practitioners to construct valid estimators for complex parameters without requiring specialized expertise. The framework supports Neyman-orthogonal loss functions with unknown nuisance parameters requiring data-driven estimation, as well as vector-valued M-estimands involving simultaneous loss minimization across multiple Hilbert space models. We formalize the class of parameters efficiently estimable by autoDML as a novel class of nonparametric projection parameters, defined via orthogonal minimum loss objectives. We introduce three autoDML estimators based on one-step estimation, targeted minimum loss-based estimation, and the method of sieves. For data-driven model selection, we derive a novel decomposition of model approximation error for smooth functionals of M-estimands and propose adaptive debiased machine learning estimators that are superefficient and adaptive to the functional form of the M-estimand. Finally, we illustrate the flexibility of our framework by constructing autoDML estimators for the long-term survival under a beta-geometric model.

2024

Doubly robust inference via calibration

Lars van der Laan, Alex Luedtke, and Marco Carone

arXiv preprint arXiv:2411.02771, 2024

Abs PDF Code Poster

Doubly robust estimators are widely used for estimating average treatment effects and other linear summaries of regression functions. While consistency requires only one of two nuisance functions to be estimated consistently, asymptotic normality typically require sufficiently fast convergence of both. In this work, we correct this mismatch: we show that calibrating the nuisance estimators within a doubly robust procedure yields doubly robust asymptotic normality for linear functionals. We introduce a general framework, calibrated debiased machine learning (calibrated DML), and propose a specific estimator that augments standard DML with a simple isotonic regression adjustment. Our theoretical analysis shows that the calibrated DML estimator remains asymptotically normal if either the regression or the Riesz representer of the functional is estimated sufficiently well, allowing the other to converge arbitrarily slowly or even inconsistently. We further propose a simple bootstrap method for constructing confidence intervals, enabling doubly robust inference without additional nuisance estimation. In a range of semi-synthetic benchmark datasets, calibrated DML reduces bias and improves coverage relative to standard DML. Our method can be integrated into existing DML pipelines by adding just a few lines of code to calibrate cross-fitted estimates via isotonic regression.

2023

Adaptive debiased machine learning using data-driven model selection techniques

Lars van der Laan, Marco Carone, Alex Luedtke, and 1 more author

arXiv preprint arXiv:2307.12544, 2023

Abs PDF Twitter Code

Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecification. To address this problem, we propose Adaptive Debiased Machine Learning (ADML), a nonparametric framework that combines data-driven model selection and debiased machine learning techniques to construct asymptotically linear, adaptive, and superefficient estimators for pathwise differentiable functionals. By learning model structure directly from data, ADML avoids the bias introduced by model misspecification and remains free from the restrictions of parametric and semiparametric models. While they may exhibit irregular behavior for the target parameter in a nonparametric statistical model, we demonstrate that ADML estimators provides regular and locally uniformly valid inference for a projection-based oracle parameter. Importantly, this oracle parameter agrees with the original target parameter for distributions within an unknown but correctly specified oracle statistical submodel that is learned from the data. This finding implies that there is no penalty, in a local asymptotic sense, for conducting data-driven model selection compared to having prior knowledge of the oracle submodel and oracle parameter. To demonstrate the practical applicability of our theory, we provide a broad class of ADML estimators for estimating the average treatment effect in adaptive partially linear regression models.

2021

Higher order targeted maximum likelihood estimation

Mark van der Laan, Zeyi Wang, and Lars van der Laan

arXiv preprint arXiv:2101.06290, 2021

PDF

Dynamic decision-making and heterogeneous treatment effects

2025

Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models

Lars van der Laan, Aurelien Bibaut, and Nathan Kallus

arXiv preprint arXiv:2512.24407, 2025

Abs PDF

Inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models explain sequential decision-making by recovering reward functions that rationalize observed behaviour. Flexible IRL methods typically rely on machine learning but provide no guarantees for valid inference, while classical DDC approaches impose restrictive parametric specifications and often require repeated dynamic programming. We develop a semiparametric framework for debiased inverse reinforcement learning that yields statistically efficient inference for a broad class of reward-dependent functionals in maximum-entropy IRL and Gumbel-shock DDC models. We show that the log–behaviour policy acts as a pseudo-reward that point-identifies policy value differences and, under a simple normalization, the reward itself. We then formalize these tar- gets—including policy values under known and counterfactual softmax policies and functionals of the normalized reward—as smooth functionals of the behaviour policy and transition ker- nel, establish pathwise differentiability, and derive their efficient influence functions. Building on this characterization, we construct automatic debiased machine-learning estimators that al- low flexible nonparametric estimation of nuisance components while achieving √n-consistency, asymptotic normality, and semiparametric efficiency. Our framework extends classical inference for DDC models to nonparametric rewards and modern machine-learning tools, providing a unified and computationally tractable approach to statistical inference in IRL.
Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

Lars van der Laan, and Nathan Kallus

arXiv preprint arXiv:2512.23927, 2025

Abs PDF

Fitted Q-iteration (FQI) and its entropy-regularized variant, soft FQI, are central tools for value-based model-free offline reinforcement learning, but can behave poorly under function approximation and distribution shift. In the entropy-regularized setting, we show that the soft Bellman operator is locally contractive in the stationary norm of the soft-optimal policy, rather than in the behavior norm used by standard FQI. This geometric mismatch explains the instability of soft Q-iteration with function approximation in the absence of Bellman completeness. To restore contraction, we introduce stationary-reweighted soft FQI, which reweights each regression update using the stationary distribution of the current policy. We prove local linear convergence under function approximation with geometrically damped weight-estimation errors, assuming approximate realizability. Our analysis further suggests that global convergence may be recovered by gradually reducing the softmax temperature, and that this continuation approach can extend to the hardmax limit under a mild margin condition.
Fitted Q Evaluation without Bellman Completeness via Stationary Weighting

Lars van der Laan, and Nathan Kallus

arXiv preprint arXiv:2512.23805, 2025

Abs PDF

Fitted Q-evaluation (FQE) is a central method for off-policy evaluation in reinforcement learning, but it generally requires Bellman completeness: that the hypothesis class is closed under the evaluation Bellman operator. This requirement is challenging because enlarging the hypothesis class can worsen completeness. We show that the need for this assumption stems from a fundamental norm mismatch: the Bellman operator is gamma-contractive under the stationary distribution of the target policy, whereas FQE minimizes Bellman error under the behavior distribution. We propose a simple fix: reweight each regression step using an estimate of the stationary density ratio, thereby aligning FQE with the norm in which the Bellman operator contracts. This enables strong evaluation guarantees in the absence of realizability or Bellman completeness, avoiding the geometric error blow-up of standard FQE in this setting while maintaining the practicality of regression-based evaluation.
Bellman Calibration for V-Learning in Offline Reinforcement Learning

Lars van der Laan, and Nathan Kallus

arXiv preprint arXiv:2512.23694, 2025

Abs PDF

We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions in infinite-horizon Markov decision processes. Bellman calibration requires that states with similar predicted long-term returns exhibit one-step returns consistent with the Bellman equation under the target policy. We adapt classical histogram and isotonic calibration to the dynamic, counterfactual setting by repeatedly regressing fitted Bellman targets onto a model’s predictions, using a doubly robust pseudo-outcome to handle off-policy data. This yields a one-dimensional fitted value iteration scheme that can be applied to any value estimator. Our analysis provides finite-sample guarantees for both calibration and prediction under weak assumptions, and critically, without requiring Bellman completeness or realizability.
Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference

Lars van der Laan, David Hubbard, Allen Tran, and 2 more authors

2025

Abs PDF Poster

Long-term causal effects often must be estimated from short-term data due to limited follow-up in healthcare, economics, and online platforms. Markov Decision Processes (MDPs) provide a natural framework for capturing such long-term dynamics through sequences of states, actions, and rewards. Double Reinforcement Learning (DRL) enables efficient inference on policy values in MDPs, but nonparametric implementations require strong intertemporal overlap assumptions and often exhibit high variance and instability. We propose a semiparametric extension of DRL for efficient inference on linear functionals of the Q-function–such as policy values–in infinite-horizon, time-homogeneous MDPs. By imposing structural restrictions on the Q-function, our approach relaxes the strong overlap conditions required by nonparametric methods and improves statistical efficiency. Under model misspecification, our estimators target the functional of the best-approximating Q-function, with only second-order bias. We provide conditions for valid inference using sieve methods and data-driven model selection. A central challenge in DRL is the estimation of nuisance functions, such as density ratios, which often involve difficult minimax optimization. To address this, we introduce a novel plug-in estimator based on isotonic Bellman calibration, which combines fitted Q-iteration with an isotonic regression adjustment. The estimator is debiased without requiring estimation of additional nuisance functions and reduces high-dimensional overlap assumptions to a one-dimensional condition. Bellman calibration extends isotonic calibration–widely used in prediction and classification–to the MDP setting and may be of independent interest.
Inverse Reinforcement Learning Using Just Classification and a Few Regressions

Lars van der Laan, Nathan Kallus, and Aurélien Bibaut

arXiv preprint arXiv:2509.21172, 2025

Abs PDF

Inverse reinforcement learning (IRL) aims to explain observed behavior by uncovering an underlying reward. In the maximum-entropy or Gumbel-shocks-to-reward frameworks, this amounts to fitting a reward function and a soft value function that together satisfy the soft Bellman consistency condition and maximize the likelihood of observed actions. While this perspective has had enormous impact in imitation learning for robotics and understanding dynamic choices in economics, practical learning algorithms often involve delicate inner-loop optimization, repeated dynamic programming, or adversarial training, all of which complicate the use of modern, highly expressive function approximators like neural nets and boosting. We revisit softmax IRL and show that the population maximum-likelihood solution is characterized by a linear fixed-point equation involving the behavior policy. This observation reduces IRL to two off-the-shelf supervised learning problems: probabilistic classification to estimate the behavior policy, and iterative regression to solve the fixed point. The resulting method is simple and modular across function approximation classes and algorithms. We provide a precise characterization of the optimal solution, a generic oracle-based algorithm, finite-sample error bounds, and empirical results showing competitive or superior performance to MaxEnt IRL.
Hybrid Meta-learners for Estimating Heterogeneous Treatment Effects

Zhongyuan Liang, Lars van der Laan, and Ahmed Alaa

arXiv preprint arXiv:2506.13680, 2025

2024

Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts

Lars van der Laan, Marco Carone, and Alex Luedtke

arXiv preprint arXiv:2402.01972, 2024

Abs PDF Code Poster

We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that (i) their practical applicability can be hindered by loss function non-convexity; and (ii) they may suffer from poor performance and instability due to inverse probability weighting and pseudo-outcomes that violate bounds. To avoid these drawbacks, EP-learner constructs an efficient plug-in estimator of the population risk function for the causal contrast, thereby inheriting the stability and robustness properties of plug-in estimation strategies like T-learning. Under reasonable conditions, EP-learners based on empirical risk minimization are oracle-efficient, exhibiting asymptotic equivalence to the minimizer of an oracle-efficient one-step debiased estimator of the population risk function. In simulation experiments, we illustrate that EP-learners of the conditional average treatment effect and conditional relative risk outperform state-of-the-art competitors, including T-learner, R-learner, and DR-learner. Open-source implementations of the proposed methods are available in our R package hte3.

2023

Causal isotonic calibration for heterogeneous treatment effects

Lars van der Laan, Ernesto Ulloa-Pérez, Marco Carone, and 1 more author

In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023

Abs Blog PDF Twitter Code Poster

We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. Furthermore, we introduce cross-calibration, a data-efficient variant of calibration that eliminates the need for hold-out calibration sets. Cross-calibration leverages cross-fitted predictors and generates a single calibrated predictor using all available data. Under weak conditions that do not assume monotonicity, we establish that both causal isotonic calibration and cross-calibration achieve fast doubly-robust calibration rates, as long as either the propensity score or outcome regression is estimated accurately in a suitable sense. The proposed causal isotonic calibrator can be wrapped around any black-box learning algorithm, providing robust and distribution-free calibration guarantees while preserving predictive performance.

Causal inference methodology

2025

Nonparametric Instrumental Variable Inference with Many Weak Instruments

Lars van der Laan, Nathan Kallus, and Aurélien Bibaut

2025

Abs PDF

We study inference on linear functionals in the nonparametric instrumental variable (NPIV) problem with a discretely-valued instrument under a many-weak-instruments asymptotic regime, where the number of instrument values grows with the sample size. A key motivating example is estimating long-term causal effects in a new experiment with only short-term outcomes, using past experiments to instrument for the effect of short- on long-term outcomes. Here, the assignment to a past experiment serves as the instrument: we have many past experiments but only a limited number of units in each. Since the structural function is nonparametric but constrained by only finitely many moment restrictions, point identification typically fails. To address this, we consider linear functionals of the minimum-norm solution to the moment restrictions, which is always well-defined. As the number of instrument levels grows, these functionals define an approximating sequence to a target functional, replacing point identification with a weaker asymptotic notion suited to discrete instruments. Extending the Jackknife Instrumental Variable Estimator (JIVE) beyond the classical parametric setting, we propose npJIVE, a nonparametric estimator for solutions to linear inverse problems with many weak instruments. We construct automatic debiased machine learning estimators for linear functionals of both the structural function and its minimum-norm projection, and establish their efficiency in the many-weak-instruments regime.
Targeted maximum likelihood based estimation for longitudinal mediation analysis

Zeyi Wang, Lars van der Laan, Maya Petersen, and 3 more authors

Journal of Causal Inference, 2025

PDF
Undersmoothed LASSO Models for Propensity Score Weighting and Synthetic Negative Control Exposures for Bias Detection

Richard Wyss, Ben B Hansen, Georg Hahn, and 2 more authors

arXiv preprint arXiv:2506.17760, 2025

2024

Adaptive-TMLE for the Average Treatment Effect based on Randomized Controlled Trial Augmented with Real-World Data

Mark van der Laan, Sky Qiu, and Lars van der Laan

arXiv preprint arXiv:2405.07186, 2024

Abs PDF

We consider the problem of estimating the average treatment effect (ATE) when both randomized control trial (RCT) data and real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted minimum loss-based estimation (A-TMLE) framework to estimate them. We prove that the A-TMLE estimator is root-n-consistent and asymptotically normal. Moreover, in finite sample, it achieves the super-efficiency one would obtain had one known the oracle model for the conditional effect of the RCT enrollment on the outcome. Consequently, the smaller the working model of the bias induced by the RWD is, the greater our estimator’s efficiency, while our estimator will always be at least as efficient as an efficient estimator that uses the RCT data only. A-TMLE outperforms existing methods in simulations by having smaller mean-squared-error and 95% confidence intervals. A-TMLE could help utilize RWD to improve the efficiency of randomized trial results without biasing the estimates of intervention effects. This approach could allow for smaller, faster trials, decreasing the time until patients can receive effective treatments.

2023

Semiparametric logistic regression for inference on relative vaccine efficacy in case-only studies with informative missingness

Lars van der Laan, and Peter B Gilbert

arXiv preprint arXiv:2303.11462, 2023

Abs PDF

We develop semiparametric methods for estimating subgroup-specific relative vaccine efficacy against multiple viral strains in a partially vaccinated population. Focusing on observational case-only studies, we address informative missingness in strain type due to vaccination status, pre-vaccination character- istics, and post-infection factors such as viral load. We establish general conditions for the nonpara- metric identification of relative conditional vaccine efficacy between strains using covariate-adjusted conditional odds ratio parameters. Assuming a log-linear parametric form for strain-specific conditional vaccine efficacy, we propose targeted maximum likelihood estimators based on partially linear logis- tic regression, leveraging machine learning for flexible confounding adjustment. Finally, we apply our methods to estimate relative strain-specific conditional vaccine efficacy in the ENSEMBLE COVID-19 vaccine trial.

2022

Nonparametric estimation of the causal effect of a stochastic threshold-based intervention

Lars van der Laan, Wenbo Zhang, and Peter B Gilbert

Biometrics, 2022

Abs PDF

Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. In view of this goal, we consider a covariate-adjusted threshold-based interventional estimand, which happens to equal the binary treatment–specific mean estimand from the causal inference literature obtained by dichotomizing the continuous biomarker or treatment as above or below a threshold. The unadjusted version of this estimand was considered in Donovan et al.. Expanding upon Stitelman et al., we show that this estimand, under conditions, identifies the expected outcome of a stochastic intervention that sets the treatment dose of all participants above the threshold. We propose a novel nonparametric efficient estimator for the covariate-adjusted threshold-response function for the case of informative outcome missingness, which utilizes machine learning and targeted minimum-loss estimation (TMLE). We prove the estimator is efficient and characterize its asymptotic distribution and robustness properties. Construction of simultaneous 95% confidence bands for the threshold-specific estimand across a set of thresholds is discussed. In the Supporting Information, we discuss how to adjust our estimator when the biomarker is missing at random, as occurs in clinical trials with biased sampling designs, using inverse probability weighting. Efficiency and bias reduction of the proposed estimator are assessed in simulations. The methods are employed to estimate neutralizing antibody thresholds for virologically confirmed dengue risk in the CYD14 and CYD15 dengue vaccine trials.

Conformal prediction and predictive uncertainty

2025

Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction

Lars van der Laan, and Ahmed Alaa

In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025

To appear

Abs PDF

Ensuring model calibration is critical for reliable predictions, yet popular distribution-free methods, such as histogram binning and isotonic regression, provide only asymptotic guarantees. We introduce a unified framework for Venn and Venn-Abers calibration, generalizing Vovk’s binary classification approach to arbitrary prediction tasks and loss functions. Venn calibration leverages binning calibrators to construct prediction sets that contain at least one marginally perfectly calibrated point prediction in finite samples, capturing epistemic uncertainty in the calibration process. The width of these sets shrinks asymptotically to zero, converging to a conditionally calibrated point prediction. Furthermore, we propose Venn multicalibration, a novel methodology for finite-sample calibration across subpopulations. For quantile loss, group-conditional and multicalibrated conformal prediction arise as special cases of Venn multicalibration, and Venn calibration produces novel conformal prediction intervals that achieve quantile-conditional coverage. As a separate contribution, we extend distribution-free conditional calibration guarantees of histogram binning and isotonic calibration to general losses.

2024

Self-Calibrating Conformal Prediction

Lars van der Laan, and Ahmed M. Alaa

The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

Abs PDF Code Poster

In machine learning, model calibration and predictive inference are essential for producing reliable predictions and quantifying uncertainty to support decision- making. Recognizing the complementary roles of point and interval predictions, we introduce Self-Calibrating Conformal Prediction, a method that combines Venn- Abers calibration and conformal prediction to deliver calibrated point predictions alongside prediction intervals with finite-sample validity conditional on these pre- dictions. To achieve this, we extend the original Venn-Abers procedure from binary classification to regression. Our theoretical framework supports analyzing confor- mal prediction methods that involve calibrating model predictions and subsequently constructing conditionally valid prediction intervals on the same data, where the conditioning set or conformity scores may depend on the calibrated predictions. Real-data experiments show that our method improves interval efficiency through model calibration and offers a practical alternative to feature-conditional validity.

2023

Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data

Shiladitya Dutta, Hongbo Wei, Lars van der Laan, and 1 more author

arXiv preprint arXiv:2310.09926, 2023

Abs PDF

Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a ", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in retrieved web data. We evaluate the utility of our proposed method in Biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.

Software and computational tools

2022

hal9001: The scalable highly adaptive lasso

Jeremy R Coyle, Nima S Hejazi, Rachael V Phillips, and 2 more authors

2022

R package version 0.4.2

PDF