publications
For a complete list of my research publications, check out my Google Scholar profile.
2024
- Adaptive-TMLE for the Average Treatment Effect based on Randomized Controlled Trial Augmented with Real-World Data. Mark van der Laan, Sky Qiu, and Lars van der Laan. arXiv preprint arXiv:2405.07186, 2024.
We consider the problem of estimating the average treatment effect (ATE) when both randomized controlled trial (RCT) data and real-world data (RWD) are available. We decompose the ATE estimand as the difference between a pooled-ATE estimand that integrates RCT and RWD and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. We introduce an adaptive targeted minimum loss-based estimation (A-TMLE) framework to estimate them. We prove that the A-TMLE estimator is root-n-consistent and asymptotically normal. Moreover, in finite samples, it achieves the super-efficiency one would obtain had one known the oracle model for the conditional effect of the RCT enrollment on the outcome. Consequently, the smaller the working model of the bias induced by the RWD is, the greater our estimator’s efficiency, while our estimator will always be at least as efficient as an efficient estimator that uses the RCT data only. A-TMLE outperforms existing methods in simulations by having smaller mean squared error and narrower 95% confidence intervals. A-TMLE could help utilize RWD to improve the efficiency of randomized trial results without biasing the estimates of intervention effects. This approach could allow for smaller, faster trials, decreasing the time until patients can receive effective treatments.
- Self-Calibrating Conformal Prediction. Lars van der Laan and Ahmed M. Alaa. arXiv preprint arXiv:2402.07307, stat.ML, 2024.
In machine learning, model calibration and predictive inference are essential for producing reliable predictions and quantifying uncertainty to support decision-making. Recognizing the complementary roles of point and interval predictions, we introduce Self-Calibrating Conformal Prediction, a method that combines Venn-Abers calibration and conformal prediction to deliver calibrated point predictions alongside prediction intervals with finite-sample validity conditional on these predictions. To achieve this, we extend the original Venn-Abers procedure from binary classification to regression. Our theoretical framework supports analyzing conformal prediction methods that involve calibrating model predictions and subsequently constructing conditionally valid prediction intervals on the same data, where the conditioning set or conformity scores may depend on the calibrated predictions. Real-data experiments show that our method improves interval efficiency through model calibration and offers a practical alternative to feature-conditional validity.
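As background for the abstract above, the following is a minimal sketch of standard split conformal prediction for regression with absolute-residual conformity scores. It is not the self-calibrating procedure of the paper (it includes no Venn-Abers calibration step), and the names `conformal_quantile`, `split_conformal_interval`, and `predict` are hypothetical, chosen only for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score.

    If that rank exceeds n, the interval must be unbounded, so return inf.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval for x_new from a held-out calibration set.

    `predict` is any fitted point predictor (an assumption of this sketch);
    conformity scores are absolute residuals on the calibration data.
    """
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q
```

Under exchangeability of calibration and test points, an interval built this way covers the new outcome with probability at least 1 - alpha marginally; the conditional guarantees described in the abstract require the additional machinery developed in the paper.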
- Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts. Lars van der Laan, Marco Carone, and Alex Luedtke. arXiv preprint arXiv:2402.01972, 2024.
We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that (i) their practical applicability can be hindered by loss function non-convexity; and (ii) they may suffer from poor performance and instability due to inverse probability weighting and pseudo-outcomes that violate bounds. To avoid these drawbacks, EP-learner constructs an efficient plug-in estimator of the population risk function for the causal contrast, thereby inheriting the stability and robustness properties of plug-in estimation strategies like T-learning. Under reasonable conditions, EP-learners based on empirical risk minimization are oracle-efficient, exhibiting asymptotic equivalence to the minimizer of an oracle-efficient one-step debiased estimator of the population risk function. In simulation experiments, we illustrate that EP-learners of the conditional average treatment effect and conditional relative risk outperform state-of-the-art competitors, including T-learner, R-learner, and DR-learner. Open-source implementations of the proposed methods are available in our R package hte3.
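To illustrate drawback (ii) mentioned in the abstract above, here is a small sketch comparing the T-learner plug-in contrast with the standard doubly robust (AIPW) pseudo-outcome used by DR-learning. It does not implement EP-learning itself; the function names and the numeric nuisance values are hypothetical.

```python
def plug_in_contrast(mu1, mu0):
    """T-learner style plug-in estimate of the CATE at one covariate value."""
    return mu1 - mu0


def dr_pseudo_outcome(y, a, pi, mu1, mu0):
    """Standard AIPW pseudo-outcome for the CATE.

    y: observed outcome; a: treatment indicator (0 or 1);
    pi: estimated propensity P(A = 1 | X);
    mu1, mu0: estimated outcome regressions under treatment and control.
    """
    return (
        mu1 - mu0
        + a / pi * (y - mu1)
        - (1 - a) / (1 - pi) * (y - mu0)
    )


# Binary outcome, so the true CATE must lie in [-1, 1].
# Hypothetical nuisance estimates with a small propensity score:
plug_in = plug_in_contrast(mu1=0.5, mu0=0.5)
pseudo = dr_pseudo_outcome(y=1, a=1, pi=0.1, mu1=0.5, mu0=0.5)
```

With these inputs the plug-in contrast stays inside [-1, 1], while the pseudo-outcome is inflated by the inverse propensity weight 1 / 0.1 and falls well outside the range a difference of probabilities can take.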
2023
- Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data. Shiladitya Dutta, Hongbo Wei, Lars van der Laan, and 1 more author. arXiv preprint arXiv:2310.09926, 2023.
Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a <class>", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in retrieved web data. We evaluate the utility of our proposed method in biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.
- Adaptive debiased machine learning using data-driven model selection techniques. Lars van der Laan, Marco Carone, Alex Luedtke, and 1 more author. arXiv preprint arXiv:2307.12544, 2023.
Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecification. To address this problem, we propose Adaptive Debiased Machine Learning (ADML), a nonparametric framework that combines data-driven model selection and debiased machine learning techniques to construct asymptotically linear, adaptive, and superefficient estimators for pathwise differentiable functionals. By learning model structure directly from data, ADML avoids the bias introduced by model misspecification and remains free from the restrictions of parametric and semiparametric models. While they may exhibit irregular behavior for the target parameter in a nonparametric statistical model, we demonstrate that ADML estimators provide regular and locally uniformly valid inference for a projection-based oracle parameter. Importantly, this oracle parameter agrees with the original target parameter for distributions within an unknown but correctly specified oracle statistical submodel that is learned from the data. This finding implies that there is no penalty, in a local asymptotic sense, for conducting data-driven model selection compared to having prior knowledge of the oracle submodel and oracle parameter. To demonstrate the practical applicability of our theory, we provide a broad class of ADML estimators for estimating the average treatment effect in adaptive partially linear regression models.
- Causal isotonic calibration for heterogeneous treatment effects. Lars van der Laan, Ernesto Ulloa-Pérez, Marco Carone, and 1 more author. In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. Furthermore, we introduce cross-calibration, a data-efficient variant of calibration that eliminates the need for hold-out calibration sets. Cross-calibration leverages cross-fitted predictors and generates a single calibrated predictor using all available data. Under weak conditions that do not assume monotonicity, we establish that both causal isotonic calibration and cross-calibration achieve fast doubly-robust calibration rates, as long as either the propensity score or outcome regression is estimated accurately in a suitable sense. The proposed causal isotonic calibrator can be wrapped around any black-box learning algorithm, providing robust and distribution-free calibration guarantees while preserving predictive performance.
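The isotonic step behind a calibrator like the one described above can be sketched with the pool-adjacent-violators algorithm. This is an illustrative simplification: it assumes the pseudo-outcomes are already computed and passed in as plain numbers, and it omits the cross-fitting and nuisance estimation discussed in the abstract. The names `pava` and `isotonic_calibrate` are hypothetical.

```python
def pava(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to `values`."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def isotonic_calibrate(predictions, pseudo_outcomes):
    """Regress pseudo-outcomes on predictions under a monotone constraint.

    Returns (prediction, calibrated value) pairs sorted by prediction.
    """
    order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    fitted = pava([pseudo_outcomes[i] for i in order])
    return [(predictions[i], f) for i, f in zip(order, fitted)]
```

For example, `isotonic_calibrate([0.3, 0.1, 0.2], [2.0, 1.0, 3.0])` sorts the points by prediction and pools the two out-of-order pseudo-outcomes into their average, so the calibrated values are nondecreasing in the original prediction.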
- Semiparametric inference for relative heterogeneous vaccine efficacy between strains in observational case-only studies. Lars van der Laan and Peter B Gilbert. arXiv preprint arXiv:2303.11462, 2023.
The aim of this manuscript is to explore semiparametric methods for inferring subgroup-specific relative vaccine efficacy in a partially vaccinated population against multiple strains of a virus. We consider methods for observational case-only studies with informative missingness in viral strain type due to vaccination status, pre-vaccination variables, and also post-vaccination factors such as viral load. We establish general causal conditions under which the relative conditional vaccine efficacy between strains can be identified nonparametrically from the observed data-generating distribution. Assuming that the relative strain-specific conditional vaccine efficacy has a known parametric form, we propose semiparametric asymptotically linear estimators of the parameters based on targeted (debiased) machine learning estimators for partially linear logistic regression models. Finally, we apply our methods to estimate the relative strain-specific conditional vaccine efficacy in the ENSEMBLE COVID-19 vaccine trial.
- Targeted Maximum Likelihood Based Estimation for Longitudinal Mediation Analysis. Zeyi Wang, Lars van der Laan, Maya Petersen, and 3 more authors. arXiv preprint arXiv:2304.04904, 2023.
2022
- Nonparametric estimation of the causal effect of a stochastic threshold-based intervention. Lars van der Laan, Wenbo Zhang, and Peter B Gilbert. Biometrics, 2022.
Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. In view of this goal, we consider a covariate-adjusted threshold-based interventional estimand, which happens to equal the binary treatment-specific mean estimand from the causal inference literature obtained by dichotomizing the continuous biomarker or treatment as above or below a threshold. The unadjusted version of this estimand was considered in Donovan et al. Expanding upon Stitelman et al., we show that this estimand, under conditions, identifies the expected outcome of a stochastic intervention that sets the treatment dose of all participants above the threshold. We propose a novel nonparametric efficient estimator for the covariate-adjusted threshold-response function for the case of informative outcome missingness, which utilizes machine learning and targeted minimum-loss estimation (TMLE). We prove the estimator is efficient and characterize its asymptotic distribution and robustness properties. Construction of simultaneous 95% confidence bands for the threshold-specific estimand across a set of thresholds is discussed. In the Supporting Information, we discuss how to adjust our estimator when the biomarker is missing at random, as occurs in clinical trials with biased sampling designs, using inverse probability weighting. Efficiency and bias reduction of the proposed estimator are assessed in simulations. The methods are employed to estimate neutralizing antibody thresholds for virologically confirmed dengue risk in the CYD14 and CYD15 dengue vaccine trials.
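One way to write the estimand described above is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $W$ denotes baseline covariates, $A$ the continuous biomarker or treatment dose, $Y$ the outcome, and $v$ the threshold. Whether the inequality is strict is also an assumption of this sketch, and outcome missingness is ignored.

```latex
% Covariate-adjusted threshold-response function at threshold v:
% the mean outcome among those at or above v, standardized to the
% marginal covariate distribution.
\Psi_v(P) \;=\; E_W\!\left[\, E\left( Y \mid A \ge v,\; W \right) \right]

% Equivalently, with the dichotomized treatment A_v = 1\{A \ge v\},
% this is the treatment-specific mean for a binary treatment:
\Psi_v(P) \;=\; E_W\!\left[\, E\left( Y \mid A_v = 1,\; W \right) \right]
```

Dropping the adjustment for $W$ gives the unadjusted quantity $E(Y \mid A \ge v)$, which corresponds to the unadjusted version the abstract refers to.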
- hal9001: The scalable highly adaptive lasso. Jeremy R Coyle, Nima S Hejazi, Rachael V Phillips, and 2 more authors. R package version 0.4.2, 2022.
2021
- Higher order targeted maximum likelihood estimation. Mark van der Laan, Zeyi Wang, and Lars van der Laan. arXiv preprint arXiv:2101.06290, 2021.