Model-based causal forests for heterogeneous treatment effects

A new arXiv paper investigates which building blocks of random forests, especially causal forests and model-based forests, make them work for heterogeneous treatment effect estimation, both in randomized trials and observational studies.


Susanne Dandl, Torsten Hothorn, Heidi Seibold, Erik Sverdrup, Stefan Wager, Achim Zeileis (2022). “What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?.” E-Print Archive arXiv:2206.10323 [stat.ME]. doi:10.48550/arXiv.2206.10323


Estimation of heterogeneous treatment effects (HTE) is of prime importance in many disciplines, ranging from personalized medicine to economics among many others. Random forests have been shown to be a flexible and powerful approach to HTE estimation in both randomized trials and observational studies. In particular “causal forests”, introduced by Athey, Tibshirani, and Wager (2019), along with the R implementation in package grf, were rapidly adopted. A related approach, called “model-based forests”, that is geared towards randomized trials and simultaneously captures effects of both prognostic and predictive variables, was introduced by Seibold, Zeileis, and Hothorn (2018) along with a modular implementation in the R package model4you.

Here, we present a unifying view that goes beyond the theoretical motivations and investigates which computational elements make causal forests so successful and how these can be blended with the strengths of model-based forests. To do so, we show that both methods can be understood in terms of the same parameters and model assumptions for an additive model under L2 loss. This theoretical insight allows us to implement several flavors of “model-based causal forests” and dissect their different elements in silico.

The original causal forests and model-based forests are compared with the new blended versions in a benchmark study exploring both randomized trials and observational settings. In the randomized setting, both approaches performed akin. If confounding was present in the data generating process, we found local centering of the treatment indicator with the corresponding propensities to be the main driver for good performance. Local centering of the outcome was less important, and might be replaced or enhanced by simultaneous split selection with respect to both prognostic and predictive effects. This lays the foundation for future research combining random forests for HTE estimation with other types of models.

We demonstrate the practical aspects of such a model-agnostic approach to HTE estimation analyzing the effect of cesarean section on postpartum blood loss in comparison to vaginal delivery. Clearly, randomization is hardly possible in this setup, and we present a tailored model-based forest for skewed and interval-censored data to infer possible predictive variables and their impact on the treatment effect.

Benchmark study

To investigate which elements of the different random forest algorithms in causal forests (cf) vs. model-based forests (mob) contribute to more precise estimation of heterogeneous treatment effects, a large simulation experiment was carried out, using normal outcomes, different predictive and prognostic effects, and a varying number of observations (N) and covariates (P).

In addition to the original cf (from grf) and mob (from model4you) algorithms three blended versions (based on model4you) were assessed: mob(\(\widehat W\)) (model-based forests after centering of the treatment indicator), mob(\(\widehat W\), \(\widehat Y\)) (model-based forests after centering of both the treatment indicator and the outcome), mobcf (model-based forests after centering of both the treatment indicator and the outcome, only testing for splits in the treatment effect).

Four data-generation setups are considered, as proposed by Nie and Wager (2021): Setup A has complicated confounding but a relatively simple treatment effect function. Setup B has no confounding. Setup C has strong confounding but a constant treatment effect. In Setup D the treatment and control arms are completely unrelated.

Overall, the results in the figure below show that centering of the treatment indicator as in mob(\(\widehat W\)) is the most relevant ingredient to random forests for HTE estimation in observational studies. If possible, additional centering the outcome in combination with simultaneous estimation of predictive and prognostic effects in mob(\(\widehat W\), \(\widehat Y\)) is recommended as it always performs as well as mob(\(\widehat W\)) and mobcf but may yield relevant improvements in some scenarios. Other technical aspects of tree and forest induction did not contribute to major performance differences. The overall strong performance of mob(\(\widehat W\), \(\widehat Y\)), combining centering of outcome and treatment from causal forests with joint estimation of prognostic and predictive effects, suggests that alternative split criteria sensitive to both intercepts and treatment effects might be able to improve the performance of causal forests.

Results for the experimental setups in Section 4.1 of the arXiv working paper. Direct comparison of the adaptive versions of causal forests, model-based forests without centering (mob), mob imitating causal forests (mobcf), mob with centered W (mob(W)) and additional of Y (mob(W, Y)).

For more details and more results see the arXiv working paper.

Empirical application

To illustrate how model-based causal forests can be tailored for specific situations, the effect of cesarean sections vs. vaginal deliveries (treatment) on the amount of postpartum blood loss (outcome) is invectigated. Clearly, covariates like maternal age, birth weight, gestational age, or multifetal pregnancy potentially have an impact on both the treatment and the outcome. As randomizing the mode of delivery is impossible, methods for HTE estimation from observational data are needed. Moreover, blood loss is a skewed variable that is additionally impossible to measure exactly in the sometimes hectic environment of a delivery ward. It is hence treated as interval-censored. To accomodate all these features, a model-based causal forest is fitted by using pmforest() from model4you in combination with:

  • Centering of the treatment variable to account for the observational nature of the data.
  • A transformation model (based on a Bernstein polynomial) to flexibly capture the skewness of the outcome variable.
  • Interval censoring of the outcome observations.

The dependency of the treatment effect on the prepartum variables is visualized in the figure below, using scatter plots for continuous covariates and boxplots for categorical covariates. While some variables have virtually no influence on the treatment effect (e.g., mother’s age), others are associated with clear effect differences. In particular, higher gestational age, higher neonatal weight, and no multifetal pregnancy have a higher risk for elevated blood loss due to cesarean section compared to vaginal delivery.

Dependency plots of the individual treatment effects calculated by the model-based transformation forest. Values > 0 mean that cesarean section increases the blood loss compared to vaginal delivery. Blue lines and diamond points depict (smooth conditional) mean effects.

For more details see the arXiv working paper.