The R package colorspace provides a flexible toolbox for selecting individual colors or color palettes, manipulating these colors, and employing them in statistical graphics and data visualizations. In particular, the package provides a broad range of color palettes based on the HCL (hue-chroma-luminance) color space. The three HCL dimensions have been shown to match those of the human visual system very well, thus facilitating intuitive selection of color palettes through trajectories in this space. Using the HCL color model, general strategies for three types of palettes are implemented: (1) Qualitative for coding categorical information, i.e., where no particular ordering of categories is available. (2) Sequential for coding ordered/numeric information, i.e., going from high to low (or vice versa). (3) Diverging for coding ordered/numeric information around a central neutral value, i.e., where colors diverge from neutral to two extremes. To aid selection and application of these palettes, the package also contains scales for use with ggplot2, shiny and tcltk apps for interactive exploration, visualizations of palette properties, accompanying manipulation utilities (like desaturation and lighten/darken), and emulation of color vision deficiencies.
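The three palette types can be tried out directly; a minimal sketch (assuming the colorspace package is installed, with palette names as listed in its documentation):

```r
## One HCL-based palette of each type, plus two of the accompanying
## manipulation/assessment utilities.
library("colorspace")

qualitative_hcl(4, palette = "Dark 3")    # categorical information
sequential_hcl(5, palette = "Blues 3")    # ordered, high to low
diverging_hcl(7, palette = "Blue-Red")    # ordered around a neutral center

## desaturation and emulation of deuteranope vision
desaturate(sequential_hcl(5, palette = "Blues 3"))
deutan(diverging_hcl(7, palette = "Blue-Red"))
```

Each call returns a vector of hex colors that can be passed to base graphics, ggplot2 scales, etc.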

Zeileis A, Fisher JC, Hornik K, Ihaka R, McWhite CD, Murrell P, Stauffer R, Wilke CO (2020). “colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes.” *Journal of Statistical Software*, **96**(1), 1-49. doi:10.18637/jss.v096.i01.

The release of version 2.0-0 on CRAN (Comprehensive R Archive Network) concludes more than a decade of development and substantial updates since the release of version 1.0-0. The JSS paper above gives a detailed overview of the package’s features. The full list of changes over the different releases is provided in the package’s NEWS.

Even more details and links along with the full software manual are available on the package web page on R-Forge at https://colorspace.R-Forge.R-project.org/ (produced with `pkgdown`).

The sandwich package provides model-robust covariance matrix estimators for cross-sectional, time series, clustered, panel, and longitudinal data. The implementation is modular due to an object-oriented design with support for many model objects, including: `lm`, `glm`, `survreg`, `coxph`, `mlogit`, `polr`, `hurdle`, `zeroinfl`, and beyond.
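The object-oriented idea can be sketched with a basic `lm` fit (assuming the sandwich and lmtest packages are installed); the same generic call works unchanged for the other supported model classes:

```r
## Robust (heteroscedasticity-consistent) covariance for a linear model
## and corresponding inference via coeftest().
library("sandwich")
library("lmtest")

m <- lm(dist ~ speed, data = cars)
vcovHC(m, type = "HC3")       # sandwich covariance matrix estimate
coeftest(m, vcov = vcovHC)    # coefficient tests with robust std. errors
```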

The release of version 3.0-0 on CRAN (Comprehensive R Archive Network) completes the substantial updates and improvements started in the 2.4-x and 2.5-x releases, especially clustered, panel, and bootstrap covariances. In addition to the new pkgdown web page and paper in the Journal of Statistical Software (JSS), described below, the new release includes some smaller improvements: in some equations in the vignettes (suggested by Bettina Grün and Yves Croissant), in the kernel weights function `kweights()` (suggested by Christoph Hanck), in the formula handling (suggested by David Hugh-Jones), and in the `bread()` method for weighted `mlm` objects (suggested by James Pustejovsky). The full list of changes can be seen in the package’s NEWS.
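For illustration, `kweights()` maps scaled lags to kernel weights; a quick sketch (assuming the sandwich package) with the Bartlett kernel, whose weights decay linearly from one at lag zero to zero at the truncation point:

```r
## Bartlett kernel weights at scaled lags 0, 0.5, and 1.
library("sandwich")
kweights(c(0, 0.5, 1), kernel = "Bartlett")  # 1.0, 0.5, 0.0
```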

The package now comes with a dedicated `pkgdown` website on R-Forge: https://sandwich.R-Forge.R-project.org/. This includes a nice logo, kindly provided by Reto Stauffer.

The web page essentially uses the previous content of the package (documentation, vignettes, NEWS) but also adds a nice overview of the package to help new users to “Get started”.

*Citation:*

Zeileis A, Köll S, Graham N (2020). “Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R.” *Journal of Statistical Software*, **95**(1), 1-36. doi:10.18637/jss.v095.i01.

*Abstract:*

Clustered covariances or clustered standard errors are very widely used to account for correlated or clustered data, especially in economics, political sciences, and other social sciences. They are employed to adjust the inference following estimation of a standard least-squares regression or generalized linear model estimated by maximum likelihood. Although many publications just refer to “the” clustered standard errors, there is a surprisingly wide variety of clustered covariances, particularly due to different flavors of bias corrections. Furthermore, while the linear regression model is certainly the most important application case, the same strategies can be employed in more general models (e.g., for zero-inflated, censored, or limited responses).

In R, functions for covariances in clustered or panel models have been somewhat scattered or available only for certain modeling functions, notably the (generalized) linear regression model. In contrast, an object-oriented approach to “robust” covariance matrix estimation (applicable beyond `lm()` and `glm()`) is available in the *sandwich* package but has been limited to the case of cross-section or time series data. Starting with *sandwich* 2.4.0, this shortcoming has been corrected: Based on methods for two generic functions (`estfun()` and `bread()`), clustered and panel covariances are provided in `vcovCL()`, `vcovPL()`, and `vcovPC()`. Moreover, clustered bootstrap covariances are provided in `vcovBS()`, using model `update()` on bootstrap samples. These are directly applicable to models from packages including *MASS*, *pscl*, *countreg*, and *betareg*, among many others. Some empirical illustrations are provided as well as an assessment of the methods’ performance in a simulation study.

Structural equation models (SEMs) are a popular class of models, especially in the social sciences, to model correlations and dependencies in multivariate data, often involving latent variables. To account for individual heterogeneities in the SEM parameters sometimes finite-mixture models are used, in particular when there are no covariates available to explain the source of the heterogeneity. More recently, starting from the work of Brandmaier *et al.* (2013, *Psychological Methods*, doi:10.1037/a0030001) tree-based modeling of SEMs has also been receiving increasing interest in the literature. Based on available covariates SEM trees can capture the heterogeneity by recursively partitioning the data into subgroups. Brandmaier *et al.* also provide an R implementation for their algorithm in their *semtree* package available from CRAN.

Their original SEM tree algorithm relied on selecting the variables for recursive partitioning based on likelihood ratio tests along with somewhat ad hoc adjustments. Recently, the group around Brandmaier proposed to use score-based tests instead that account more formally for selecting the maximal statistic across a range of possible split points (see Arnold *et al.* 2020, PsyArXiv Preprints, doi:10.31234/osf.io/65bxv). They show that this not only improves the accuracy of the method but can also greatly alleviate the computational burden.

The score-based tests draw on the work started by us in Merkle & Zeileis (2013, *Psychometrika*, doi:10.1007/s11336-012-9302-4) which in fact had already long been available in a general model-based tree algorithm (called MOB for short), proposed by us in Zeileis *et al.* (2008, *Journal of Computational and Graphical Statistics*, doi:10.1198/106186008X319331) and available in the R package *partykit* (and *party* before that).

In this blog post I show how the general `mob()` function from *partykit* can easily be coupled with the *lavaan* package (Rosseel 2012, *Journal of Statistical Software*, doi:10.18637/jss.v048.i02) as an alternative approach to fitting SEM trees.

MOB is a very broad tree algorithm that can capture subgroups in general parametric models (e.g., probability distributions, regression models, measurement models, etc.). While it can be applied to M-type estimators in general, it is probably easiest to outline the algorithm for maximum likelihood models. The algorithm assumes that there is some data of interest along with a suitable model that can fit the data, at least locally in subgroups. And additionally there are further covariates that can be used for splitting the data to find these subgroups. It proceeds in the following steps.

1. Estimate the model parameters by maximum likelihood for the observations in the current subsample.
2. Test for associations (or instabilities) of the corresponding model scores and each of the covariates available for splitting.
3. Split the sample along the covariate with the strongest association or instability. Choose the breakpoint with the highest improvement in the log-likelihood.
4. Repeat steps 1-3 recursively in the subsamples until these become too small or there is no significant association/instability (or some other stopping criterion is reached).

*Optionally:* Reduce the size of the tree by pruning branches of splits that do not improve the model fit sufficiently (e.g., based on information criteria).

The `mob()` function in *partykit* implements this general algorithm and allows plugging in different model-fitting functions, provided they allow extracting the estimated parameters, the maximized log-likelihood, and the corresponding matrix of score (or gradient) contributions for each observation. The details are described in a vignette within the package: Parties, Models, Mobsters: A New Implementation of Model-Based Recursive Partitioning in R.
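To make the required interface concrete, here is a hypothetical minimal mobster (not from the package, the name `gauss_fit` is made up) that fits a plain normal distribution by maximum likelihood and returns exactly the ingredients listed above:

```r
## Hypothetical mobster sketch: coefficients, objfun (negative maximized
## log-likelihood), and the per-observation score matrix for N(mean, sd).
gauss_fit <- function(y, x = NULL, start = NULL, weights = NULL,
    offset = NULL, ..., estfun = FALSE, object = FALSE) {
  y <- as.numeric(y)
  m <- mean(y)
  s <- sqrt(mean((y - m)^2))  # ML estimate of the standard deviation
  list(
    coefficients = c(mean = m, sd = s),
    objfun = -sum(dnorm(y, mean = m, sd = s, log = TRUE)),
    estfun = if (estfun) cbind(
      mean = (y - m) / s^2,            # score w.r.t. the mean
      sd = ((y - m)^2 - s^2) / s^3     # score w.r.t. the std. deviation
    ) else NULL,
    object = NULL
  )
}
```

Plugged into `mob()` via its `fit` argument, such a function would partition the data into subgroups with stable mean and standard deviation.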

As the *lavaan* package readily provides the quantities that MOB needs as input, we can easily set up a “mobster” function for SEMs. The `lavaan_fit()` function below takes a *lavaan* `model` definition and returns the actual fitting function with the interface required by `mob()`:

```
lavaan_fit <- function(model) {
  function(y, x = NULL, start = NULL, weights = NULL, offset = NULL, ...,
    estfun = FALSE, object = FALSE) {
    sem <- lavaan::lavaan(model = model, data = y, start = start)
    list(
      coefficients = stats4::coef(sem),
      objfun = -as.numeric(stats4::logLik(sem)),
      estfun = if(estfun) sandwich::estfun(sem) else NULL,
      object = if(object) sem else NULL
    )
  }
}
```

The fitting function just calls `lavaan()` using the `model`, the data `y`, and optionally the `start`ing values, ignoring other arguments that `mob()` could handle. It then extracts the parameters (`coef()`), the log-likelihood (`logLik()`), and the score matrix (`estfun()`) using the generic functions from the corresponding packages and returns them in a list.

To illustrate fitting SEM trees with *partykit* and *lavaan*, we consider the example from the “Using *lavaan* with *semtree*” tutorial provided by Brandmaier *et al.* It is a linear growth curve model for data measured at five time points: `X1`, `X2`, `X3`, `X4`, and `X5`. The main parameters of interest are the intercept and the slope of the growth curves while accounting for random variations and correlations among the involved variables according to this SEM. In *lavaan* notation:

```
growth_curve_model <- '
inter =~ 1*X1 + 1*X2 + 1*X3 + 1*X4 + 1*X5;
slope =~ 0*X1 + 1*X2 + 2*X3 + 3*X4 + 4*X5;
inter ~~ vari*inter; inter ~ meani*1;
slope ~~ vars*slope; slope ~ means*1;
inter ~~ cov*slope;
X1 ~~ residual*X1; X1 ~ 0*1;
X2 ~~ residual*X2; X2 ~ 0*1;
X3 ~~ residual*X3; X3 ~ 0*1;
X4 ~~ residual*X4; X4 ~ 0*1;
X5 ~~ residual*X5; X5 ~ 0*1;
'
```

The model can also be visualized using the following graphic taken from the tutorial:

In addition to the measurements at the five time points, the data set example1.txt provides three covariates (`agegroup`, `training`, and `noise`) that can be used to capture individual differences in the model parameters. The data can be read and transformed to appropriate classes by:

```
ex1 <- data.frame(read.csv(
  "https://brandmaier.de/semtree/wp-content/uploads/downloads/2012/07/example1.txt",
  sep = "\t"))
ex1 <- transform(ex1,
  agegroup = factor(agegroup),
  training = factor(training),
  noise = factor(noise))
```

Given that the data, model, and mobster function are available, it is easy to fit the MOB tree with SEMs in every node of the tree. The five measurements are the dependent variables (`y`) that need to be passed to the model as a `"data.frame"`, and the three covariates are the explanatory variables:

```
library("partykit")
tr <- mob(X1 + X2 + X3 + X4 + X5 ~ agegroup + training + noise, data = ex1,
  fit = lavaan_fit(growth_curve_model),
  control = mob_control(ytype = "data.frame"))
```

The resulting tree `tr` correctly detects the three subgroups that were simulated for the data by Brandmaier *et al.* It can be visualized (with somewhat larger terminal nodes, all dropped to the bottom of the display):

```
plot(tr, drop = TRUE, tnex = 2)
```

The parameter estimates can also be extracted by `coef(tr)`:

```
t(coef(tr))
## 2 4 5
## vari 0.086 0.080 0.105
## meani 5.020 2.003 1.943
## vars 0.500 1.627 0.675
## means -0.144 -1.082 -0.495
## cov -0.013 -0.041 0.028
## residual 0.050 0.047 0.052
## residual 0.050 0.047 0.052
## residual 0.050 0.047 0.052
## residual 0.050 0.047 0.052
## residual 0.050 0.047 0.052
```

The main parameters of interest are `meani`, the mean intercept, and `means`, the mean slope, which both vary across the subgroups defined by `agegroup` and `training`: In node 2 the intercept is about 5 while in nodes 4 and 5 it is around 2. The slope is almost zero in node 2, about -1 in node 4, and about -0.5 in node 5. The `residual` variance is restricted to be constant across the five time points and hence repeated in the output.

By extracting the node-specific `meani` and `means` parameters, the expected growth can also be visualized in the following way:

```
gr <- coef(tr)[, "meani"] + outer(coef(tr)[, "means"], 0:4)
cl <- palette.colors(4, "Okabe-Ito")[-1]
matplot(t(gr), type = "o", pch = 19, col = cl,
  ylab = "Expected growth", xlab = "Time", xlim = c(1, 5.2))
text(5, gr[, 5], paste("Node", rownames(gr)), col = cl, pos = 3)
```

Finally, using a custom printing function that only shows the subgroup size and the first six parameters, the tree can be nicely printed as:

```
node_format <- function(node) {
  c("",
    sprintf("n = %s", node$nobs),
    capture.output(print(cbind(node$coefficients[1:6]), digits = 2L))[-1L])
}
print(tr, FUN = node_format)
## Model-based recursive partitioning (lavaan_fit(growth_curve_model))
##
## Model formula:
## X1 + X2 + X3 + X4 + X5 ~ agegroup + training + noise
##
## Fitted party:
## [1] root
## | [2] agegroup in 0
## | n = 200
## | vari 0.086
## | meani 5.020
## | vars 0.500
## | means -0.144
## | cov -0.013
## | residual 0.050
## | [3] agegroup in 1
## | | [4] training in 0
## | | n = 100
## | | vari 0.080
## | | meani 2.003
## | | vars 1.627
## | | means -1.082
## | | cov -0.041
## | | residual 0.047
## | | [5] training in 1
## | | n = 100
## | | vari 0.105
## | | meani 1.943
## | | vars 0.675
## | | means -0.495
## | | cov 0.028
## | | residual 0.052
##
## Number of inner nodes: 2
## Number of terminal nodes: 3
## Number of parameters per node: 10
## Objective function: 1330.735
```

The main purpose of this blog post was to show that it is relatively simple to fit model-based trees with custom models using the general `mob()` infrastructure from the *partykit* package. Specifically, it is easy to fit SEM trees because the *lavaan* package readily provides all necessary components. As I had provided this as feedback to Arnold *et al.* and encouraged them to drill a bit deeper to better understand the differences between their adapted SEM tree algorithm and MOB, I thought I should share the code as it might be useful to others as well.

One important difference between the new SEM tree algorithm and the current MOB implementation is the determination of the best split point. The new SEM tree algorithm also uses the scores for this, while MOB is based on the log-likelihood in the subgroups and is hence slower when searching for splits in numeric covariates with many possible split points. While we had also experimented with score-based split point estimation in *party*, this was never released and is currently not available in *partykit*. However, we are working on making the split point selection more flexible in *partykit*.

Of course, fitting the tree model is actually just the first step in an analysis of subgroups in a SEM. The subsequent steps for analyzing and interpreting the resulting tree model are at least as important. The work by Brandmaier and his co-authors and their *semtree* package provide much more guidance on this.

Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). “lmSubsets: Exact Variable-Subset Selection in Linear Regression for R.” *Journal of Statistical Software*, **93**(3), 1-21. doi:10.18637/jss.v093.i03.

An R package for computing the all-subsets regression problem is presented. The proposed algorithms are based on computational strategies recently developed. A novel algorithm for the best-subset regression problem selects subset models based on a predetermined criterion. The package user can choose from exact and from approximation algorithms. The core of the package is written in C++ and provides an efficient implementation of all the underlying numerical computations. A case study and benchmark results illustrate the usage and the computational efficiency of the package.

https://CRAN.R-project.org/package=lmSubsets
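A minimal sketch of the two core functions (assuming the lmSubsets package is installed; the standard `mtcars` data is used just for illustration):

```r
## All-subsets computation, best-subset selection w.r.t. BIC,
## and refitting the winner as a plain "lm" object.
library("lmSubsets")

all_sub <- lmSubsets(mpg ~ ., data = mtcars)   # all submodels, by size
best <- lmSelect(all_sub, penalty = "BIC")     # best submodel w.r.t. BIC
refit(best)                                    # refit via lm()
```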

Advances in numerical weather prediction (NWP) have played an important role in the increase of weather forecast skill over the past decades. Numerical models simulate physical systems that operate at a large, typically global, scale. The horizontal (spatial) resolution is limited by the computational power available today and hence, typically, the NWP outputs are post-processed to correct for local and unresolved effects in order to obtain forecasts for specific locations. So-called model output statistics (MOS) develops a regression relationship based on past meteorological observations of the variable to be predicted and forecasted NWP quantities at a certain lead time. Variable-subset selection is often employed to determine which NWP outputs should be included in the regression model for a specific location.

Here, the `lmSubsets` package is used to build a MOS regression model predicting temperature at Innsbruck Airport, Austria, based on data from the Global Ensemble Forecast System. The data frame `IbkTemperature` contains 1824 daily cases for 42 variables: the temperature at Innsbruck Airport (observed), 36 NWP outputs (forecasted), and 5 deterministic time trend/season patterns. The NWP variables include quantities pertaining to temperature (e.g., 2-meter above ground, minimum, maximum, soil), precipitation, wind, and fluxes, among others.

First, the package and data are loaded and the few missing values are omitted for simplicity.

```
library("lmSubsets")
data("IbkTemperature", package = "lmSubsets")
IbkTemperature <- na.omit(IbkTemperature)
```

A simple output model for the observed temperature (`temp`) is constructed, which will serve as the reference model. It consists of the 2-meter temperature NWP forecast (`t2m`), a linear trend component (`time`), as well as seasonal components with annual (`sin`, `cos`) and bi-annual (`sin2`, `cos2`) harmonic patterns.

```
MOS0 <- lm(temp ~ t2m + time + sin + cos + sin2 + cos2,
  data = IbkTemperature)
```

When looking at `summary(MOS0)` or the coefficient table below, it can be observed that despite the inclusion of the NWP variable `t2m`, the coefficients for the deterministic components remain significant, which indicates that the seasonal temperature fluctuations are not fully resolved by the numerical model.

|             | MOS0                  | MOS1                  | MOS2                  |
|-------------|-----------------------|-----------------------|-----------------------|
| (Intercept) | -345.252 ** (109.212) | -666.584 *** (95.349) | -661.700 *** (95.225) |
| t2m         | 0.318 *** (0.016)     | 0.055 (0.029)         |                       |
| time        | 0.132 * (0.054)       | 0.149 ** (0.047)      | 0.147 ** (0.047)      |
| sin         | -1.234 *** (0.126)    | 0.522 *** (0.147)     | 0.811 *** (0.120)     |
| cos         | -6.329 *** (0.164)    | -0.812 ** (0.273)     |                       |
| sin2        | 0.240 * (0.110)       | -0.794 *** (0.119)    | -0.870 *** (0.118)    |
| cos2        | -0.332 ** (0.109)     | -1.067 *** (0.101)    | -1.128 *** (0.097)    |
| sshnf       |                       | 0.016 *** (0.004)     | 0.018 *** (0.004)     |
| vsmc        |                       | 20.200 *** (3.115)    | 20.181 *** (3.106)    |
| tmax2m      |                       | 0.145 *** (0.037)     | 0.181 *** (0.023)     |
| st          |                       | 1.077 *** (0.051)     | 1.142 *** (0.043)     |
| wr          |                       | 0.450 *** (0.109)     | 0.505 *** (0.103)     |
| t2pvu       |                       | 0.064 *** (0.011)     | 0.149 *** (0.028)     |
| mslp        |                       |                       | -0.000 *** (0.000)    |
| p2pvu       |                       |                       | -0.000 ** (0.000)     |
| AIC         | 9493.602              | 8954.907              | 8948.182              |
| BIC         | 9537.650              | 9031.992              | 9025.267              |
| RSS         | 19506.469             | 14411.122             | 14357.943             |
| Sigma       | 3.281                 | 2.825                 | 2.820                 |
| R-squared   | 0.803                 | 0.854                 | 0.855                 |

Standard errors in parentheses. *** p < 0.001; ** p < 0.01; * p < 0.05.

Next, the reference model is extended with selected regressors taken from the remaining 35 NWP variables.

```
MOS1_best <- lmSelect(temp ~ ., data = IbkTemperature,
  include = c("t2m", "time", "sin", "cos", "sin2", "cos2"),
  penalty = "BIC", nbest = 20)
MOS1 <- refit(MOS1_best)
```

Best-subset regression with respect to the BIC criterion is employed to determine pertinent variables in addition to the regressors already used in `MOS0`. The 20 best submodels are computed; the selected variables can be visualized by `image(MOS1_best, hilite = 1)` (see below) while the corresponding BIC values can be visualized by `plot(MOS1_best)`. All in all, these 20 best models are very similar with only a few variables switching between being included and excluded. Using the `refit()` method the best submodel can be extracted and fitted via `lm()`. Summary statistics are shown in the table above. Overall, the model `MOS1` improves the model fit considerably compared to the basic `MOS0` model.

Finally, an all-subsets regression is conducted instead of the cheaper best-subset regression. It considers all 41 variables without any restrictions to determine the best model, in terms of BIC, that can be found for this data set.

```
MOS2_all <- lmSubsets(temp ~ ., data = IbkTemperature)
MOS2 <- refit(lmSelect(MOS2_all, penalty = "BIC"))
```

Again, the best model is refitted with `lm()` to facilitate further inspection; see above for the summary table.

The best-BIC models `MOS1` and `MOS2` both have 13 regressors. The deterministic trend and all but one of the harmonic seasonal components are retained in `MOS2` even though they are not forced into the model (as in `MOS1`). In addition, `MOS1` and `MOS2` share six NWP outputs relating to temperature (`tmax2m`, `st`, `t2pvu`), hydrology (`vsmc`, `wr`), and heat flux (`sshnf`), while `MOS2` additionally picks up two pressure variables (`mslp`, `p2pvu`). However, and most remarkably, `MOS2` does not include the direct 2-meter temperature output from the NWP model (`t2m`). In fact, `t2m` is not included by any of the 20 submodels (sizes 8 to 27) shown by `image(MOS2_all, size = 8:27, hilite = 1, hilite_penalty = "BIC")` whereas the temperature quantities `tmax2m`, `st`, and `t2pvu` are included by all. (Additionally, `plot(MOS2_all)` would show the associated BIC and residual sum of squares across the different model sizes.) The summary statistics reveal that both `MOS1` and `MOS2` significantly improve over the simple reference model `MOS0`, with `MOS2` being only slightly better than `MOS1`.

Lang MN, Schlosser L, Hothorn T, Mayr GJ, Stauffer R, Zeileis A (2020). *“Circular Regression Trees and Forests with an Application to Probabilistic Wind Direction Forecasting”*, arXiv:2001.00412, arXiv.org E-Print Archive. https://arXiv.org/abs/2001.00412

While circular data occur in a wide range of scientific fields, the methodology for distributional modeling and probabilistic forecasting of circular response variables is rather limited. Most of the existing methods are built on the framework of generalized linear and additive models, which are often challenging to optimize and interpret. Therefore, building on previous ideas for trees modeling circular means, we suggest a distributional approach for regression trees and random forests yielding probabilistic forecasts based on the von Mises distribution. The resulting tree-based models simplify the estimation process by using the available covariates for partitioning the data into sufficiently homogeneous subgroups so that a simple von Mises distribution without further covariates can be fitted to the circular response in each subgroup. These circular regression trees are straightforward to interpret, can capture nonlinear effects and interactions, and automatically select the relevant covariates that are associated with either location and/or scale changes in the von Mises distribution. Combining an ensemble of circular regression trees to a circular regression forest can regularize and smooth the covariate effects. The new methods are evaluated in a case study on probabilistic wind direction forecasting at two Austrian airports, considering other common approaches as a benchmark.

R package `circtree` from the R-Forge project `partykit`: https://R-Forge.R-project.org/R/?group_id=261

Basic examples using artificial data:

```
install.packages("partykit")
install.packages("disttree", repos = "http://R-Forge.R-project.org")
install.packages("circtree", repos = "http://R-Forge.R-project.org")
library("circtree")
example("circtree", ask = FALSE)
vignette("circtree", package = "circtree")
```

The basis for the proposed distributional modeling of the circular responses is the von Mises distribution, also known as the “circular normal distribution”. It is based on a location parameter μ in [0, 2 π) and a concentration parameter κ > 0.
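The density and a basic maximum-likelihood fit can be sketched in base R (a simplified illustration, not the `circtree` implementation; κ is kept positive via a log link):

```r
## von Mises density: f(theta) = exp(kappa * cos(theta - mu)) /
## (2 * pi * I_0(kappa)), with I_0 the modified Bessel function.
dvonmises <- function(theta, mu, kappa) {
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

## Simple ML fit of (mu, kappa) via optim(), starting from the
## circular mean and kappa = 1.
fit_vonmises <- function(theta) {
  mu0 <- atan2(mean(sin(theta)), mean(cos(theta)))
  nll <- function(par) -sum(log(dvonmises(theta, par[1], exp(par[2]))))
  opt <- optim(c(mu0, 0), nll)
  c(mu = opt$par[1] %% (2 * pi), kappa = exp(opt$par[2]))
}
```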

The figure below illustrates a model, fitted by maximum likelihood, for circular data in the interval [0, 2 π). It can either be drawn on a linearized scale (left) or circular scale (right). In both cases the empirical histogram (gray bars) and fitted von Mises density (red line) are depicted along with the estimated location parameter (red hand).

The regression trees and forests extend this approach by employing an adaptive local likelihood approach: For each observation, the parameters μ and κ are estimated only locally in a neighborhood, defined either by the nodes of a single tree or weighted by the nodes of a forest.

To provide a first impression of the methodology in practice (motivated by air traffic management), a circular regression tree is employed for probabilistic wind direction forecasting. More specifically, we obtain 1-hourly nowcasts of wind direction at Innsbruck Airport. As the airport is located at the bottom of a narrow valley within the European Alps, it is natural to employ tree-based regression models as there can be abrupt changes in the wind direction rather than smooth changes.

Due to the short lead time, only observation data (41,979 data points) is employed for the predictions but no numerical weather predictions. The data is obtained from 4 stations at Innsbruck Airport as well as 6 nearby weather stations. The base variables are: wind direction, wind (gust) speed, temperature, (reduced) air pressure, and relative humidity. Based on these, 260 covariates are computed via means/minima/maxima, temporal changes, and spatial differences towards the airport. The resulting regression tree is shown below along with the empirical (gray) and fitted von Mises (red) wind direction distribution in each terminal node.

Based on the fitted location parameters μ, the subgroups can be distinguished into the following wind regimes:

- Up-valley winds blowing from the valley mouth towards the upper valley (from east to west, nodes 4 and 5).
- Downslope winds blowing across the Alpine crest along the intersecting valley towards Innsbruck (from south-east to north-west, node 8).
- Down-valley winds blowing in the direction of the valley mouth (from west to east, nodes 10, 12 and 13).
- Node 7 captures observations with rather low wind speeds that cannot be clearly distinguished into specific wind regimes and are consequently associated with a very low estimated concentration parameter κ, i.e., a high estimated variance.

In terms of covariates, the lagged wind direction (also known as “persistence”) is mostly responsible for distinguishing the broad range of wind regimes listed above, while the pressure gradients and wind speed separate the data into subgroups with high vs. low precision.

A more extensive case study of circular regression trees and also circular random forests applied to probabilistic wind direction forecasting at Innsbruck Airport and Vienna International Airport is presented in Section 4 of the paper, along with a benchmark against commonly-used alternative approaches.

Umlauf N, Klein N, Simon T, Zeileis A (2019). *“bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond).”* arXiv:1909.11784, arXiv.org E-Print Archive. https://arxiv.org/abs/1909.11784

Over the last decades, the challenges in applied regression and in predictive modeling have been changing considerably: (1) More flexible model specifications are needed as big(ger) data become available, facilitated by more powerful computing infrastructure. (2) Full probabilistic modeling rather than predicting just means or expectations is crucial in many applications. (3) Interest in Bayesian inference has been increasing both as an appealing framework for regularizing or penalizing model estimation as well as a natural alternative to classical frequentist inference. However, while there has been a lot of research in all three areas, also leading to associated software packages, a modular software implementation that allows to easily combine all three aspects has not yet been available. For filling this gap, the R package bamlss is introduced for Bayesian additive models for location, scale, and shape (and beyond). At the core of the package are algorithms for highly-efficient Bayesian estimation and inference that can be applied to generalized additive models (GAMs) or generalized additive models for location, scale, and shape (GAMLSS), also known as distributional regression. However, its building blocks are designed as “Lego bricks” encompassing various distributions (exponential family, Cox, joint models, …), regression terms (linear, splines, random effects, tensor products, spatial fields, …), and estimators (MCMC, backfitting, gradient boosting, lasso, …). It is demonstrated how these can be easily recombined to make classical models more flexible or create new custom models for specific modeling challenges.

CRAN package: https://CRAN.R-project.org/package=bamlss

Replication script: bamlss.R

Project web page: http://www.bamlss.org/

To illustrate that `bamlss` follows the same familiar workflow as other regression packages, such as the basic `stats` package or the well-established `mgcv` or `gamlss`, two quick examples are provided: a Bayesian logit model and a location-scale model where both mean and variance of a normal response depend on a smooth term.

The logit model is a basic labor force participation model, a standard application in microeconometrics. Here, the data are loaded from the `AER` package and the same model formula is specified that would also be used for `glm()` (as shown on `?SwissLabor`).

```
data("SwissLabor", package = "AER")
f <- participation ~ income + age + education + youngkids + oldkids + foreign + I(age^2)
```

Then, the model can be estimated with `bamlss()` using essentially the same look-and-feel as for `glm()`. The default is to use Markov chain Monte Carlo after obtaining initial parameters via backfitting.

```
library("bamlss")
set.seed(123)
b <- bamlss(f, family = "binomial", data = SwissLabor)
summary(b)
## Call:
## bamlss(formula = f, family = "binomial", data = SwissLabor)
## ---
## Family: binomial
## Link function: pi = logit
## ---
## Formula pi:
## ---
## participation ~ income + age + education + youngkids + oldkids +
## foreign + I(age^2)
## -
## Parametric coefficients:
## Mean 2.5% 50% 97.5% parameters
## (Intercept) 6.15503 1.55586 5.99204 11.11051 6.196
## income -1.10565 -1.56986 -1.10784 -0.68652 -1.104
## age 3.45703 2.05897 3.44567 4.79139 3.437
## education 0.03354 -0.02175 0.03284 0.09223 0.033
## youngkids -1.17906 -1.51099 -1.17683 -0.83047 -1.186
## oldkids -0.24122 -0.41231 -0.24099 -0.08054 -0.241
## foreignyes 1.16749 0.76276 1.17035 1.55624 1.168
## I(age^2) -0.48990 -0.65660 -0.49205 -0.31968 -0.488
## alpha 0.87585 0.32301 0.99408 1.00000 NA
## ---
## Sampler summary:
## -
## DIC = 1033.325 logLik = -512.7258 pd = 7.8734
## runtime = 1.417
## ---
## Optimizer summary:
## -
## AICc = 1033.737 converged = 1 edf = 8
## logLik = -508.7851 logPost = -571.3986 nobs = 872
## runtime = 0.012
## ---
```

The summary is based on the MCMC samples, which suggest “significant” effects for all covariates except `education`, whose 95% credible interval contains zero. In addition, the acceptance probabilities `alpha` are reported and indicate proper behavior of the MCMC algorithm. The column `parameters` shows the posterior mode estimates of the regression coefficients, which are calculated by the upstream backfitting algorithm.
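Since a logit link is used, the posterior means reported above can be translated into odds ratios with base R's `exp()`; for example, for the `foreignyes` coefficient (a quick back-of-the-envelope check, not part of the original example):

```r
## posterior mean of the foreignyes coefficient from the summary above
exp(1.16749)
## roughly 3.2: about three times higher participation odds for foreigners
```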

To show a more flexible regression model, we fit a distributional location-scale model to the well-known simulated motorcycle accident data, provided as `mcycle` in the `MASS` package. Here, the relationship between head acceleration and time after impact is captured by smooth terms in both mean and variance. See also `?gaulss` in the `mgcv` package for the same type of model estimated with REML rather than MCMC. We load the data, set up a list of two formulas with smooth terms (and an increased number of knots `k` for more flexibility), fit the model almost as usual, and then visualize the fitted terms along with 95% credible intervals.

```
data("mcycle", package = "MASS")
f <- list(accel ~ s(times, k = 20), sigma ~ s(times, k = 20))
set.seed(456)
b <- bamlss(f, data = mcycle, family = "gaussian")
plot(b, model = c("mu", "sigma"))
```
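For comparison, and assuming the recommended `mgcv` package that ships with R, the analogous REML-based fit mentioned above can be obtained via the `gaulss` family:

```r
library("mgcv")

data("mcycle", package = "MASS")
## same location-scale specification, estimated by REML instead of MCMC;
## the second formula (for the standard deviation) has no left-hand side
m <- gam(list(accel ~ s(times, k = 20), ~ s(times, k = 20)),
  family = gaulss(), data = mcycle)
summary(m)
```

The fitted smooth effects should closely resemble the posterior means from the `bamlss()` fit.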

Finally, we show a more challenging case study. Here, emphasis is given to the illustration of the workflow. For more details on the background for the data and interpretation of the model, see Section 5 in the full paper linked above. The goal is to establish a probabilistic model linking positive counts of cloud-to-ground lightning discharges in the European Eastern Alps to atmospheric quantities from a reanalysis dataset.

The lightning measurements form the response variable and the regressors are taken from atmospheric quantities in ECMWF’s ERA5 reanalysis data. Both have a temporal resolution of 1 hour for the years 2010-2018 and a spatial mesh size of approximately 32 km. The subset of the data analyzed, along with the fitted `bamlss` model, is provided in the `FlashAustria` package on R-Forge, which can be installed by

```
install.packages("FlashAustria", repos = "http://R-Forge.R-project.org")
```

To model only the counts with at least one lightning discharge, we employ a negative binomial count distribution truncated at zero. The data can be loaded and the regression formula set up as follows:

```
data("FlashAustria", package = "FlashAustria")
f <- list(
  counts ~ s(d2m, bs = "ps") + s(q_prof_PC1, bs = "ps") +
    s(cswc_prof_PC4, bs = "ps") + s(t_prof_PC1, bs = "ps") +
    s(v_prof_PC2, bs = "ps") + s(sqrt_cape, bs = "ps"),
  theta ~ s(sqrt_lsp, bs = "ps")
)
```
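The zero-truncation itself is simple enough to sketch in base R: the untruncated negative binomial density is rescaled by the probability of a positive count. (The parameter values below are illustrative, not taken from the fitted model.)

```r
## density of a negative binomial truncated at zero:
## f(y | y > 0) = f(y) / (1 - f(0)) for y = 1, 2, ...
dztnbinom <- function(y, mu, theta) {
  dnbinom(y, mu = mu, size = theta) / (1 - dnbinom(0, mu = mu, size = theta))
}
## probabilities over the positive support sum to (essentially) one
sum(dztnbinom(1:1000, mu = 2, theta = 1))
```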

The expectation `mu` of the underlying untruncated negative binomial model is modeled by various smooth terms for the atmospheric variables, while the overdispersion parameter `theta` only depends on one smooth regressor. To fit this challenging model, gradient boosting is employed in a first step to obtain initial values for the subsequent MCMC sampler. Running the model takes about 30 minutes on a well-equipped standard PC. In order to move quickly through the example we load the pre-computed model from the `FlashAustria` package:

```
data("FlashAustriaModel", package = "FlashAustria")
b <- FlashAustriaModel
```

But, of course, the model can also be refitted:

```
set.seed(111)
b <- bamlss(f, family = "ztnbinom", data = FlashAustriaTrain,
  optimizer = boost, maxit = 1000,        ## Boosting arguments.
  thin = 5, burnin = 1000, n.iter = 6000) ## Sampler arguments.
```

To explore this model in some more detail, we show a couple of visualizations. First, the contribution to the log-likelihood of individual terms during gradient boosting is depicted.

```
pathplot(b, which = "loglik.contrib", intercept = FALSE)
```

Subsequently, we show traceplots of the MCMC samples (left) along with their autocorrelation (right) for two coefficients of the term `s(sqrt_cape)` in the model for `mu`.

```
plot(b, model = "mu", term = "s(sqrt_cape)", which = "samples")
```

Next, the effects of the terms `s(sqrt_cape)` and `s(q_prof_PC1)` from the model for `mu` and of the term `s(sqrt_lsp)` from the model for `theta` are shown along with 95% credible intervals derived from the MCMC samples.

```
plot(b, term = c("s(sqrt_cape)", "s(q_prof_PC1)", "s(sqrt_lsp)"),
  rug = TRUE, col.rug = "#39393919")
```

Finally, estimated probabilities for observing 10 or more lightning counts (within one grid box) are computed and visualized. The reconstructions for four time points on September 15-16, 2001 are shown.

```
fit <- predict(b, newdata = FlashAustriaCase, type = "parameter")
fam <- family(b)
FlashAustriaCase$P10 <- 1 - fam$p(9, fit)
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
library("ggplot2")
ggplot() + geom_sf(aes(fill = P10), data = FlashAustriaCase) +
  colorspace::scale_fill_continuous_sequential("Oslo", rev = TRUE) +
  geom_sf(data = world, col = "white", fill = NA) +
  coord_sf(xlim = c(7.95, 17), ylim = c(45.45, 50), expand = FALSE) +
  facet_wrap(~time, nrow = 2) + theme_minimal() +
  theme(plot.margin = margin(t = 0, r = 0, b = 0, l = 0))
```
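The key step here is `1 - fam$p(9, fit)`, i.e., one minus the family's distribution function at 9, which gives P(Y ≥ 10). For an untruncated negative binomial the same logic can be checked with base R's `pnbinom()` (illustrative parameter values, not those of the fitted zero-truncated model):

```r
## P(Y >= 10) = 1 - P(Y <= 9) for a negative binomial with mean mu
mu <- 4; theta <- 1.5
p10 <- 1 - pnbinom(9, mu = mu, size = theta)
## equivalent formulation via the upper tail
all.equal(p10, pnbinom(9, mu = mu, size = theta, lower.tail = FALSE))
```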

Over the last week a big controversy over Hurricane Dorian emerged after US President Donald Trump tweeted on September 1 that Alabama (and other states) “will most likely be hit (much) harder than anticipated”. And after the Birmingham, Alabama, office of the National Weather Service contradicted Trump on Twitter, the US president defended his tweet claiming that earlier forecasts showed a high probability of Alabama being hit. The various pieces of “evidence” for this included a map, manually modified by a marker, leading to the hashtag #sharpiegate trending on Twitter.

Here, we won’t comment further on the controversy as it is undisputed among scientists that on September 1 the forecast path did not include Alabama. However, we will look into the maps that Trump claimed his tweet was based on and investigate whether poor color choice may have contributed to a misinterpretation of the maps. Specifically, on September 5 Trump tweeted:

Just as I said, Alabama was originally projected to be hit. The Fake News denies it! pic.twitter.com/elJ7ROfm2p

— Donald J. Trump (@realDonaldTrump) September 5, 2019

These maps convey the impression that there is an increased risk for Alabama and especially the three maps with the color coding are rather suggestive. A closer look, though, reveals that the maps are from August 30, have a 5-day forecasting horizon, and pertain to probabilities for tropical-storm-force winds (i.e., not the cone of the hurricane!), with South-East Alabama only having a 5-20% probability.

Although the information in the maps can be correctly decoded using their titles and legends, it can be argued that this may require some expertise or experience and that there is some potential for misinterpretations. For example, data visualization expert Alberto Cairo writes on Twitter: *“I just want to give him the benefit of the doubt, honestly. These maps are difficult to understand. For me the bad thing isn’t misinterpreting. It’s not apologizing […]”*

And one aspect that makes the maps prone to misinterpretations is the color choice for coding the probabilities. This is a so-called “rainbow color map” going from dark green over bright yellow to red and dark purple. Such color maps are still widely used although it has been widely recognized that they have a number of disadvantages. In the following, Reto Stauffer and I illustrate in detail what the specific problems of the top right map are and suggest a better alternative color choice.

On the left is the original map that was included in Trump’s tweet and on the right is our version with alternative colors. The main problem with the original colors is that the entire area with more than 5% probability is shaded with highly-saturated colors. Some would argue that the traffic light system (green-yellow-red) signals that the green areas are relatively “low risk”. However, we argue that the bright colors and the abrupt transition from “no color” (less than 5%) to “dark green” (for 5-10%) conveys a substantially increased risk for the entire shaded area.

One way to avoid this misinterpretation is to choose colors that go from light (low risk) to dark and colorful (high risk). This is what we have done in the map on the right, while preserving the hues from green over yellow and red to purple. The probabilities represented in the map are exactly the same, but the alternative color choice conveys much more intuitively which areas are affected by increased probabilities beyond 50% or 60% (which do not include Alabama).

In summary, the information in the map certainly does not represent strong evidence for Alabama being likely “hit hard” by Hurricane Dorian. However, the poor color choice facilitates such misinterpretations and better, more intuitive color alternatives are easily available.

Further problems with the original colors can be brought out by converting both maps to grayscale. This shows that not only the transition from below to above 5% is emphasized too much but also the discontinuous transitions between dark and light are very counterintuitive. In contrast, our alternative colors are much more intuitive because they become darker with increasing risk.
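A rough version of this grayscale check can be done in base R by mapping each color to its luminance with the standard RGB weights; this only approximates the proper `colorspace::desaturate()` conversion:

```r
## approximate grayscale conversion via RGB luminance weights
to_gray <- function(cols) {
  rgbvals <- col2rgb(cols) / 255
  lum <- 0.299 * rgbvals["red", ] + 0.587 * rgbvals["green", ] +
    0.114 * rgbvals["blue", ]
  gray(pmin(1, lum))  # clamp against tiny floating point overshoot
}
to_gray(c("#00FF00", "#FFFF00", "#FF0000"))
```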

Another related problem can be demonstrated by emulating green-deficient vision (deuteranopia), also showing discontinuities in the original colors.

Finally, we briefly comment on some technical details for constructing the alternative color map. We have used our R software package colorspace, which facilitates choosing color palettes via the HCL color model, capturing the perceptual dimensions “hue” (type of color, dominant wavelength), “chroma” (colorfulness), and “luminance” (brightness). In the two plots below we show the HCL spectrum of both sets of colors.

For the original colors on the left we see that luminance (blue line) is non-monotonic, chroma (green line) is high throughout, and hue (red line) goes from green to purple. For our alternative colors we have used essentially the same hues. However, luminance covers a similar range as in the original colors but in a monotonic fashion. And chroma is low for colors associated with low risk.

The R code snippet below shows how the alternative colors can be computed using our `colorspace` package:

```
colorspace::sequential_hcl(10, palette = "Purple-Yellow", rev = TRUE,
  c1 = 70, cmax = 100, l2 = 80, h2 = 500)
```

The starting point is the sequential `Purple-Yellow` palette that we have used previously for risk maps. However, we modify the low-risk hue from yellow to green (hue = 140) and go in the opposite direction through the color wheel (hence hue = 500 = 140 + 360 is used). Moreover, we increase chroma for the high-risk colors and decrease luminance somewhat for the low-risk colors (to be of similar brightness as the gray map in the background). Further similar illustrations of problems with rainbow color maps are available, along with more details and explanations, on our web site http://colorspace.R-Forge.R-project.org/articles/endrainbow.html.

*(Authors: Achim Zeileis, Jason C. Fisher, Kurt Hornik, Ross Ihaka, Claire D. McWhite, Paul Murrell, Reto Stauffer, Claus O. Wilke)*

The R package “colorspace” (http://colorspace.R-Forge.R-project.org/) provides a flexible toolbox for selecting individual colors or color palettes, manipulating these colors, and employing them in statistical graphics and data visualizations. In particular, the package provides a broad range of color palettes based on the HCL (Hue-Chroma-Luminance) color space. The three HCL dimensions have been shown to match those of the human visual system very well, thus facilitating intuitive selection of color palettes through trajectories in this space.

Namely, general strategies for three types of palettes are provided: (1) Qualitative for coding categorical information, i.e., where no particular ordering of categories is available and every color should receive the same perceptual weight. (2) Sequential for coding ordered/numeric information, i.e., going from high to low (or vice versa). (3) Diverging for coding ordered/numeric information around a central neutral value, i.e., where colors diverge from neutral to two extremes.

To aid selection and application of these palettes the package provides scales for use with ggplot2; shiny (and tcltk) apps for interactive exploration (see also http://hclwizard.org/); visualizations of palette properties; accompanying manipulation utilities (like desaturation and lighten/darken), and emulation of color vision deficiencies.

Links to: PDF slides, YouTube video, R code, arXiv working paper.

Furthermore, replication code for the introductory example (influenza risk map) was already provided in the recent endrainbow blog post.

Jones PJ, Mair P, Simon T, Zeileis A (2019). *“Network Model Trees”*, OSF ha4cw, OSF Preprints. doi:10.31219/osf.io/ha4cw

In many areas of psychology, correlation-based network approaches (i.e., psychometric networks) have become a popular tool. In this paper we define a statistical model for correlation-based networks and propose an approach that recursively splits the sample based on covariates in order to detect significant differences in the network structure. We adapt model-based recursive partitioning and conditional inference tree approaches for finding covariate splits in a recursive manner. This approach is implemented in the networktree R package. The empirical power of these approaches is studied in several simulation conditions. Examples are given using real-life data from personality and clinical research.

CRAN package: https://CRAN.R-project.org/package=networktree

OSF project: https://osf.io/ykq2a/

Network model trees are illustrated using data from the Open Source Psychometrics Project:

- Ten Item Personality Inventory (TIPI), a brief inventory of the Big Five personality domains (extraversion, neuroticism, conscientiousness, agreeableness, and openness to experience). Each personality domain is assessed with two items with one item measured normally and the other in reverse.
- Depression Anxiety and Stress Scale (DASS), a self-report instrument for measuring depression, anxiety, and tension or stress.

The TIPI network is partitioned using MOB based on three covariates: engnat (English as native language), gender, and education. Generally, the structure of the network is characterized by strong negative relationships between the normal and reverse measurements of each domain with complex relationships between separate domains. When partitioning the network interesting differences are revealed. For example, native English speakers without a university degree showed a negative relationship between agreeableness and agreeableness-reversed that was significantly weakened in non-native speakers and in native speakers with a university degree. Among native English speakers with a university degree, males and other genders showed a stronger relationship between conscientiousness and neuroticism-reversed compared to females.

In the network plots edge thicknesses are determined by the strength of regularized partial correlations between nodes. Node labels correspond to the first letter of each Big Five personality domain, with the character “r” indicating items that measure the domain in reverse.
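The unregularized version of such edge weights can be sketched in base R: partial correlations are obtained by standardizing the negated off-diagonal entries of the precision matrix (the inverse covariance); the package adds regularization on top of this:

```r
## partial correlations from the precision matrix (inverse covariance)
pcor <- function(x) {
  p <- solve(cov(x))
  pc <- -p / sqrt(outer(diag(p), diag(p)))
  diag(pc) <- 1
  pc
}

set.seed(1)
x <- matrix(rnorm(500 * 3), ncol = 3)  # three uncorrelated toy variables
pc <- pcor(x)
round(pc, 2)
```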

The DASS network is partitioned using MOB based on a larger variety of covariates in a highly exploratory scenario: engnat (English as native language), gender, marital status, sexual orientation, and race. Again, the primary split occurred between native and non-native English speakers. Among native English speakers, two further splits were found with the race variable. Among the non-native English speakers, a split was found by gender. These results indicate various sources of potential heterogeneity in network structure. For example, among non-native speakers, the connection between worthlife (I felt that life wasn’t worthwhile) and nohope (I could see nothing in the future to be hopeful about) was stronger in males compared to females and other genders. In native English speaking Asians, the connection between getgoing (I just couldn’t seem to get going) and lookforward (I felt that I had nothing to look forward to) was stronger compared to all other racial groups.

Schlosser L, Hothorn T, Zeileis A (2019). *“The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE”*, arXiv:1906.10179, arXiv.org E-Print Archive. https://arXiv.org/abs/1906.10179

A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search strategies, directly comparing all possible split points in all available covariates. However, subsequent research showed that this is biased towards selecting covariates with more potential split points. Therefore, unbiased recursive partitioning algorithms have been suggested (e.g., QUEST, GUIDE, CTree, MOB) that first select the covariate based on statistical inference using p-values that are adjusted for the possible split points. In a second step a split point optimizing some objective function is selected in the chosen split variable. However, different unbiased tree algorithms obtain these p-values from different inference frameworks and their relative advantages or disadvantages are not well understood, yet. Therefore, three different popular approaches are considered here: classical categorical association tests (as in GUIDE), conditional inference (as in CTree), and parameter instability tests (as in MOB). First, these are embedded into a common inference framework encompassing parametric model trees, in particular linear model trees. Second, it is assessed how different building blocks from this common framework affect the power of the algorithms to select the appropriate covariates for splitting: observation-wise goodness-of-fit measure (residuals vs. model scores), dichotomization of residuals/scores at zero, and binning of possible split variables. This shows that specifically the goodness-of-fit measure is crucial for the power of the procedures, with model scores without dichotomization performing much better in many scenarios.

CRAN package: https://CRAN.R-project.org/package=partykit

Development version with some extensions enabled: partykit_1.2-4.2.tar.gz

Replication materials: simulation.zip

The manuscript compares three so-called unbiased recursive partitioning algorithms that employ statistical inference to adjust for the number of possible splits in a split variable: GUIDE (Loh 2002), CTree (Hothorn *et al.* 2006), MOB (Zeileis *et al.* 2008).

First, it is pointed out what the similarities and the differences in the algorithms are, specifically with respect to the split variable selection through statistical tests. Second, the power of these tests is studied for a “stump”, i.e., a single split only. Third, the capability of the entire algorithm (including a pruning strategy) to recover the correct partition in a “tree” with two splits is investigated.

In all cases, the three algorithms are employed to learn *model-based* trees where in each leaf of the tree a linear regression model is fitted with intercept β_{0} and slope β_{1}. The simulations then vary whether only the intercept β_{0} or the slope β_{1} or both differ in the data.
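A minimal version of such a setting can be simulated in base R; this is our own illustrative setup, not the exact simulation design of the manuscript:

```r
## one binary subgroup defined by split variable z1;
## intercept and slope both shift by delta
set.seed(1)
n <- 500; delta <- 1
z1 <- runif(n); x <- runif(n)
beta0 <- ifelse(z1 <= 0.5, 0, delta)
beta1 <- ifelse(z1 <= 0.5, 1, 1 + delta)
y <- beta0 + beta1 * x + rnorm(n)
d <- data.frame(y, x, z1)

## separate OLS fits in the two true subgroups recover the shifted coefficients
c1 <- coef(lm(y ~ x, data = d, subset = z1 <= 0.5))
c2 <- coef(lm(y ~ x, data = d, subset = z1 > 0.5))
rbind(c1, c2)
```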

All three algorithms proceed by first fitting the model (here: linear regression by OLS) in a given subgroup (or node) of the tree. Then they extract some kind of goodness-of-fit measure (either residuals or full model scores) and test whether this measure is associated with any of the split variables. The variable with the highest association (i.e., lowest p-value) is employed for splitting and then the procedure is repeated recursively in the resulting subgroups.
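For OLS, these observation-wise model scores are just the residuals multiplied by the regressors (what `sandwich::estfun()` would return); a quick base R check on a built-in dataset:

```r
## observation-wise score contributions of an OLS fit: residual times regressor
m <- lm(dist ~ speed, data = cars)
sc <- residuals(m) * model.matrix(m)
## first-order conditions at the OLS estimate: columns sum to (numerically) zero
colSums(sc)
```

The association tests then ask whether these score contributions fluctuate systematically along a candidate split variable.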

For “pruning” the tree to the right size one can either first grow a larger tree and then prune those splits that are not relevant enough (post-pruning). Or the algorithm can stop splitting when the association test is not significant anymore (pre-pruning).

The default combinations of fitted model type, test type, and pruning strategy for the three algorithms are given in the following table.

Algorithm | Fit | Test | Pruning
---|---|---|---
CTree | Non-parametric | Conditional inference | Pre
MOB | Parametric | Score-based fluctuation | Pre (or post with AIC/BIC)
GUIDE | Parametric | Residual-based chi-squared | Post (cost-complexity pruning)

Thus, the main difference is the testing strategy but also the pruning is relevant. While at first sight, the tests come from very different motivations, they are actually not that different. When assessing the association with the split variable the following three properties are most relevant:

- *Goodness-of-fit measure:* Either the full model scores, which are here bivariate with one component for the intercept (= residuals) and one component for the slope, or alternatively only the residuals.
- *Dichotomization of residuals/scores:* Are the numeric values of the residuals/scores used, or are they dichotomized at zero?
- *Categorization of split variables:* Are the numeric values of the split variables used, or are they binned at the quartiles?

An overview of the corresponding settings for the three algorithms is given in the following table. Additionally, the tests differ somewhat in how they aggregate across the possible splits considered. Either in a sum-of-squares statistic or a maximally-selected statistic.

Algorithm | Scores | Dichotomization | Categorization | Statistic
---|---|---|---|---
CTree | Model scores | – | – | Sum of squares
MOB | Model scores | – | – | Maximally selected
GUIDE | Residuals | X | X | Sum of squares

Subsequently, these algorithms are compared in two simulation studies. More details and more simulation studies can be found in the manuscript. In addition to the three default algorithms, a modified GUIDE algorithm using model scores instead of residuals (GUIDE+scores) is considered.

Clearly, the different choices made in the construction influence the inference properties of the significance tests. Hence, in a first step we investigate the power properties of the tests when there is only one split in one of the split variables (among further noise variables). The split can pertain either to the intercept β_{0} only or the slope β_{1} only or both.

The plot below shows the probability of selecting the true split variable (Z_{1}) with the minimal p-value, plotted against the magnitude of the difference in the regression coefficients (δ). For a split in the middle of the data (50%) pertaining only to the intercept β_{0} (top left panel), all tests perform almost equivalently. However, if the split only affects the slope β_{1} (middle column), it is much better to use score-based tests rather than residual-based tests (as in GUIDE), which cannot pick up changes that do not affect the conditional mean. Moreover, if the split occurs not in the middle (50% quantile, top row) but in the tails (90% quantile, bottom row), it is better to use a maximally-selected statistic (as in MOB) rather than a sum-of-squares statistic.

One could argue that the power properties of the tests may be crucial when pre-pruning (based on statistical significance) is used. However, when combined with cost-complexity post-pruning it may not be so important to have particularly high power. As long as the power for the true split variables is higher than for the noise variables, it might be sufficient to select the correct split variable.

This is assessed in a simulation for a tree with two splits, both depending on differences of magnitude δ in the two regression coefficients, respectively. The adjusted Rand index is used to assess how well the partition found by the tree conforms with the true partition. The columns of the display below are for splits that occur in the middle of the data vs. later in the sample (left to right).

And indeed it can be shown that post-pruning (bottom row) mitigates many of the power deficits of the testing strategies compared to significance-based pre-pruning (top row). However, it is still clearly better to use a score based test (as in CTree, MOB, and GUIDE+scores) than a residuals-based test (as in GUIDE). Also, pre-pruning may even lead to slightly better results than post-pruning when based on a powerful test.

Using several simulation setups we have shown that in many circumstances CTree, MOB, and GUIDE perform very similarly for recursive partitioning based on linear regression models. However, in some settings score-based tests clearly outperform residual-based tests (the latter may even lack power altogether). To some extent cost-complexity post-pruning can mitigate power deficits of the testing strategy, but pre-pruning typically works just as well as long as the significance test itself works well.

Furthermore, other simulations in the manuscript show that dichotomization of residuals/scores should be avoided as it reduces the power of the tests. Note that this is very easy to do in GUIDE: Instead of chi-squared tests one can simply use one-way ANOVA tests. Finally, in the appendix of the manuscript it is shown that maximally-selected statistics (as in MOB) work better for abrupt splits late in the sample while the sum-of-squares statistics (from CTree and GUIDE) work better for smooth(er) transitions.
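The gain from avoiding dichotomization can be sketched in base R with a toy example, comparing a chi-squared test on dichotomized residuals against a one-way ANOVA test on the numeric residuals (illustrative only, not the manuscript's simulation design):

```r
## residuals whose mean shifts with the split variable z
set.seed(42)
n <- 200
z <- runif(n)
res <- rnorm(n, mean = ifelse(z > 0.5, 0.5, -0.5))
zbin <- cut(z, quantile(z, 0:4 / 4), include.lowest = TRUE)  # bin at quartiles

## GUIDE-style: dichotomize residuals at zero, chi-squared test
p_chisq <- chisq.test(table(res > 0, zbin))$p.value
## numeric residuals, one-way ANOVA test (typically more powerful)
p_anova <- oneway.test(res ~ zbin)$p.value
c(chisq = p_chisq, anova = p_anova)
```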
