glmertree {glmertree}    R Documentation

(Generalized) Linear Mixed Model Trees

Description

Model-based recursive partitioning based on (generalized) linear mixed models.

Usage

lmertree(formula, data, weights = NULL, cluster = NULL, 
  ranefstart = NULL, offset = NULL, joint = TRUE, 
  abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, 
  plot = FALSE, REML = TRUE, lmer.control = lmerControl(), ...)

glmertree(formula, data, family = "binomial", weights = NULL,
  cluster = NULL, ranefstart = NULL, offset = NULL, joint = TRUE,
  abstol = 0.001, maxit = 100, dfsplit = TRUE, verbose = FALSE, 
  plot = FALSE, nAGQ = 1L, glmer.control = glmerControl(), ...)

Arguments

formula

formula specifying the response variable and a three-part right-hand-side describing the regressors, random effects, and partitioning variables, respectively. For details see below.

data

data.frame to be used for estimating the model tree.

family

family specification for glmtree and glmer. See glm documentation for families.

weights

optional numeric vector of weights. Can be the name of a column in data or a vector of length nrow(data).

cluster

optional vector of cluster IDs to be employed for clustered covariances in the parameter stability tests. Can be the name of a column in data or a vector of length nrow(data). If cluster = NULL (the default), observation-level covariances are employed in the parameter stability tests. If the partitioning variables are measured at the cluster level, this can be accounted for by specifying the cluster variable here, so that cluster-level covariances are employed in the parameter stability tests.

ranefstart

NULL (the default), TRUE, or a numeric vector of length nrow(data). Specifies the offset to be used in estimation of the first tree. With the default NULL, a zero offset is used for initialization. If ranefstart = TRUE, the random effects are estimated first and the first tree is grown using the random-effects predictions as an offset.

offset

optional numeric vector to be included in the linear predictor with a coefficient of one. Can be the name of a column in data or a numeric vector of length nrow(data).

joint

logical. Should the fixed effects from the tree be (re-)estimated jointly along with the random effects?

abstol

numeric. The convergence criterion used for estimation of the model. When the difference in log-likelihoods of the random-effects model from two consecutive iterations is smaller than abstol, estimation of the model tree has converged.

maxit

numeric. The maximum number of iterations to be performed in estimation of the model tree.

dfsplit

logical or numeric. as.integer(dfsplit) is the degrees of freedom per selected split employed when extracting the log-likelihood.

verbose

logical. Should the log-likelihood value of the estimated random-effects model be printed for every iteration of the estimation?

plot

logical. Should the tree be plotted at every iteration of the estimation? Note that selecting this option slows down execution of the function.

REML

logical scalar. Should the fixed-effects estimates be chosen to optimize the REML criterion (as opposed to the log-likelihood)? Passed to function lmer(). See lmer for details.

nAGQ

integer scalar. Specifies the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood, to be passed to function glmer(). See glmer for details.

lmer.control, glmer.control

list. An optional list with control parameters to be passed to lmer() or glmer(), respectively. See lmerControl and glmerControl for details.

...

Additional arguments to be passed to lmtree() or glmtree(). See mob_control documentation for details.
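
As a minimal sketch of how several of these arguments combine, the call below uses the DepressionDemo data from the Examples section. It assumes the glmertree package is installed; the object name lt_cl and the particular values chosen for abstol and maxit are illustrative, not recommendations.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

## Cluster-level covariances in the parameter stability tests (cluster),
## random effects estimated before the first tree is grown (ranefstart),
## and a smaller convergence tolerance with fewer allowed iterations.
lt_cl <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo, cluster = cluster, ranefstart = TRUE,
  abstol = 1e-4, maxit = 50, verbose = TRUE)

## Number of iterations actually used (illustrative object name lt_cl)
lt_cl$iterations
```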

Details

(G)LMM trees learn a tree where each terminal node is associated with different fixed-effects regression coefficients while adjusting for global random effects (such as a random intercept). This allows for detection of subgroups with different fixed-effects parameter estimates, keeping the random effects constant throughout the tree (i.e., random effects are estimated globally). The estimation algorithm iterates between (1) estimation of the tree given an offset of random effects, and (2) estimation of the random effects given the tree structure. See Fokkema et al. (2018) for a detailed introduction.

To specify all variables in the model, a formula such as y ~ x1 + x2 | random | z1 + z2 + z3 is used, where y is the response, x1 and x2 are the regressors in every node of the tree, random specifies the random effects, and z1 to z3 are the partitioning variables considered for growing the tree. If random is only a single variable such as id, a random intercept with respect to id is used. Alternatively, it may be an explicit random-effects formula such as (1 | id) or a more complicated formula such as ((1+time) | id). (Note that in the latter two formulas, the brackets are necessary to protect the pipes in the random-effects formulation.)
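
The three-part structure can be inspected with the Formula package, which is used here purely for illustration and is assumed to be installed. The sketch splits a formula with an explicit random intercept into its regressor, random-effects, and partitioning parts; the object name f is hypothetical, and the variable names are those of the DepressionDemo data from the Examples section.

```r
library("Formula")

## Three-part right-hand side: regressors | random effects | partitioning variables.
## The brackets around (1 | cluster) protect the inner pipe from being
## treated as a separator between parts.
f <- Formula(depression ~ treatment | (1 | cluster) | age + anxiety + duration)

## Number of left-hand-side and right-hand-side parts
length(f)

## Extract each right-hand-side part separately
formula(f, lhs = 0, rhs = 1)  # regressors used in every node
formula(f, lhs = 0, rhs = 2)  # random-effects specification
formula(f, lhs = 0, rhs = 3)  # partitioning variables
```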

In the random-effects model from step (2), two strategies are available: either the fitted values from the tree are supplied as an offset (joint = FALSE), so that only the random effects are estimated, or the fixed effects are (re-)estimated along with the random effects using a nesting factor with nodes from the tree (joint = TRUE). In the former case, the estimation of each random-effects model is typically faster, but more iterations are required.
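
The two strategies can be compared on the same data, as sketched below. This assumes the glmertree package is installed and uses the DepressionDemo data from the Examples section; the object names lt_joint and lt_offset are illustrative. No particular outcome is implied, as the iteration counts and log-likelihoods depend on the data.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

## Fixed effects re-estimated jointly with the random effects (the default)
lt_joint <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo, joint = TRUE)

## Fitted values from the tree supplied as an offset; only the random
## effects are estimated in step (2)
lt_offset <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo, joint = FALSE)

## Compare the number of iterations and the final log-likelihood values
c(joint = lt_joint$iterations, offset = lt_offset$iterations)
c(joint = lt_joint$loglik, offset = lt_offset$loglik)
```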

The code is still under development and might change in future versions.

Value

The function returns a list with the following objects:

tree

The final lmtree/glmtree.

lmer

The final lmer random-effects model.

ranef

The corresponding random effects of lmer.

varcorr

The corresponding VarCorr(lmer).

variance

The corresponding attr(VarCorr(lmer), "sc")^2.

data

The dataset specified with the data argument including added auxiliary variables .ranef and .tree from the last iteration.

loglik

The log-likelihood value of the last iteration.

iterations

The number of iterations used to estimate the lmertree.

maxit

The maximum number of iterations specified with the maxit argument.

ranefstart

The random effects used as an offset, as specified with the ranefstart argument.

formula

The formula as specified with the formula argument.

randomformula

The formula as specified with the randomformula argument.

abstol

The prespecified value for the change in log-likelihood to evaluate convergence, as specified with the abstol argument.

mob.control

A list containing control parameters passed to lmtree(), as specified with ....

lmer.control

A list containing control parameters passed to lmer(), as specified in the lmer.control argument.

joint

Whether the fixed effects from the tree were (re-)estimated jointly along with the random effects, specified with the joint argument.
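
Since the return value is a list, its components can be extracted with $. The following is a minimal sketch that assumes the glmertree package is installed and refits the linear example from the Examples section; lt is an illustrative object name.

```r
library("glmertree")
data("DepressionDemo", package = "glmertree")

lt <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo)

lt$tree        # the final lmtree
lt$lmer        # the final lmer random-effects model
lt$varcorr     # VarCorr() of the final random-effects model
lt$iterations  # number of iterations used
lt$loglik      # log-likelihood value of the last iteration

## Auxiliary variables added to the data in the last iteration
head(lt$data[, c(".ranef", ".tree")])
```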

References

Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). “Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees”. Behavior Research Methods, 50(5), 2016-2034. doi:10.3758/s13428-017-0971-x

Fokkema M, Edbrooke-Childs J, Wolpert M (2021). “Generalized linear mixed-model (GLMM) trees: A flexible decision-tree method for multilevel and longitudinal data”. Psychotherapy Research, 31(3), 329-341. doi:10.1080/10503307.2020.1785037

Fokkema M, Zeileis A (2024). “Subgroup detection in linear growth curve models with generalized linear mixed model (GLMM) trees”. Behavior Research Methods. doi:10.3758/s13428-024-02389-1

See Also

plot.lmertree, plot.glmertree, cv.lmertree, cv.glmertree, GrowthCurveDemo, lmer, glmer, lmtree, glmtree

Examples


## artificial example data
data("DepressionDemo", package = "glmertree")

## fit normal linear regression LMM tree for continuous outcome
lt <- lmertree(depression ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo)
print(lt)
plot(lt, which = "all") # default behavior, may also be "tree" or "ranef" 
coef(lt)
ranef(lt)
predict(lt, type = "response") # default behavior, may also be "node"
predict(lt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod
residuals(lt)
VarCorr(lt) # see lme4::VarCorr


## fit logistic regression GLMM tree for binary outcome
gt <- glmertree(depression_bin ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo)
print(gt)
plot(gt, which = "all") # default behavior, may also be "tree" or "ranef" 
coef(gt)
ranef(gt)
predict(gt, type = "response") # default behavior, may also be "node" or "link"
predict(gt, re.form = NA) # excludes random effects, see ?lme4::predict.merMod
residuals(gt)
VarCorr(gt) # see lme4::VarCorr

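## Alternative specification for binomial family: two-column response matrix.
## As in glm(), the first column of the matrix counts the events being modeled,
## so despite its name, 'failures' (depression_bin recoded to 0/1) plays that
## role here and 'successes' is its complement.
DepressionDemo$failures <- as.numeric(DepressionDemo$depression_bin) - 1
DepressionDemo$successes <- 1 - DepressionDemo$failures
gt <- glmertree(cbind(failures, successes) ~ treatment | cluster | age + anxiety + duration,
  data = DepressionDemo, ytype = "matrix") ## see also ?partykit::mob_control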


[Package glmertree version 0.2-5]