R: Conduct a MetaForest analysis to explore heterogeneity in...

MetaForest {metaforest}

R Documentation

Conduct a MetaForest analysis to explore heterogeneity in meta-analytic data.

Description

MetaForest uses a weighted random forest to explore heterogeneity in meta-analytic data. MetaForest is a wrapper for ranger (Wright & Ziegler, 2015). As input, MetaForest takes the study effect sizes and their variances (these can be computed, for example, using the metafor package), as well as the moderators that are to be included in the model. By default, MetaForest uses random-effects weights, and estimates the between-studies variance using a restricted maximum-likelihood estimator. However, it may be beneficial to first conduct an unweighted MetaForest, and then use the estimated residual heterogeneity from this model as the estimate of tau2 for a random-effects weighted MetaForest.

Usage

MetaForest(
  formula,
  data,
  vi = "vi",
  study = NULL,
  whichweights = "random",
  num.trees = 500,
  mtry = NULL,
  method = "REML",
  tau2 = NULL,
  ...
)

Arguments

`formula`	Formula. Specify a formula for the MetaForest model, for example, `yi ~ .` to predict the outcome `yi` from all moderators in the data. Only additive formulas are allowed (i.e., `x1+x2+x3`). Interaction terms and non-linear terms are not required, as the random forests algorithm inherently captures these associations.
`data`	A data.frame containing the effect size, moderators, and the variance of the effect size.
`vi`	Character. Specify the name of the column in the `data` that contains the variances of the effect sizes. This column will be removed from the data prior to analysis. Defaults to `"vi"`.
`study`	Character. Optionally, specify the name of the column in the `data` that contains the study id. Use this when the data includes multiple effect sizes per study. This column can be a vector of integers, or a factor. This column will be removed from the data prior to analysis. See `Details` for more information about analyzing dependent data.
`whichweights`	Character. Indicate what time of weights are required. A random-effects MetaForest is grown by specifying `whichweights = "random"`. A fixed-effects MetaForest is grown by specifying `whichweights = "fixed"`. An unweighted MetaForest is grown by specifying `whichweights = "unif"`. Defaults to `"random"`.
`num.trees`	Atomic integer. Specify the number of trees in the forest. Defaults to 500.
`mtry`	Atomic integer. Number of candidate moderators available for each split. Defaults to the square root of the number moderators (rounded down).
`method`	Character. Specify the method by which to estimate the residual variance. Can be set to one of the following: "DL", "HE", "SJ", "ML", "REML", "EB", "HS", or "GENQ". Default is "REML". See the `metafor` package for more information about these estimators.
`tau2`	Numeric. Specify a predetermined value for the residual heterogeneity. Entering a value here supersedes the estimated tau2 value. Defaults to NULL.
`...`	Additional arguments are passed directly to ranger. It is recommended not to use additional arguments.

Details

For dependent data, a clustered MetaForest analysis is more appropriate. This is because the predictive performance of a MetaForest analysis is evaluated on out-of-bootstrap cases, and when cases out of the bootstrap sample originate from the same study, the model will be overly confident in its ability to predict their value. When the MetaForest is clustered by the study variable, the dataset is first split into two cross-validation samples by study. All dependent effect sizes from each study are thus included in the same cross-validation sample. Then, two random forests are grown on these cross-validation samples, and for each random forest, the other sample is used to calculate prediction error and variable importance, see doi:10.1007/s11634-016-0276-4.

Value

List of length 3. The "forest" element of this list is an object of class "ranger", containing the results of the random forests analysis. The "rma_before" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the raw data, without moderators. The "rma_after" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the residual heterogeneity, or the difference between the effect sizes predicted by MetaForest and the observed effect sizes.

Examples

#Example 1:
#Simulate data with a univariate linear model
set.seed(42)
data <- SimulateSMD()
#Conduct unweighted MetaForest analysis
mf.unif <- MetaForest(formula = yi ~ ., data = data$training,
                      whichweights = "unif", method = "DL")
#Print model
mf.unif
#Conduct random-effects weighted MetaForest analysis
mf.random <- MetaForest(formula = yi ~ ., data = data$training,
                        whichweights = "random", method = "DL",
                        tau2 = 0.0116)
#Print summary
summary(mf.random)

#Example 2: Real data from metafor
#Load and clean data
data <- dat.bangertdrowns2004
data[, c(4:12)] <- apply(data[ , c(4:12)], 2, function(x){
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x})
data$subject <- factor(data$subject)
data$yi <- as.numeric(data$yi)
#Conduct MetaForest analysis
mf.bd2004 <- MetaForest(formula = yi~ grade + length + minutes + wic+
                               meta, data, whichweights = "unif")
#Print MetaForest object
mf.bd2004
#Check convergence plot
plot(mf.bd2004)
#Check summary
summary(mf.bd2004, digits = 4)
#Examine variable importance plot
VarImpPlot(mf.bd2004)

[Package metaforest version 0.1.4 Index]