MetaForest {metaforest} | R Documentation |
Conduct a MetaForest analysis to explore heterogeneity in meta-analytic data.
Description
MetaForest uses a weighted random forest to explore heterogeneity in
meta-analytic data. MetaForest is a wrapper for ranger
(Wright & Ziegler, 2015). As input, MetaForest takes the study effect sizes
and their variances (these can be computed, for example, using the
metafor
package), as well as the moderators
that are to be included in the model. By default, MetaForest uses
random-effects weights, and estimates the between-studies variance using a
restricted maximum-likelihood estimator. However, it may be beneficial to
first conduct an unweighted MetaForest, and then use the estimated residual
heterogeneity from this model as the estimate of tau2
for a
random-effects weighted MetaForest.
Usage
MetaForest(
formula,
data,
vi = "vi",
study = NULL,
whichweights = "random",
num.trees = 500,
mtry = NULL,
method = "REML",
tau2 = NULL,
...
)
Arguments
formula |
Formula. Specify a formula for the MetaForest model, for
example, |
data |
A data.frame containing the effect size, moderators, and the variance of the effect size. |
vi |
Character. Specify the name of the column in the |
study |
Character. Optionally, specify the name of the column in the
|
whichweights |
Character. Indicate what time of weights are required.
A random-effects MetaForest is grown by specifying |
num.trees |
Atomic integer. Specify the number of trees in the forest. Defaults to 500. |
mtry |
Atomic integer. Number of candidate moderators available for each split. Defaults to the square root of the number moderators (rounded down). |
method |
Character. Specify the method by which to estimate the residual
variance. Can be set to one of the following: "DL", "HE", "SJ", "ML", "REML",
"EB", "HS", or "GENQ". Default is "REML". See the
|
tau2 |
Numeric. Specify a predetermined value for the residual heterogeneity. Entering a value here supersedes the estimated tau2 value. Defaults to NULL. |
... |
Additional arguments are passed directly to ranger. It is recommended not to use additional arguments. |
Details
For dependent data, a clustered MetaForest analysis is more
appropriate. This is because the predictive performance of a MetaForest
analysis is evaluated on out-of-bootstrap cases, and when cases out of the
bootstrap sample originate from the same study, the model will be overly
confident in its ability to predict their value. When the MetaForest is
clustered by the study
variable, the dataset is first split into two
cross-validation samples by study. All dependent effect sizes from each study
are thus included in the same cross-validation sample. Then, two random
forests are grown on these cross-validation samples, and for each random
forest, the other sample is used to calculate prediction error and variable
importance, see doi:10.1007/s11634-016-0276-4.
Value
List of length 3. The "forest" element of this list is an object of class "ranger", containing the results of the random forests analysis. The "rma_before" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the raw data, without moderators. The "rma_after" element is an object of class "rma.uni", containing the results of a random-effects meta-analysis on the residual heterogeneity, or the difference between the effect sizes predicted by MetaForest and the observed effect sizes.
Examples
#Example 1:
#Simulate data with a univariate linear model
set.seed(42)
data <- SimulateSMD()
#Conduct unweighted MetaForest analysis
mf.unif <- MetaForest(formula = yi ~ ., data = data$training,
whichweights = "unif", method = "DL")
#Print model
mf.unif
#Conduct random-effects weighted MetaForest analysis
mf.random <- MetaForest(formula = yi ~ ., data = data$training,
whichweights = "random", method = "DL",
tau2 = 0.0116)
#Print summary
summary(mf.random)
#Example 2: Real data from metafor
#Load and clean data
data <- dat.bangertdrowns2004
data[, c(4:12)] <- apply(data[ , c(4:12)], 2, function(x){
x[is.na(x)] <- median(x, na.rm = TRUE)
x})
data$subject <- factor(data$subject)
data$yi <- as.numeric(data$yi)
#Conduct MetaForest analysis
mf.bd2004 <- MetaForest(formula = yi~ grade + length + minutes + wic+
meta, data, whichweights = "unif")
#Print MetaForest object
mf.bd2004
#Check convergence plot
plot(mf.bd2004)
#Check summary
summary(mf.bd2004, digits = 4)
#Examine variable importance plot
VarImpPlot(mf.bd2004)