covregrf {CovRegRF} | R Documentation |
Covariance Regression with Random Forests
Description
Estimates the covariance matrix of a multivariate response given a set of covariates using a random forest framework.
Usage
covregrf(
  formula,
  data,
  params.rfsrc = list(ntree = 1000, mtry = ceiling(px/3),
    nsplit = max(round(n/50), 10)),
  nodesize.set = round(0.5^(1:100) * sampsize)[round(0.5^(1:100) * sampsize) > py],
  importance = FALSE
)
Arguments
formula |
Object of class |
data |
The multivariate data set which has |
params.rfsrc |
List of parameters that should be passed to
|
nodesize.set |
The set of |
importance |
Should variable importance of covariates be assessed? The
default is |
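The default expression for nodesize.set is easier to read with concrete numbers. The sketch below evaluates the same expression in base R. The values sampsize = 640 and py = 3 are made up for illustration. Reading sampsize as the number of observations drawn for each tree and py as the number of response variables is an assumption, since this page does not define them.

```r
# Illustrative values only (assumptions, not package defaults).
sampsize <- 640  # assumed meaning: observations drawn for each tree
py <- 3          # assumed meaning: number of response variables

# Halve sampsize repeatedly, round, and keep only the sizes larger than py.
candidates <- round(0.5^(1:100) * sampsize)
nodesize.set <- candidates[candidates > py]

print(nodesize.set)
# values: 320, 160, 80, 40, 20, 10, 5
```

Under these assumptions the candidate set is a short halving sequence, so only a handful of forests would need to be trained during tuning.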
Value
An object of class (covregrf, grow), which is a list with the following
components:
predicted.oob |
OOB predicted covariance matrices for training observations. |
importance |
Variable importance measures (VIMP) for covariates. |
best.nodesize |
Best |
params.rfsrc |
List of parameters that was used to fit random forest
with |
n |
Sample size of the data ( |
xvar.names |
A character vector of the covariate names. |
yvar.names |
A character vector of the response variable names. |
xvar |
Data frame of covariates. |
yvar |
Data frame of responses. |
rf.grow |
Fitted random forest object. This object is used for prediction with training or new data. |
Details
For mean regression problems, random forests search for the optimal level
of the nodesize parameter by using out-of-bag (OOB) prediction errors,
computed as the difference between the true responses and the OOB
predictions. The nodesize value with the smallest OOB prediction error is
chosen. The covariance regression problem, however, is unsupervised by
nature, so we tune the nodesize parameter with a heuristic method based on
OOB covariance matrix estimates. The general idea of the proposed tuning
method is to find the nodesize level at which the OOB covariance matrix
predictions converge. The steps are as follows. First, we train separate
random forests for a set of nodesize values. Second, we compute the OOB
covariance matrix estimates for each random forest. Next, we compute the
mean absolute difference (MAD) between the upper triangular OOB covariance
matrix estimates of two consecutive nodesize levels over all observations.
Finally, we take the pair of nodesize levels with the smallest MAD. Of
these two nodesize levels, we select the smaller one, since deeper trees
are generally desired in random forests.
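The selection step described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helpers upper_vals() and select_nodesize() are hypothetical, the OOB estimates are toy matrices rather than forest output, and including the diagonal in the "upper triangular" part is an assumption.

```r
# Hypothetical helper (not part of CovRegRF): stack the upper triangle,
# diagonal included by assumption, of each matrix into one numeric vector.
upper_vals <- function(covs) {
  unlist(lapply(covs, function(S) S[upper.tri(S, diag = TRUE)]))
}

# Hypothetical helper: oob_by_nodesize[[j]] holds one covariance matrix per
# observation, estimated with nodesize.set[j].
select_nodesize <- function(oob_by_nodesize, nodesize.set) {
  k <- length(nodesize.set)
  stopifnot(length(oob_by_nodesize) == k, k >= 2)
  # MAD between the estimates of each pair of consecutive nodesize levels
  mad <- vapply(seq_len(k - 1), function(j) {
    mean(abs(upper_vals(oob_by_nodesize[[j]]) -
             upper_vals(oob_by_nodesize[[j + 1]])))
  }, numeric(1))
  j <- which.min(mad)
  # of the closest pair, keep the smaller nodesize (deeper trees)
  list(mad = mad, best.nodesize = min(nodesize.set[j], nodesize.set[j + 1]))
}

# Toy input: 3 nodesize levels, 2 observations, 2 x 2 matrices.
toy <- function(shift) list(diag(2) + shift, 2 * diag(2) + shift)
oob <- list(toy(0), toy(0.5), toy(0.6))

res <- select_nodesize(oob, nodesize.set = c(40, 20, 10))
res$best.nodesize
# 10: the estimates change least between nodesize 20 and 10
```

In the toy input the estimates shift by 0.5 between the first two levels and by about 0.1 between the last two, so the last pair is picked and its smaller member is returned.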
See Also
predict.covregrf
significance.test
vimp.covregrf
print.covregrf
Examples
options(rf.cores=2, mc.cores=2)
## load generated example data
data(data, package = "CovRegRF")
xvar.names <- colnames(data$X)
yvar.names <- colnames(data$Y)
data1 <- data.frame(data$X, data$Y)
## define train/test split
set.seed(2345)
smp <- sample(1:nrow(data1), size = round(nrow(data1)*0.6), replace = FALSE)
traindata <- data1[smp,,drop=FALSE]
testdata <- data1[-smp, xvar.names, drop=FALSE]
## formula object
formula <- as.formula(paste(paste(yvar.names, collapse="+"), ".", sep=" ~ "))
## train covregrf
covregrf.obj <- covregrf(formula, traindata, params.rfsrc = list(ntree = 50),
                         importance = TRUE)
## get the OOB predictions
pred.oob <- covregrf.obj$predicted.oob
## predict with new test data
pred.obj <- predict(covregrf.obj, newdata = testdata)
pred <- pred.obj$predicted
## get the variable importance measures
vimp <- covregrf.obj$importance