covregrf {CovRegRF}  R Documentation 
Covariance Regression with Random Forests
Description
Estimates the covariance matrix of a multivariate response given a set of covariates using a random forest framework.
Usage
covregrf(
formula,
data,
params.rfsrc = list(ntree = 1000, mtry = ceiling(px/3), nsplit = max(round(n/50),
10)),
nodesize.set = round(0.5^(1:100) * sampsize)[round(0.5^(1:100) * sampsize) > py],
importance = FALSE
)
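The default nodesize.set expression in the usage above can be evaluated by hand to see which candidate values it produces. The sketch below is only an illustration: the values of sampsize and py are made up, and reading sampsize as the per-tree sample size and py as the number of response variables is an assumption, not something stated on this page.

```r
# Illustrative values only (assumptions, not package defaults)
sampsize <- 128  # assumed: number of observations drawn for each tree
py <- 3          # assumed: number of response variables

# Same expression as the nodesize.set default in the usage above:
# halve sampsize repeatedly and keep the values larger than py
candidates <- round(0.5^(1:100) * sampsize)
nodesize.set <- candidates[candidates > py]
print(nodesize.set)
```

With these illustrative inputs the candidate set is 64, 32, 16, 8, 4: each level halves the previous one, and levels not larger than py are dropped.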
Arguments
formula 
Object of class 
data 
The multivariate data set which has 
params.rfsrc 
List of parameters that should be passed to

nodesize.set 
The set of 
importance 
Should variable importance of covariates be assessed? The
default is FALSE.
Value
An object of class (covregrf, grow), which is a list with the
following components:
predicted.oob 
OOB predicted covariance matrices for training observations. 
importance 
Variable importance measures (VIMP) for covariates. 
best.nodesize 
Best 
params.rfsrc 
List of parameters that were used to fit the random forest
with 
n 
Sample size of the data ( 
xvar.names 
A character vector of the covariate names. 
yvar.names 
A character vector of the response variable names. 
xvar 
Data frame of covariates. 
yvar 
Data frame of responses. 
rf.grow 
Fitted random forest object. This object is used for prediction with training or new data. 
Details
For mean regression problems, random forests search for the optimal level
of the nodesize parameter by using out-of-bag (OOB) prediction errors,
computed as the difference between the true responses and the OOB
predictions. The nodesize value with the smallest OOB prediction error is
chosen. However, the covariance regression problem is unsupervised by
nature: there is no observed target against which to compute such an
error. Therefore, we tune the nodesize parameter with a heuristic method
based on OOB covariance matrix estimates. The general idea of the proposed
tuning method is to find the nodesize level at which the OOB covariance
matrix predictions converge. The steps are as follows. First, we train a
separate random forest for each value in a set of nodesize values. Second,
we compute the OOB covariance matrix estimates for each random forest.
Next, we compute the mean absolute difference (MAD) between the upper
triangular OOB covariance matrix estimates of two consecutive nodesize
levels over all observations. Finally, we take the pair of nodesize levels
having the smallest MAD. Of these two nodesize levels, we select the
smaller, since deeper trees are generally desired in random forests.
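The selection step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: select_nodesize is a hypothetical helper, the OOB estimates are assumed to arrive as one list of covariance matrices per nodesize level (in the order of nodesize.set), and including the diagonal in the upper triangular part is also an assumption.

```r
# Hypothetical helper (not part of CovRegRF): choose a nodesize from
# OOB covariance matrix estimates computed at several nodesize levels.
# oob.list[[i]] is assumed to hold the n OOB covariance matrices obtained
# with nodesize.set[i].
select_nodesize <- function(nodesize.set, oob.list) {
  # stack the upper triangular entries (diagonal included, by assumption)
  # of every observation's matrix into one vector per nodesize level
  upper <- lapply(oob.list, function(mats) {
    unlist(lapply(mats, function(S) S[upper.tri(S, diag = TRUE)]))
  })
  # mean absolute difference (MAD) between consecutive nodesize levels
  k <- length(nodesize.set)
  mad <- sapply(seq_len(k - 1), function(i) {
    mean(abs(upper[[i]] - upper[[i + 1]]))
  })
  # pair with the smallest MAD, then the smaller nodesize within that pair
  best.pair <- which.min(mad)
  list(mad = mad, best.nodesize = min(nodesize.set[best.pair + 0:1]))
}

# Toy input: 4 nodesize levels, 2 observations, 2 x 2 matrices.
# The estimates barely change between the 2nd and 3rd level.
nodesize.set <- c(64, 32, 16, 8)
scales <- c(1, 2, 2.1, 3)
oob.list <- lapply(scales, function(s) list(diag(2) * s, diag(2) * s))

res <- select_nodesize(nodesize.set, oob.list)
print(res$best.nodesize)
```

In the toy input the smallest MAD occurs between the levels 32 and 16, so the sketch returns 16, the smaller of the two.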
See Also
predict.covregrf
significance.test
vimp.covregrf
print.covregrf
Examples
options(rf.cores=2, mc.cores=2)
## load generated example data
data(data, package = "CovRegRF")
xvar.names <- colnames(data$X)
yvar.names <- colnames(data$Y)
data1 <- data.frame(data$X, data$Y)
## define train/test split
set.seed(2345)
smp <- sample(1:nrow(data1), size = round(nrow(data1)*0.6), replace = FALSE)
traindata <- data1[smp,,drop=FALSE]
testdata <- data1[-smp, xvar.names, drop=FALSE]
## formula object
formula <- as.formula(paste(paste(yvar.names, collapse="+"), ".", sep=" ~ "))
## train covregrf
covregrf.obj <- covregrf(formula, traindata, params.rfsrc = list(ntree = 50),
  importance = TRUE)
## get the OOB predictions
pred.oob <- covregrf.obj$predicted.oob
## predict with new test data
pred.obj <- predict(covregrf.obj, newdata = testdata)
pred <- pred.obj$predicted
## get the variable importance measures
vimp <- covregrf.obj$importance