covregrf {CovRegRF}    R Documentation

Covariance Regression with Random Forests

Description

Estimates the covariance matrix of a multivariate response given a set of covariates using a random forest framework.

Usage

covregrf(
  formula,
  data,
  params.rfsrc = list(ntree = 1000, mtry = ceiling(px/3), nsplit = max(round(n/50),
    10)),
  nodesize.set = round(0.5^(1:100) * sampsize)[round(0.5^(1:100) * sampsize) > py],
  importance = FALSE
)

Arguments

formula

An object of class formula, or a character string, describing the model to fit. Interaction terms are not supported.

data

A data.frame containing the multivariate data set, with n observations and px + py variables, where px and py are the numbers of covariates (X) and response variables (Y), respectively.

params.rfsrc

List of parameters to be passed to randomForestSRC. The default parameter set is ntree = 1000, mtry = px/3 (rounded up) and nsplit = max(round(n/50), 10). See randomForestSRC for the available parameters.

nodesize.set

The set of nodesize levels used for tuning. The default set consists of the sub-sample size (.632n) multiplied by successive powers of one-half (0.5, 0.25, 0.125, ...) and rounded, keeping only the values greater than the number of response variables (py). See Details for the nodesize tuning procedure.

importance

Should variable importance of covariates be assessed? The default is FALSE.
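The default expressions for params.rfsrc and nodesize.set shown in Usage can be evaluated by hand to see which values a given data size produces. The sketch below is illustrative only: the helper default_settings() is hypothetical, and it assumes the sub-sample size is round(.632 * n), since the exact rounding used inside covregrf is not stated on this page.

```r
## Hypothetical helper: evaluates the documented default expressions for a
## data set with n observations, px covariates and py responses.
## Assumption: sampsize = round(.632 * n) (exact rounding not documented here).
default_settings <- function(n, px, py) {
  sampsize <- round(0.632 * n)
  # sampsize / 2, sampsize / 4, ... rounded to whole numbers
  candidates <- round(0.5^(1:100) * sampsize)
  list(
    params.rfsrc = list(ntree  = 1000,
                        mtry   = ceiling(px / 3),
                        nsplit = max(round(n / 50), 10)),
    # keep only the nodesize levels larger than the number of responses
    nodesize.set = candidates[candidates > py]
  )
}

defaults <- default_settings(n = 200, px = 10, py = 3)
defaults$params.rfsrc$mtry    # 4
defaults$nodesize.set         # 63 32 16 8 4
```

With n = 200 the assumed sub-sample size is 126, so the candidate levels halve from 63 down to 4, and every level of 3 or less is discarded because py = 3.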

Value

An object of class (covregrf, grow) which is a list with the following components:

predicted.oob

OOB predicted covariance matrices for training observations.

importance

Variable importance measures (VIMP) for covariates.

best.nodesize

Best nodesize value selected with the proposed tuning method.

params.rfsrc

List of parameters used to fit the random forest with randomForestSRC.

n

Sample size of the data (observations with NAs are omitted).

xvar.names

A character vector of the covariate names.

yvar.names

A character vector of the response variable names.

xvar

Data frame of covariates.

yvar

Data frame of responses.

rf.grow

Fitted random forest object. This object is used for prediction with training or new data.

Details

For mean regression problems, the nodesize parameter of a random forest is typically tuned using out-of-bag (OOB) prediction errors, computed as the difference between the true responses and the OOB predictions; the nodesize value with the smallest OOB prediction error is chosen. The covariance regression problem, however, is unsupervised by nature: the true covariance matrices are not observed, so no such prediction error can be computed. Therefore, we tune the nodesize parameter with a heuristic method based on the OOB covariance matrix estimates. The general idea is to find the nodesize level at which the OOB covariance matrix predictions converge. The steps are as follows. First, we train separate random forests for each nodesize value in the set. Second, we compute the OOB covariance matrix estimates for each random forest. Next, we compute the mean absolute difference (MAD) between the upper triangular parts of the OOB covariance matrix estimates of two consecutive nodesize levels, over all observations. Finally, we take the pair of nodesize levels with the smallest MAD and, of these two, select the smaller, since deeper trees are generally desirable in random forests.
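The MAD-and-select steps of the tuning heuristic can be sketched in base R as follows. This is not the package's implementation: the OOB covariance estimates themselves come from the fitted forests and are not reproduced here, so the sketch takes them as a precomputed input. The names select_nodesize, oob_list and upper_entries are hypothetical, and it is an assumption that the "upper triangular part" includes the diagonal.

```r
## Hypothetical sketch of the nodesize selection step.
## nodesize.set: nodesize levels, in the order the forests were trained.
## oob_list: one element per nodesize level; each element is a list of n
##   py x py OOB covariance matrix estimates (one per training observation).
## Assumption: the upper triangular part includes the diagonal.
select_nodesize <- function(nodesize.set, oob_list) {
  stopifnot(length(nodesize.set) == length(oob_list),
            length(nodesize.set) >= 2)
  # n x (py * (py + 1) / 2) matrix of upper-triangular entries for one level
  upper_entries <- function(mats) {
    do.call(rbind, lapply(mats, function(S) S[upper.tri(S, diag = TRUE)]))
  }
  ut <- lapply(oob_list, upper_entries)
  # MAD between consecutive levels, averaged over observations and entries
  mad <- vapply(seq_len(length(ut) - 1),
                function(k) mean(abs(ut[[k]] - ut[[k + 1]])),
                numeric(1))
  k <- which.min(mad)
  # of the best pair, keep the smaller nodesize (deeper trees)
  list(mad = mad,
       best.nodesize = min(nodesize.set[k], nodesize.set[k + 1]))
}

## Toy usage with made-up 2 x 2 matrices (not real covregrf output)
mk <- function(v) matrix(c(v, 0.5, 0.5, v), 2, 2)
oob_list <- list(
  list(mk(1.0), mk(2.0)),  # nodesize 32
  list(mk(1.4), mk(2.4)),  # nodesize 16
  list(mk(1.5), mk(2.5))   # nodesize 8
)
res <- select_nodesize(c(32, 16, 8), oob_list)
res$best.nodesize  # 8
```

In the toy input the estimates change little between nodesize 16 and 8, so that pair has the smallest MAD and the smaller level, 8, is selected.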

See Also

predict.covregrf significance.test vimp.covregrf print.covregrf

Examples

options(rf.cores=2, mc.cores=2)

## load generated example data
data(data, package = "CovRegRF")
xvar.names <- colnames(data$X)
yvar.names <- colnames(data$Y)
data1 <- data.frame(data$X, data$Y)

## define train/test split
set.seed(2345)
smp <- sample(1:nrow(data1), size = round(nrow(data1)*0.6), replace = FALSE)
traindata <- data1[smp,,drop=FALSE]
testdata <- data1[-smp, xvar.names, drop=FALSE]

## formula object
formula <- as.formula(paste(paste(yvar.names, collapse="+"), ".", sep=" ~ "))

## train covregrf
covregrf.obj <- covregrf(formula, traindata, params.rfsrc = list(ntree = 50),
  importance = TRUE)

## get the OOB predictions
pred.oob <- covregrf.obj$predicted.oob

## predict with new test data
pred.obj <- predict(covregrf.obj, newdata = testdata)
pred <- pred.obj$predicted

## get the variable importance measures
vimp <- covregrf.obj$importance



[Package CovRegRF version 2.0.1 Index]