rfcca {RFCCA} | R Documentation |
Random Forest with Canonical Correlation Analysis
Description
Estimates the canonical correlations between two sets of variables depending on the subject-related covariates.
Usage
rfcca(
  X,
  Y,
  Z,
  ntree = 200,
  mtry = NULL,
  nodesize = NULL,
  nodedepth = NULL,
  nsplit = 10,
  importance = FALSE,
  finalcca = c("cca", "scca", "rcca"),
  bootstrap = TRUE,
  samptype = c("swor", "swr"),
  sampsize = if (samptype == "swor") function(x) {
      x * 0.632
  } else function(x) {
      x
  },
  forest = TRUE,
  membership = FALSE,
  bop = TRUE,
  Xcenter = TRUE,
  Ycenter = TRUE,
  ...
)
Arguments
X |
The first multivariate data set which has |
Y |
The second multivariate data set which has |
Z |
The set of subject-related covariates which has |
ntree |
Number of trees. |
mtry |
Number of z-variables randomly selected as candidates for
splitting a node. The default is |
nodesize |
Forest average number of unique data points in a terminal
node. The default is the |
nodedepth |
Maximum depth to which a tree should be grown. By default, this parameter is ignored. |
nsplit |
Non-negative integer value for the number of random splits to
consider for each candidate splitting variable. When zero or |
importance |
Should variable importance of z-variables be assessed? The
default is |
finalcca |
Which CCA should be used for final canonical correlation
estimation? Choices are |
bootstrap |
Should the data be bootstrapped? The default value is
|
samptype |
Type of bootstrap. Choices are |
sampsize |
Size of the sample to draw. For sampling without replacement, the default is 0.632 times the sample size. For sampling with replacement, the default is the sample size. |
forest |
Should the forest object be returned? It is used for prediction
on new data. The default is |
membership |
Should terminal node membership and inbag information be returned? |
bop |
Should the Bag of Observations for Prediction (BOP) for training
observations be returned? The default is |
Xcenter |
Should the columns of X be centered? The default is
|
Ycenter |
Should the columns of Y be centered? The default is
|
... |
Optional arguments to be passed to other methods. |
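The interaction between samptype and sampsize described above can be sketched in base R. This is only an illustration under stated assumptions: default_sampsize and draw_inbag are hypothetical helpers, not functions of the package, and rounding the sample size to an integer is an assumption of the sketch.

```r
# Hypothetical helper mirroring the default `sampsize` shown in Usage.
default_sampsize <- function(samptype = c("swor", "swr")) {
  samptype <- match.arg(samptype)
  if (samptype == "swor") function(x) x * 0.632 else function(x) x
}

# Draw one in-bag sample of row indices for a data set with n rows.
# Rounding the size to an integer is an assumption of this sketch.
draw_inbag <- function(n, samptype = c("swor", "swr")) {
  samptype <- match.arg(samptype)
  size <- round(default_sampsize(samptype)(n))
  sample.int(n, size = size, replace = (samptype == "swr"))
}

set.seed(1)
inbag_swor <- draw_inbag(100, "swor")  # 63 distinct row indices
inbag_swr  <- draw_inbag(100, "swr")   # 100 row indices, repeats allowed
```

With "swor" every in-bag observation appears once, so the remaining rows are out-of-bag; with "swr" an observation may appear several times in the same sample.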
Value
An object of class (rfcca, grow), which is a list with the following components:
call |
The original call to |
n |
Sample size of the data ( |
ntree |
Number of trees grown. |
mtry |
Number of variables randomly selected for splitting at each node. |
nodesize |
Minimum forest average number of unique data points in a terminal node. |
nodedepth |
Maximum depth to which a tree is allowed to be grown. |
nsplit |
Number of randomly selected split points. |
xvar |
Data frame of x-variables. |
xvar.names |
A character vector of the x-variable names. |
yvar |
Data frame of y-variables. |
yvar.names |
A character vector of the y-variable names. |
zvar |
Data frame of z-variables. |
zvar.names |
A character vector of the z-variable names. |
leaf.count |
Number of terminal nodes for each tree in the forest.
Vector of length |
bootstrap |
Was the data bootstrapped? |
forest |
If |
membership |
A matrix recording terminal node membership where each cell represents the node number that an observation falls in for that tree. |
importance |
Variable importance measures (VIMP) for each z-variable. |
inbag |
A matrix recording inbag membership where each cell represents whether the observation is in the bootstrap sample in the corresponding tree. |
predicted.oob |
OOB predicted canonical correlations for training observations based on the selected final canonical correlation estimation method. |
predicted.coef |
Predicted canonical weight vectors for x- and y- variables. |
bop |
If |
finalcca |
The CCA method selected for final canonical correlation estimation. |
rfsrc.grow |
An object of class |
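As a rough illustration of how the membership and inbag components can be read, the sketch below builds toy matrices and queries them. The matrices are simulated stand-ins, not output from rfcca(), and the layout (one row per observation, one column per tree) is an assumption; node_mates is a hypothetical helper.

```r
# Toy stand-ins with an assumed n x ntree layout (not produced by rfcca()).
set.seed(7)
n <- 6
ntree <- 4
inbag <- matrix(rbinom(n * ntree, 1, 0.632), n, ntree)
membership <- matrix(sample(1:3, n * ntree, replace = TRUE), n, ntree)

# Trees for which each observation is out-of-bag (inbag entry equals 0).
oob_trees <- lapply(seq_len(n), function(i) which(inbag[i, ] == 0))

# Hypothetical helper: other observations that share a terminal node
# with observation i in tree b.
node_mates <- function(i, b) {
  setdiff(which(membership[, b] == membership[i, b]), i)
}

mates_1_1 <- node_mates(1, 1)
```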
Details
- Final canonical correlation estimation:
Final canonical correlation can be computed with CCA (Hotelling, 1936), Sparse CCA (Witten et al., 2009), or Regularized CCA (Vinod, 1976; Leurgans et al., 1993). If Regularized CCA is used, the regularization parameters \lambda_1 and \lambda_2 should be specified.
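To make the role of \lambda_1 and \lambda_2 concrete, here is a minimal base-R sketch of classical CCA next to a textbook ridge-regularized variant, in which \lambda_1 and \lambda_2 are added to the diagonals of the two covariance matrices. The data are simulated, rcca_cor is a hypothetical helper, and the package's internal estimator may differ from this form.

```r
set.seed(42)
n <- 200
X <- matrix(rnorm(n * 3), n, 3)
# The first column of Y is correlated with the first column of X.
Y <- cbind(X[, 1] + rnorm(n), matrix(rnorm(n * 2), n, 2))

# Classical CCA (Hotelling, 1936) via stats::cancor.
rho_cca <- cancor(X, Y)$cor

# Hypothetical helper: ridge-regularized canonical correlations.
# lambda1 and lambda2 are added to the diagonals of cov(X) and cov(Y).
rcca_cor <- function(X, Y, lambda1, lambda2) {
  Sxx <- cov(X) + lambda1 * diag(ncol(X))
  Syy <- cov(Y) + lambda2 * diag(ncol(Y))
  Sxy <- cov(X, Y)
  Rx <- chol(Sxx)  # upper triangular, t(Rx) %*% Rx equals Sxx
  Ry <- chol(Syy)
  # Singular values of the whitened cross-covariance matrix.
  K <- solve(t(Rx)) %*% Sxy %*% solve(Ry)
  svd(K)$d
}

rho_rcca0 <- rcca_cor(X, Y, 0, 0)      # no penalty: matches classical CCA
rho_rcca  <- rcca_cor(X, Y, 0.5, 0.5)  # penalized: leading correlation shrinks
```

Setting both penalties to zero recovers the classical estimate, while positive values shrink the estimated correlations, which can help when the covariance matrices are poorly conditioned.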
References
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28(3/4), 321–377.
Leurgans, S. E., Moyeed, R. A., & Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 725–740.
Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2), 147–166.
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534.
See Also
predict.rfcca
global.significance
vimp.rfcca
print.rfcca
Examples
## load generated example data
data(data, package = "RFCCA")
set.seed(2345)
## define train/test split
smp <- sample(1:nrow(data$X), size = round(nrow(data$X) * 0.7),
  replace = FALSE)
train.data <- lapply(data, function(x) {x[smp, ]})
test.Z <- data$Z[-smp, ]
## train rfcca
rfcca.obj <- rfcca(X = train.data$X, Y = train.data$Y, Z = train.data$Z,
  ntree = 100, importance = TRUE)
## print the grow object
print(rfcca.obj)
## get the OOB predictions
pred.oob <- rfcca.obj$predicted.oob
## predict with new test data
pred.obj <- predict(rfcca.obj, newdata = test.Z)
pred <- pred.obj$predicted
## get the variable importance measures
z.vimp <- rfcca.obj$importance
## train rfcca and estimate the final canonical correlations with "scca"
rfcca.obj2 <- rfcca(X = train.data$X, Y = train.data$Y, Z = train.data$Z,
  ntree = 100, finalcca = "scca")