BSstack {Sstack} | R Documentation |
Bootstrap Stacking model builder.
Description
Creates a bootstrapped linear stacked set of Random Forest (RF) models given a set of heterogeneous datasets.
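The idea of a bootstrapped linear stack can be illustrated with a minimal sketch in base R. This is not the package's implementation: the toy predictions `P`, the use of `lm` to fit the weights, and averaging the weights over bootstrap samples are all assumptions made for illustration.

```r
set.seed(1)
n <- 50
y <- rnorm(n)

# Hypothetical predictions from two base models (noisy copies of y).
P <- cbind(m1 = y + rnorm(n, sd = 0.5), m2 = y + rnorm(n, sd = 0.8))

iter <- 25
W <- t(replicate(iter, {
  idx <- sample(n, replace = TRUE)           # bootstrap sample of rows
  coef(lm(y[idx] ~ P[idx, , drop = FALSE]))  # intercept + one weight per model
}))

# Assumption: the final weights are the mean over bootstrap fits.
w <- colMeans(W)

# Stacked prediction: linear combination of the base-model predictions.
stacked <- drop(cbind(1, P) %*% w)
```

With two base models the sketch yields three weights (an intercept plus one per model), which is one plausible reading of the "Nstack + 1 weights" in the return value.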
Usage
BSstack(T = 50, mtry = NULL, nodesize = 5, iter = 25, CV = NA,
Xn = NULL, ECHO = TRUE, Y, X1, X2, ...)
Arguments
T |
Number of trees for the individual RF models. (int) |
mtry |
Number of variables available for splitting at each tree node. If a scalar is given, all models use that value. If a 1D array is given, each individual model uses its corresponding value. If NA, it is set to Nfeats/3 for each model. |
nodesize |
Minimum size of terminal nodes. If a scalar is given, all models use that value. If a 1D array is given, each individual model uses its corresponding value. By default all models use 5. |
iter |
The number of times to bootstrap-sample the data. (int) |
CV |
Cross validation (CV) used to measure mean-absolute error and correlation coefficient. If NA (default), no CV is performed; otherwise the value gives the number of folds. If CV < 2, leave-one-out CV is performed. CV uses only the samples that have a full record. |
Xn |
List containing each dataset to be stacked. If not supplied, it is generated from X1, X2, ... |
ECHO |
Logical. If TRUE, reports the number of overlapping samples and the CV runtime to the user. |
Y |
Nsample x 1 data table of responses for ALL samples. Its rownames must match those of each individual dataset. |
X1 |
Data table of first dataset to be stacked. Rownames should be contained within Y. |
X2 |
Data table of second dataset to be stacked. Rownames should be contained within Y. |
... |
Further data tables, X3, X4, ..., Xl. |
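The argument shapes described above can be sketched with toy data in base R. The data, the names `full` and `mtry`, the flooring of Nfeats/3, and the reading of "full record" as "present in every dataset" are assumptions for illustration, not taken from the package.

```r
# Toy data (hypothetical): two datasets whose row names are contained in Y.
set.seed(1)
Y  <- data.frame(resp = rnorm(6), row.names = paste0("s", 1:6))
X1 <- data.frame(matrix(rnorm(6 * 9), nrow = 6), row.names = paste0("s", 1:6))
X2 <- data.frame(matrix(rnorm(4 * 6), nrow = 4), row.names = paste0("s", 3:6))

# Datasets may be passed as a single list through Xn instead of X1, X2, ...
Xn <- list(X1, X2)

# Every dataset's row names should be contained in Y's row names.
stopifnot(all(vapply(Xn, function(X) all(rownames(X) %in% rownames(Y)),
                     logical(1))))

# Assumption: "full record" samples are those present in every dataset.
full <- Reduce(intersect, lapply(Xn, rownames))

# One mtry value per model, following the Nfeats/3 rule (flooring is assumed).
mtry <- vapply(Xn, function(X) max(1L, floor(ncol(X) / 3)), numeric(1))

# With Sstack installed, these could then be passed along the lines of:
# BSstack(T = 25, mtry = mtry, Y = Y, Xn = Xn)
```

Here `mtry` has one entry per dataset, matching the 1D-array form of the argument, and `full` lists the samples that would be available for cross validation under the stated assumption.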
Details
Required Packages: dplyr, randomForest, foreach
Value
If CV is not NA, a list composed of:
[1] A list containing [1] the individual RF models, [2] the Nstack + 1 weights, and [3] the feature names for full-record samples. This element is what is passed to BSstack_predict.
[2] Mean-absolute error calculated using cross validation (scalar).
[3] Pearson correlation coefficient between actual and predicted values through cross validation (scalar, -1 <= r <= 1).
[4] Individual weights calculated for each fold (CV x (Nstack + 1) matrix).
[5] Out-of-fold predictions for the overlapping samples.
[6] Actual values for the overlapping samples.
If CV > 1, also: [7] The fold assignments for the overlapping samples.
If CV is NA, only [1] is returned.
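As a rough illustration of how the cross-validation summaries relate to the returned predictions, the sketch below computes a mean-absolute error and a Pearson correlation from two toy vectors. The vectors `pred` and `actual` are hypothetical stand-ins for elements [5] and [6], and the standard definitions of both metrics are assumed rather than confirmed against the package source.

```r
# Hypothetical stand-ins for elements [5] (out-of-fold predictions)
# and [6] (actual values) of the returned list.
pred   <- c(0.42, 0.55, 0.31, 0.70, 0.64)
actual <- c(0.40, 0.60, 0.35, 0.65, 0.70)

# Mean-absolute error, assumed to correspond to element [2].
mae <- mean(abs(actual - pred))

# Pearson correlation coefficient, assumed to correspond to element [3].
r <- cor(actual, pred, method = "pearson")

# With a real result, e.g. res <- BSstack(..., CV = 5), the same inputs
# would be read as res[[5]] and res[[6]].
```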
Examples
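library(Sstack)
library(doParallel)

# Load the example data and unpack its three components.
data(StackData)
AUC <- StackData[[1]]
GE <- StackData[[2]]
RPPA <- StackData[[3]]

# Two training datasets with different features; rows 200-400 appear in both.
X1 <- GE[1:400, 1:75]
X2 <- GE[200:400, 76:150]
# Held-out test samples with all features.
Xt <- GE[401:487, ]

set.seed(1)
# Register a parallel backend with two workers.
cl <- makeCluster(2)
registerDoParallel(cl)
Hbs <- BSstack(T = 25, iter = 20, Y = AUC, X1 = X1, X2 = X2)
stopCluster(cl)

# Predict on the held-out samples, then compute the mean-absolute error
# for each column of Yp.
Yp <- BSstack_predict(Hbs[[1]], Xt)
maeH1 <- mean(abs(AUC[401:487, ] - Yp[, 1]))
maeH2 <- mean(abs(AUC[401:487, ] - Yp[, 2]))
maeHs <- mean(abs(AUC[401:487, ] - Yp[, 3]))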