R: Bootstrap Stacking model builder.

BSstack {Sstack}

R Documentation

Bootstrap Stacking model builder.

Description

Builds a bootstrapped, linearly stacked ensemble of Random Forest (RF) models from a set of heterogeneous datasets.
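To make the idea of bootstrapped linear stacking concrete, the following is a minimal sketch in base R. It is not the package's implementation: the simulated data, the helper name `bootstrap_stack_weights`, and the choice to average least-squares weights over the bootstrap draws are all assumptions made for illustration.

```r
# Sketch only: bootstrapped linear stacking with base R.
# P holds per-model predictions on the overlapping samples (one column per
# model) and y the matching responses; both are simulated here.
set.seed(1)
n <- 100
y <- rnorm(n)
P <- cbind(m1 = y + rnorm(n, sd = 0.5),
           m2 = y + rnorm(n, sd = 1.0))

# Hypothetical helper: fit a linear stacker on each bootstrap sample and
# average the resulting weights (intercept + one weight per model).
bootstrap_stack_weights <- function(P, y, iter = 25) {
  W <- matrix(NA_real_, nrow = iter, ncol = ncol(P) + 1)
  for (b in seq_len(iter)) {
    idx <- sample(nrow(P), replace = TRUE)            # bootstrap sample of rows
    fit <- lm.fit(cbind(1, P[idx, , drop = FALSE]), y[idx])
    W[b, ] <- fit$coefficients
  }
  colMeans(W)
}

w <- bootstrap_stack_weights(P, y, iter = 25)
stacked <- drop(cbind(1, P) %*% w)   # stacked prediction for each sample
```

Here `w` has Nstack + 1 entries (an intercept plus one weight per model), which mirrors the shape of the weights described under Value.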

Usage

BSstack(T = 50, mtry = NULL, nodesize = 5, iter = 25, CV = NA,
  Xn = NULL, ECHO = TRUE, Y, X1, X2, ...)

Arguments

`T`	Number of trees for the individual RF models. (int)
`mtry`	Number of variables available for splitting at each tree node. If a scalar is given, all models use that value. If a 1D array is given, each individual model uses its corresponding value. If NA, each model uses Nfeats/3.
`nodesize`	Minimum size of terminal nodes. If a scalar is given, all models use that value. If a 1D array is given, each individual model uses its corresponding value. By default all models use 5.
`iter`	The number of times to bootstrap sample the data. (int)
`CV`	Cross validation (CV) used to measure mean absolute error and the correlation coefficient. If NA (default), no CV is performed; otherwise the value gives the number of folds. If CV < 2, leave-one-out CV is performed. CV uses only the samples that have a full record.
`Xn`	List containing each dataset to be stacked. If not supplied, it is generated from X1, X2, ...
`ECHO`	Logical. If TRUE, reports the overlapping samples and the CV runtime to the user.
`Y`	Nsample x 1 data table of responses for ALL samples. Must have matching rownames with each individual dataset.
`X1`	Data table of first dataset to be stacked. Rownames should be contained within Y.
`X2`	Data table of second dataset to be stacked. Rownames should be contained within Y.
`...`	Further data tables, X3, X4, ..., Xl.
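The scalar-versus-array behaviour of `mtry` and `nodesize`, and the notion of samples with a full record, can be illustrated with a short base R sketch. The helper `expand_param`, the toy data, and the use of `floor()` with a minimum of 1 for Nfeats/3 are assumptions for illustration and may differ from what BSstack does internally.

```r
# Hypothetical helper: expand a scalar or per-model vector to one value
# per dataset, falling back to `default` when nothing usable is supplied.
expand_param <- function(value, n_models, default) {
  if (is.null(value) || all(is.na(value))) return(default)
  if (length(value) == 1) return(rep(value, n_models))
  stopifnot(length(value) == n_models)
  value
}

# Toy datasets; rownames act as sample identifiers.
X1 <- data.frame(matrix(1:24, nrow = 4), row.names = paste0("s", 1:4))
X2 <- data.frame(matrix(1:9,  nrow = 3), row.names = paste0("s", 2:4))
Xn <- list(X1, X2)

# Per-model defaults: Nfeats/3, rounded down and at least 1 (assumption).
n_feats  <- vapply(Xn, ncol, integer(1))
mtry     <- expand_param(NULL, length(Xn), default = pmax(1, floor(n_feats / 3)))
nodesize <- expand_param(5, length(Xn), default = rep(5, length(Xn)))

# Samples with a full record: rownames present in every dataset.
overlap <- Reduce(intersect, lapply(Xn, rownames))
```

With these toy inputs, `overlap` contains the sample IDs shared by both datasets, which is the subset the `CV` argument is described as using.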

Details

Required Packages: dplyr, randomForest, foreach

Value

If CV is not NA: a list composed of:

[1] A list containing [1] the individual RF models, [2] the Nstack+1 weights, and [3] the feature names for the full-record samples. This element is what is passed to BSstack_predict.
[2] Mean absolute error calculated using cross validation (scalar).
[3] Pearson correlation coefficient between actual and predicted values through cross validation (scalar, -1 <= r <= 1).
[4] Individual weights calculated for each fold (CV x Nstack+1 matrix).
[5] Out-of-fold predictions for the overlapping samples.
[6] Actual values for the overlapping samples.

If CV > 1: also [7] the fold assignments for the overlapping samples.

If CV is NA: only [1] is returned.
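The two summary statistics above can be recomputed from out-of-fold predictions and actual values. The sketch below uses made-up vectors standing in for elements [5] and [6]; the variable names and the fold-assignment scheme are assumptions, not the package's internals.

```r
# Made-up stand-ins for the out-of-fold predictions and actual values.
actual   <- c(0.2, 0.5, 0.9, 0.4, 0.7, 0.1)
oof_pred <- c(0.3, 0.4, 0.8, 0.5, 0.6, 0.3)

# Mean absolute error and Pearson correlation between actual and predicted.
mae <- mean(abs(actual - oof_pred))
r   <- cor(actual, oof_pred, method = "pearson")

# One possible way to assign samples to k folds (assumption): shuffle a
# balanced vector of fold labels.
set.seed(1)
k <- 3
folds <- sample(rep(seq_len(k), length.out = length(actual)))
```

Recomputing `mae` and `r` this way is a quick consistency check against the scalar values returned when CV is enabled.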

Examples

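library(Sstack)
library(doParallel)
data(StackData)

# AUC is used as the response; GE supplies the features.
# RPPA is loaded but not used below.
AUC <- StackData[[1]]
GE <- StackData[[2]]
RPPA <- StackData[[3]]

# Two training datasets with different features and partially
# overlapping samples (rows 200-400); rows 401-487 are held out.
X1 <- GE[1:400, 1:75]
X2 <- GE[200:400, 76:150]
Xt <- GE[401:487, ]

set.seed(1)

# Register a two-worker parallel backend.
cl <- makeCluster(2)
registerDoParallel(cl)

Hbs <- BSstack(T = 25, iter = 20, Y = AUC, X1 = X1, X2 = X2)

stopCluster(cl)

# Predict on the held-out samples using the first element of the result.
Yp <- BSstack_predict(Hbs[[1]], Xt)

# Mean absolute error on the held-out samples for each column of Yp.
maeH1 <- mean(abs(AUC[401:487, ] - Yp[, 1]))
maeH2 <- mean(abs(AUC[401:487, ] - Yp[, 2]))
maeHs <- mean(abs(AUC[401:487, ] - Yp[, 3]))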

[Package Sstack version 1.0.1 Index]