predict.blockForest {blockForest}  R Documentation 
Prediction using Random Forest variants for block-structured covariate data
Description
This function is to be applied to the entry 'forest' of the output of blockfor. See the example section for illustration.
Usage
## S3 method for class 'blockForest'
predict(
object,
data = NULL,
predict.all = FALSE,
num.trees = object$num.trees,
type = "response",
se.method = "infjack",
quantiles = c(0.1, 0.5, 0.9),
seed = NULL,
num.threads = NULL,
verbose = TRUE,
...
)
Arguments
object 
blockForest object, that is, the entry 'forest' of the output of blockfor. 
data 
New test data of class data.frame or gwaa.data (GenABEL). 
predict.all 
Return individual predictions for each tree instead of aggregated predictions for all trees. Return a matrix (sample x tree) for classification and regression, a 3d array for probability estimation (sample x class x tree) and survival (sample x time x tree). 
num.trees 
Number of trees used for prediction. The first num.trees in the forest are used. 
type 
Type of prediction. One of 'response', 'se', 'terminalNodes', 'quantiles' with default 'response'. See below for details. 
se.method 
Method to compute standard errors. One of 'jack', 'infjack' with default 'infjack'. Only applicable if type = 'se'. See below for details. 
quantiles 
Vector of quantiles for quantile prediction. Set type = 'quantiles' to use. 
seed 
Random seed. Default is NULL, which generates the seed from R. Set to 0 to ignore the R seed. 
num.threads 
Number of threads. Default is number of CPUs available. 
verbose 
Verbose output on or off. 
... 
further arguments passed to or from other methods. 
Details
For type = 'response' (the default), the predicted classes (classification), predicted numeric values (regression), predicted probabilities (probability estimation) or survival probabilities (survival) are returned.
For type = 'se', the standard errors of the predictions are returned (regression only). The jackknife-after-bootstrap or infinitesimal jackknife for bagging is used to estimate the standard errors based on out-of-bag predictions. See Wager et al. (2014) for details.
For type = 'terminalNodes', the IDs of the terminal node in each tree for each observation in the given dataset are returned.
For type = 'quantiles', the selected quantiles for each observation are estimated. See Meinshausen (2006) for details.
If type = 'se' is selected, the method to estimate the variances can be chosen with se.method. Set se.method = 'jack' for the jackknife-after-bootstrap and se.method = 'infjack' for the infinitesimal jackknife for bagging.
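For a continuous outcome, standard-error prediction might look as follows. This is a minimal sketch, not taken from the package documentation: it reuses Xtrain, Xtest and block as constructed in the Examples section below, introduces illustrative object names (ycont, fit, predse), and assumes that blockfor() forwards keep.inbag = TRUE (needed to store the in-bag counts required by both variance estimators) to the underlying forest via its further arguments.
ycont <- rnorm(30)   # continuous training outcome (illustration only)
fit <- blockfor(Xtrain, ycont, num.trees = 100, replace = TRUE, block = block,
                nsets = 10, num.trees.pre = 50, splitrule = "extratrees",
                block.method = "SplitWeights", keep.inbag = TRUE)
predse <- predict(fit$forest, data = Xtest, type = "se", se.method = "infjack")
predse$predictions   # point predictions
predse$se            # estimated standard errors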
For classification and predict.all = TRUE, factor levels are returned as numerics. To retrieve the corresponding factor levels, use rf$forest$levels, if rf is the blockForest object.
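As a minimal sketch (assuming blockforobj is the blockfor() output for the binary outcome from the Examples section and Xtest the corresponding test data), the numeric codes can be mapped back to the stored levels:
predall <- predict(blockforobj$forest, data = Xtest, predict.all = TRUE)
dim(predall$predictions)                  # samples x trees, numeric codes
lev <- blockforobj$forest$forest$levels   # stored factor levels, see note above
# numeric codes are assumed to index into 'lev', as described above
matrix(lev[predall$predictions], nrow = nrow(predall$predictions))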
Value
Object of class blockForest.prediction
with elements
predictions  Predicted classes/values (only for classification and regression). 
unique.death.times  Unique death times (only for survival). 
chf  Estimated cumulative hazard function for each sample (only for survival). 
survival  Estimated survival function for each sample (only for survival). 
num.trees  Number of trees. 
num.independent.variables  Number of independent variables. 
treetype  Type of forest/tree. Classification, regression or survival. 
num.samples  Number of samples. 
Author(s)
Marvin N. Wright
References
Wright, M. N. & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J Stat Softw 77:1-17. doi:10.18637/jss.v077.i01.
Wager, S., Hastie, T., & Efron, B. (2014). Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife. J Mach Learn Res 15:1625-1651. https://jmlr.org/papers/v15/wager14a.html.
Meinshausen, N. (2006). Quantile Regression Forests. J Mach Learn Res 7:983-999. https://www.jmlr.org/papers/v7/meinshausen06a.html.
See Also
blockfor
Examples
# NOTE: There is no association between covariates and response for the
# simulated data below.
# Moreover, the input parameters of blockfor() are highly unrealistic
# (e.g., nsets = 10 is much too small).
# The purpose of the shown examples is merely to illustrate the
# application of predict.blockForest().
# Generate data:
################
set.seed(1234)
# Covariate matrix:
X <- cbind(matrix(nrow=40, ncol=5, data=rnorm(40*5)),
           matrix(nrow=40, ncol=30, data=rnorm(40*30, mean=1, sd=2)),
           matrix(nrow=40, ncol=100, data=rnorm(40*100, mean=2, sd=3)))
colnames(X) <- paste("X", 1:ncol(X), sep="")
# Block variable (list):
block <- rep(1:3, times=c(5, 30, 100))
block <- lapply(1:3, function(x) which(block==x))
# Binary outcome:
ybin <- factor(sample(c(0,1), size=40, replace=TRUE), levels=c(0,1))
# Survival outcome:
ysurv <- cbind(rnorm(40), sample(c(0,1), size=40, replace=TRUE))
# Divide in training and test data:
Xtrain <- X[1:30,]
Xtest <- X[31:40,]
ybintrain <- ybin[1:30]
ybintest <- ybin[31:40]
ysurvtrain <- ysurv[1:30,]
ysurvtest <- ysurv[31:40,]
# Binary outcome: Apply algorithm to training data and obtain predictions
# for the test data:
#########################################################################
# Apply a variant to the training data:
blockforobj <- blockfor(Xtrain, ybintrain, num.trees = 100, replace = TRUE, block=block,
                        nsets = 10, num.trees.pre = 50, splitrule="extratrees",
                        block.method = "SplitWeights")
blockforobj$paramvalues
# Obtain prediction for the test data:
(predres <- predict(blockforobj$forest, data = Xtest, block.method = "SplitWeights"))
predres$predictions
# Survival outcome: Apply algorithm to training data and obtain predictions
# for the test data:
###########################################################################
# Apply a variant to the training data:
blockforobj <- blockfor(Xtrain, ysurvtrain, num.trees = 100, replace = TRUE, block=block,
                        nsets = 10, num.trees.pre = 50, splitrule="extratrees",
                        block.method = "SplitWeights")
blockforobj$paramvalues
# Obtain prediction for the test data:
(predres <- predict(blockforobj$forest, data = Xtest, block.method = "SplitWeights"))
rowSums(predres$chf)
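# The other prediction types described in the Details section are requested
# in the same way. As a sketch, terminal node IDs for the test data
# (available for all tree types; one row per test observation, one column
# per tree; 'noderes' is an illustrative object name):
noderes <- predict(blockforobj$forest, data = Xtest, type = "terminalNodes")
dim(noderes$predictions)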