icrf {icrf}    R Documentation
Interval Censored Recursive Forests (ICRF)
Description
icrf implements the ICRF algorithm to estimate the
conditional survival probability for interval censored survival data.
(It can also be used for right-censored survival data and current status data.)
icrf recursively builds random forests, optionally with the extremely randomized
trees (ERT) algorithm (see ERT), and uses kernel smoothing in the time domain.
This icrf package is built on the randomForest package
by Andy Liaw and Matthew Wiener. (Quoted statements are from
randomForest by Liaw and Wiener unless otherwise mentioned.)
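As a rough illustration of kernel smoothing in the time domain, the base-R sketch below spreads each jump of a step survival curve with a Gaussian kernel whose standard deviation is the bandwidth. The helper name smooth_survival and the Gaussian kernel are assumptions for illustration only; this is not the smoother implemented inside icrf, and it applies no boundary correction near time 0.

```r
# Hypothetical helper (not part of icrf): Gaussian-kernel smoothing of a
# non-increasing step survival curve, evaluated at the points in `grid`.
smooth_survival <- function(time.points, surv, grid, bandwidth) {
  stopifnot(length(time.points) == length(surv), bandwidth > 0)
  # Probability mass dropped at each time point, taking S = 1 before the first one
  mass <- -diff(c(1, surv))
  # Smoothed CDF: each point mass is spread by a normal kernel with sd = bandwidth
  F.smooth <- vapply(grid, function(u) {
    sum(mass * pnorm((u - time.points) / bandwidth))
  }, numeric(1))
  1 - F.smooth
}

# Usage with a toy step curve
tp <- c(1, 2, 3, 4)
s  <- c(0.8, 0.5, 0.3, 0.1)
smooth_survival(tp, s, grid = seq(0, 6, by = 0.5), bandwidth = 0.5)
```

Because every jump mass is non-negative and pnorm is increasing, the smoothed curve stays non-increasing and within [0, 1]; as the bandwidth shrinks, it approaches the original step function.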
Usage
icrf(x, ...)
## Default S3 method:
icrf(
x,
L,
R,
tau = max(R[is.finite(R)]) * 1.5,
bandwidth = NULL,
quasihonesty = TRUE,
initialSmoothing = TRUE,
timeSmooth = NULL,
xtest = NULL,
ytest = NULL,
nfold = 5L,
ntree = 500L,
mtry = ceiling(sqrt(p)),
split.rule = c("Wilcoxon", "logrank", "PetoWilcoxon", "PetoLogrank", "GWRS", "GLR",
"SWRS", "SLR"),
ERT = FALSE,
uniformERT = ERT,
returnBest = sampsize < n,
imse.monitor = 1,
replace = !ERT,
sampsize = ifelse(ERT, 0.95, 0.632) * n,
nodesize = 6L,
maxnodes = NULL,
importance = FALSE,
nPerm = 1,
proximity,
oob.prox = ifelse(sampsize == n & !replace, FALSE, proximity),
do.trace = FALSE,
keep.forest = is.null(xtest),
keep.inbag = FALSE,
...
)
## S3 method for class 'formula'
icrf(
formula,
data = NULL,
data.type = c("interval", "right", "currentstatus"),
interval.label = c("L", "R"),
right.label = c("T", "status"),
currentstatus.label = c("monitor", "status"),
...,
na.action = na.fail,
epsilon = NULL
)
## S3 method for class 'icrf'
print(x, ...)
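Several defaults in Usage depend on n (the number of rows of x) and p (the number of predictors) and on each other. The sketch below re-evaluates those default expressions for given n and p so their interplay is easy to see. icrf_defaults is a hypothetical helper, not a package function, and it does not model any rounding icrf may apply to a non-integer sampsize.

```r
# Hypothetical helper: evaluates the default expressions shown under Usage.
icrf_defaults <- function(n, p, ERT = FALSE) {
  sampsize <- ifelse(ERT, 0.95, 0.632) * n
  list(
    mtry       = ceiling(sqrt(p)),
    replace    = !ERT,          # bootstrap sampling unless ERT is used
    uniformERT = ERT,
    sampsize   = sampsize,
    returnBest = sampsize < n   # TRUE whenever subsampling is used
  )
}

str(icrf_defaults(n = 100, p = 4))              # default (non-ERT) settings
str(icrf_defaults(n = 100, p = 4, ERT = TRUE))  # ERT settings
```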
Arguments
x
a data frame or a matrix of predictors.
...
optional arguments to be passed to icrf.default.
L, R
the left and right end points of the intervals.
tau
the study end time. ([0,
bandwidth
a positive number, the bandwidth of the kernel smoothing. For faster computing,
set
quasihonesty
if
initialSmoothing
if
timeSmooth
a numeric vector of time points at which the smoothed
survival curves are estimated. It should be in increasing order.
If
xtest
a data frame or matrix of predictors for the test dataset.
ytest
the true survival curves for the test set, as a data frame or matrix.
The number of rows is the same as
nfold
number of forests to iterate. In practice, values between 5 and 10 are reasonable.
ntree
number of trees to build within each forest. 'This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.'
mtry
number of candidate predictors tried at each split.
The default value is ceiling(sqrt(p)), where p is the number of variables in
split.rule
splitting rules. See Details. The default is "Wilcoxon".
ERT
If
uniformERT
Only relevant when
returnBest
If
imse.monitor
Which type of IMSE is used to determine which fold is the best?
replace
Should cases be sampled with or without replacement?
sampsize
Size of the random sample to draw.
nodesize
Each terminal node cannot be smaller than this value. 'Setting this number larger causes smaller trees to be grown (and thus take less time).'
maxnodes
Up to how many terminal nodes can a tree have? 'If not given, trees are grown to the maximum possible (subject to limits by nodesize). If set larger than maximum possible, a warning is issued.'
importance
If
nPerm
How many permutations (of OOB data) to do for variable importance assessment? 'Number larger than 1 gives slightly more stable estimate, but not very effective. Currently only implemented for regression.'
proximity
If
oob.prox
If
do.trace
If
keep.forest
'If set to FALSE, the forest will not be retained in the output object. If xtest is given, defaults to FALSE.'
keep.inbag
'Should an n by ntree matrix be returned that keeps track of which samples are "in-bag" in which trees (but not how many times, if sampling with replacement)'
formula, data.type, interval.label, right.label, currentstatus.label
a formula object, with the
response in a Surv 'interval2' or
data
a data frame that includes the intervals and the predictor values.
na.action
'a function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.)'
epsilon
a small positive value needed to discriminate the left and right interval end points for the uncensored data.
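Right-censored and current status observations can both be written as intervals with end points L and R, which is the form the default method takes. The sketch below shows one such mapping. The current status branch mirrors the L and R construction in Examples; the right-censored branch assumes the Surv 'interval2' convention that L == R marks an exact event time. to_interval is a hypothetical helper, and how icrf itself uses epsilon to separate equal end points is not reproduced here.

```r
# Hypothetical helper: map (time, status) data to interval end points L and R.
to_interval <- function(time, status, type = c("right", "currentstatus")) {
  type <- match.arg(type)
  if (type == "right") {
    # event observed: exact time (L == R); censored: (time, Inf)
    L <- time
    R <- ifelse(status == 1, time, Inf)
  } else {
    # status 1: event occurred before the monitoring time, (0, time];
    # status 0: event not yet occurred at the monitoring time, (time, Inf)
    L <- ifelse(status == 1, 0, time)
    R <- ifelse(status == 1, time, Inf)
  }
  data.frame(L = L, R = R)
}

iv <- to_interval(time = c(5, 3, 8), status = c(1, 0, 1), type = "currentstatus")
tau <- max(iv$R[is.finite(iv$R)]) * 1.5   # default tau from Usage: 12
```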
Details
Four split.rule options are available: Wilcoxon, logrank,
PetoWilcoxon, and PetoLogrank. Their aliases are
GWRS, GLR, SWRS, and SLR, respectively.
The first two are the generalized Wilcoxon rank-sum test and the generalized
log-rank test proposed in Cho et al. (2020+),
and the latter two are the score-based Wilcoxon rank-sum test and the score-based
log-rank test proposed by Peto and Peto (1972), "Asymptotically efficient
rank invariant test procedures."
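To show how a split rule is used to choose a cut point, the sketch below scores every admissible cut on one predictor with a two-sample rank statistic and keeps the largest. It swaps in the ordinary Wilcoxon rank-sum z-statistic for exact (uncensored) times, without tie correction; it is not the generalized or score-based versions for censored intervals described above. best_split is a hypothetical helper, and using nodesize as the minimum child size is a simplification.

```r
# Hypothetical helper: best cut point on one predictor `x` for exact times `time`,
# scored by the absolute Wilcoxon rank-sum z-statistic (no tie correction).
best_split <- function(x, time, nodesize = 6L) {
  r <- rank(time)                  # ranks within the parent node
  n <- length(time)
  cuts <- sort(unique(x))
  cuts <- cuts[-length(cuts)]      # the largest value would leave an empty right child
  best <- list(cut = NA_real_, stat = -Inf)
  for (cut.pt in cuts) {
    left <- x <= cut.pt
    nl <- sum(left)
    nr <- n - nl
    if (nl < nodesize || nr < nodesize) next  # enforce a minimum child size
    w <- sum(r[left])                          # rank sum in the left child
    mu <- nl * (n + 1) / 2                     # its mean under no difference
    sigma <- sqrt(nl * nr * (n + 1) / 12)      # and its standard deviation
    z <- abs(w - mu) / sigma
    if (z > best$stat) best <- list(cut = cut.pt, stat = z)
  }
  best
}

# Usage: times jump after x = 10, so that cut should score highest
best_split(x = 1:20, time = c(1:10, 101:110))
```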
Value
An icrf class object which contains the following components in a list:
call
the original call to icrf.
method
the input values of split.rule, ERT, quasihonest, bandwidth, and the subsample ratio (= sampsize/n).
predicted
the estimated survival curves of the training set using out-of-bag samples.
predictedNO
the estimated survival curves of the training set using non-out-of-bag samples.
predictedNO.Sm
the smoothed survival curves of the training set using non-out-of-bag samples.
time.points
time points at which the survival curves are estimated.
time.points.smooth
time points at which the smoothed survival curves are estimated.
imse.oob
integrated mean squared error (IMSE) measured based on the out-of-bag samples.
imse.NO
integrated mean squared error (IMSE) measured based on the non-out-of-bag samples.
oob.times
number of times each case was 'out-of-bag'.
importance
an array of three matrices, each with p (number of predictors) rows and nfold columns. The importance is measured based on the increase in IMSE types 1 and 2, respectively, and on the node impurity.
importanceSD
'The "standard errors" of the permutation-based importance measure.' A p by nfold by 2 array corresponding to the first two matrices of the importance array.
nfold
number of forests iterated over.
ntree
number of trees built.
mtry
number of candidate predictors tried at each node.
forest
'a list that contains the entire forest;' NULL 'if keep.forest=FALSE.'
intervals
n by 2 matrix of the intervals.
proximity
if proximity=TRUE when icrf is called, a matrix of proximity measures among the input (based on the frequency that pairs of data points are in the same terminal nodes).
inbag
if keep.inbag=TRUE, a matrix of in-bag indicators for the last forest iteration.
runtime
start and end times and the elapsed time.
test
if a test set is given (through the xtest or additionally ytest arguments), this component is a list which contains the corresponding predicted and error measures (IMSEs). If proximity=TRUE, there is also a component, proximity, which contains the proximity among the test set as well as the proximity between test and training data.
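The IMSE components compare estimated and true survival curves over time. The sketch below uses one common definition: the squared difference between the curves is integrated over the time points with the trapezoidal rule and then averaged over subjects. The helper imse is hypothetical, and icrf's IMSE types 1 and 2 may be defined or normalized differently.

```r
# Hypothetical helper: rows are subjects, columns are time points.
imse <- function(S.hat, S.true, time) {
  stopifnot(identical(dim(S.hat), dim(S.true)), ncol(S.hat) == length(time))
  err2 <- (S.hat - S.true)^2
  k <- ncol(err2)
  # Trapezoidal rule over the time grid, one integral per subject
  per.subject <- (err2[, -k, drop = FALSE] + err2[, -1, drop = FALSE]) %*% diff(time) / 2
  mean(per.subject)
}

# Usage with toy curves: a constant error of 0.5 over [0, 2]
imse(matrix(0.5, 2, 3), matrix(0, 2, 3), time = c(0, 1, 2))
```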
Author(s)
Hunyong Cho, Nicholas P. Jewell, and Michael R. Kosorok.
References
Cho H., Jewell N. P., and Kosorok M. R. (2020+). "Interval censored recursive forests."
See Also
predict.icrf, plot.icrf, survplot, importance.icrf
Examples
# rats data example.
# The type of this dataset is current status data.
# Note that this is a toy example. Use a larger ntree and nfold in practice.
data(rat2)
set.seed(2)
# 1. formula (currentstatus)
rats.icrf <-
icrf(~ dose.lvl + weight + male + cage.no, data = rat2,
data.type = "currentstatus", currentstatus.label = c("survtime", "tumor"),
returnBest = TRUE, ntree=10, nfold=3)
# 2. formula containing the interval
# Alternatively, create the interval endpoints and use the Surv object.
L <- ifelse(rat2$tumor, 0, rat2$survtime)
R <- ifelse(rat2$tumor, rat2$survtime, Inf)
library(survival) # for Surv function
icrf(Surv(L, R, type = "interval2") ~ dose.lvl + weight + male + cage.no, data = rat2,
ntree=10, nfold=3)
# Or, 3. formula (interval)
rat2b <- cbind(rat2, L = L, R = R)
set.seed(1)
icrf( ~ dose.lvl + weight + male + cage.no, data = rat2b,
data.type = "interval", interval.label = c("L", "R"),
ntree=10, nfold=3)
# 4. default method
set.seed(1)
icrf(rat2[, c("dose.lvl", "weight", "male", "cage.no")], L = L, R = R,
ntree=10, nfold=3)