tune.ltrcrrf {LTRCforests}    R Documentation

Tune mtry to the optimal value with respect to out-of-bag error for an LTRCRRF model

Description

Starting from the default value of mtry, searches for the value of mtry that is optimal with respect to the out-of-bag error estimate for ltrcrrf.

Usage

tune.ltrcrrf(
  formula,
  data,
  id,
  mtryStart = NULL,
  stepFactor = 2,
  time.eval = NULL,
  time.tau = NULL,
  ntreeTry = 100L,
  bootstrap = c("by.sub", "by.root", "by.node", "by.user", "none"),
  samptype = c("swor", "swr"),
  sampfrac = 0.632,
  samp = NULL,
  na.action = "na.omit",
  trace = TRUE,
  doBest = FALSE,
  plot = FALSE,
  ntime,
  nsplit = 10L,
  nodesizeTry = max(ceiling(sqrt(nrow(data))), 15),
  nodedepth = NULL
)

Arguments

formula

a formula object, with the response being a Surv object of the form Surv(tleft, tright, event).

data

a data frame containing n rows of left-truncated right-censored observations.

id

variable name of subject identifiers. If this is present, it will be searched for in the data data frame. Each group of rows in data with the same subject id represents the covariate path through time of a single subject. If not specified, the algorithm then assumes data contains left-truncated and right-censored survival data with time-invariant covariates.

mtryStart

starting value of mtry; default is sqrt(nvar).

stepFactor

at each iteration, mtry is inflated (or deflated) by this value, used when mtry is not specified (see ltrcrrf). The default value is 2.

time.eval

a vector of time points, at which the estimated survival probabilities are evaluated.

time.tau

an optional vector, with the i-th entry giving the upper time limit for the computed survival probabilities for the i-th observation (i.e., survival probabilities are computed only at time.eval[time.eval <= time.tau[i]] for the i-th observation of interest).

ntreeTry

number of trees used at the tuning step.

bootstrap

bootstrap protocol. (1) If id is present, the choices are: "by.sub" (the default), which bootstraps subjects, and "by.root", which bootstraps pseudo-subjects. Both can be with or without replacement (by default sampling is without replacement; see the option samptype below). (2) If id is not specified, the default is "by.root", which bootstraps the data by sampling with or without replacement; if "by.node" is chosen, data is bootstrapped with replacement at each node while growing the tree. Regardless of the presence of id, if "none" is chosen, the data is not bootstrapped at all. If "by.user" is chosen, the bootstrap specified by samp is used.

samptype

choices are swor (sampling without replacement) and swr (sampling with replacement). The default action here is sampling without replacement.

sampfrac

a fraction, determining the proportion of subjects to draw without replacement when samptype = "swor". The default value is 0.632. To be more specific, if id is present, 0.632 * N of subjects with their pseudo-subject observations are drawn without replacement (N denotes the number of subjects); otherwise, 0.632 * n is the requested size of the sample.

samp

bootstrap specification when bootstrap = "by.user". Array of dim n x ntree specifying how many times each record appears in each bootstrap sample.

na.action

action taken if the data contains NA’s. The default "na.omit" removes the entire record if any of its entries is NA (for x-variables this applies only to those specifically listed in formula). See function rfsrc for other available options.

trace

whether to print the progress of the search. trace = TRUE is set by default.

doBest

whether to fit and return an ltrcrrf object using the optimal mtry found. doBest = FALSE is set by default.

plot

whether to plot the out-of-bag error as a function of mtry. plot = FALSE is set by default.

ntime

an integer value used for survival to constrain ensemble calculations to a grid of ntime time points. Alternatively if a vector of values of length greater than one is supplied, it is assumed these are the time points to be used to constrain the calculations (note that the constrained time points used will be the observed event times closest to the user supplied time points). If no value is specified, the default action is to use all observed event times.

nsplit

a non-negative integer value for the number of random splits to consider for each candidate splitting variable. Random splitting significantly increases speed. When zero or NULL, the algorithm uses much slower deterministic splitting where all possible splits are considered. nsplit = 10L by default.

nodesizeTry

forest average terminal node size used at the tuning step.

nodedepth

maximum depth to which a tree should be grown. The default behaviour is that this parameter is ignored.
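
To illustrate how mtryStart and stepFactor shape the search, the following base R sketch lists the mtry values that could be visited from a given starting point. It assumes the search mirrors randomForest::tuneRF, dividing or multiplying mtry by stepFactor and clamping the result to the range 1 to nvar. The helper mtry_candidates is hypothetical and not part of LTRCforests, and rounding the default start sqrt(nvar) upward is likewise an assumption.

```r
# Hypothetical helper (not part of LTRCforests): enumerate the mtry values a
# stepFactor-based search could visit, assuming tuneRF-style updates.
mtry_candidates <- function(nvar, mtryStart = ceiling(sqrt(nvar)), stepFactor = 2) {
  stopifnot(nvar >= 1, stepFactor > 1)

  # Deflate: repeatedly divide by stepFactor, never going below 1.
  down <- mtryStart
  cur <- mtryStart
  repeat {
    nxt <- max(1, ceiling(cur / stepFactor))
    if (nxt == cur) break
    down <- c(down, nxt)
    cur <- nxt
  }

  # Inflate: repeatedly multiply by stepFactor, never exceeding nvar.
  up <- integer(0)
  cur <- mtryStart
  repeat {
    nxt <- min(nvar, floor(cur * stepFactor))
    if (nxt == cur) break
    up <- c(up, nxt)
    cur <- nxt
  }

  sort(unique(c(down, up)))
}

# Five covariates with stepFactor = 3.
mtry_candidates(nvar = 5, stepFactor = 3)
```

Under these assumptions, five covariates with stepFactor = 3 give the candidates 1, 3 and 5. An actual search may stop earlier, for instance once the out-of-bag error no longer improves.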

Value

If doBest = FALSE (default), this returns the optimal mtry value of those searched.

If doBest = TRUE, this returns the ltrcrrf object produced with the optimal mtry.

See Also

sbrier_ltrc for evaluation of model fit for the optimal value of mtry.

Examples

### Example with data pbcsample
library(survival)
library(LTRCforests)  # provides tune.ltrcrrf and the pbcsample data
Formula = Surv(Start, Stop, Event) ~ age + alk.phos + ast + chol + edema
## mtry tuned by the OOB procedure with stepFactor = 3 and 10 trees per tuning step.
mtryT = tune.ltrcrrf(formula = Formula, data = pbcsample, stepFactor = 3,
                     ntreeTry = 10L)
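
A possible follow-up is to pass the tuned value to ltrcrrf when fitting the final forest. The sketch below assumes that ltrcrrf accepts mtry and ntree arguments; the value ntree = 50L is an arbitrary choice for illustration, so check ?ltrcrrf before relying on it.

```r
library(survival)
library(LTRCforests)

Formula <- Surv(Start, Stop, Event) ~ age + alk.phos + ast + chol + edema

# Tune quietly, using a small number of trees per candidate mtry.
mtryT <- tune.ltrcrrf(formula = Formula, data = pbcsample, stepFactor = 3,
                      ntreeTry = 10L, trace = FALSE)

# Assumption: ltrcrrf() takes 'mtry' and 'ntree'; ntree = 50L is illustrative.
fit <- ltrcrrf(formula = Formula, data = pbcsample, mtry = mtryT, ntree = 50L)
```

Setting doBest = TRUE in tune.ltrcrrf is the documented alternative when the forest fitted with the optimal mtry should be returned directly.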

[Package LTRCforests version 0.7.0 Index]