minNodePruning {discSurv} R Documentation

Minimal Node Size Pruning

Description

Computes the optimal minimal node size of a discrete survival tree by cross-validation over a given vector of candidate node sizes. Laplace smoothing can be applied to the estimated hazards.

Usage

minNodePruning(
  formula,
  data,
  treetype = "rpart",
  splitruleranger = "hellinger",
  sizes,
  indexList,
  timeColumn,
  eventColumn,
  lambda = 1,
  logOut = FALSE
)

Arguments

formula

Model formula for tree fitting ("class formula").

data

Discrete survival data in short format for which a survival tree is to be fitted ("class data.frame").
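
The formula in the Examples section refers to y and timeInt, which are not columns of the short-format data. They come from the long (person-period) format, which the package's dataLong function produces. The following base R sketch shows that expansion on hypothetical toy data; it is an illustration of the idea, not the package's own code.

```r
# Toy short-format data: one row per subject (hypothetical values)
short <- data.frame(
  time   = c(2, 3, 1),
  status = c(1, 0, 1),
  age    = c(60, 72, 55)
)

# Repeat each row once per interval in which the subject was at risk
rowIdx <- rep(seq_len(nrow(short)), times = short$time)
long <- short[rowIdx, ]
long$obj <- rowIdx
# Interval counter within each subject: 1, 2, ..., time
long$timeInt <- sequence(short$time)
# Binary response: 1 only in the last interval of a subject with an event
long$y <- as.integer(long$timeInt == long$time & long$status == 1)
rownames(long) <- NULL
long
```

Each subject contributes one row per interval, and the tree is then fitted as a binary classifier for y.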

treetype

Type of tree to be fitted ("character vector"). Possible values are "rpart" or "ranger". The default is to fit an rpart tree; when "ranger" is chosen, a ranger forest with a single tree is fitted.

splitruleranger

String specifying the splitting rule of the ranger tree ("character vector"). Possible values are "gini", "extratrees" or "hellinger". The default is "hellinger".

sizes

Vector of different node sizes to try ("integer vector"). Values should be non-negative.

indexList

List of data partitioning indices for cross-validation ("class list"). Each element represents the test indices of one fold ("integer vector").
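
The Examples section builds this list with caret::createFolds, which returns test indices by default. If caret is not available, a list of the same shape can be built in base R. This is a minimal sketch; n and k are assumed values chosen for illustration, and the folds are not stratified.

```r
set.seed(1)  # only so that the illustration is reproducible
n <- 50      # number of rows in the short-format data (assumed)
k <- 5       # number of folds (assumed)

# Randomly assign each row to one of k folds of equal size
foldId <- sample(rep(seq_len(k), length.out = n))

# One integer vector of test indices per fold
indexList <- split(seq_len(n), foldId)
str(indexList)
```

Every row index appears in exactly one element, so each observation is held out once.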

timeColumn

Character giving the column name of the observed times in the data argument ("character vector").

eventColumn

Character giving the column name of the event indicator in the data argument ("character vector").

lambda

Parameter for Laplace smoothing ("numeric vector"). A value of 0 corresponds to no Laplace smoothing.
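
Smoothing matters for the log likelihood. Without it, a leaf with no training events has an estimated hazard of 0, and a held-out event falling into that leaf contributes log(0) = -Inf. The sketch below uses the textbook additive form for a binary outcome. The function name smoothHazard is hypothetical, and the exact formula used inside the package is an assumption here and may differ.

```r
# Additive (Laplace) smoothing of a leaf's hazard estimate.
# ASSUMPTION: textbook form for two outcome classes; not taken from the package.
smoothHazard <- function(events, n, lambda = 1) {
  (events + lambda) / (n + 2 * lambda)
}

# A leaf with 10 training rows and no events
rawHaz    <- smoothHazard(0, 10, lambda = 0)  # plain relative frequency
smoothHaz <- smoothHazard(0, 10, lambda = 1)  # pulled away from 0

log(rawHaz)     # -Inf
log(smoothHaz)  # finite
```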

logOut

Logical value ("logical vector"). If the argument is set to TRUE, then computation progress will be written to console.

Details

Computes the out-of-sample log likelihood for all data partitionings for each node size in sizes and returns the node size for which the log likelihood was minimal. Also returns an rpart tree with the optimal minimal node size using the entire data set.
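
The procedure can be sketched with rpart directly. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, the node size is mapped to rpart's minbucket control, the criterion is written as a negative log likelihood so that smaller is better, and predicted hazards are clamped away from 0 and 1 as a simple stand-in for Laplace smoothing.

```r
library(rpart)

set.seed(1)
# Simulated short-format data (hypothetical): 60 subjects, up to 4 intervals
n <- 60
x <- rnorm(n)
time <- sample(1:4, n, replace = TRUE)
status <- rbinom(n, 1, 0.6)

# Person-period expansion: one row per subject and interval at risk
rowIdx <- rep(seq_len(n), times = time)
long <- data.frame(obj = rowIdx, timeInt = sequence(time), x = x[rowIdx])
long$y <- as.integer(long$timeInt == time[rowIdx] & status[rowIdx] == 1)

# Folds over subjects, so all rows of one subject stay in the same fold
indexList <- split(seq_len(n), sample(rep(1:5, length.out = n)))
sizes <- c(5, 10, 20)

# Out-of-sample negative log likelihood, summed over folds, per node size
cvLoss <- sapply(sizes, function(s) {
  sum(sapply(indexList, function(testObj) {
    isTest <- long$obj %in% testObj
    fit <- rpart(y ~ timeInt + x, data = long[!isTest, ], method = "class",
                 control = rpart.control(minbucket = s, cp = 0))
    # Predicted hazard = leaf probability of y == 1 on the held-out rows
    haz <- predict(fit, newdata = long[isTest, ], type = "prob")[, "1"]
    # Clamping replaces Laplace smoothing in this sketch (assumption)
    haz <- pmin(pmax(haz, 1e-6), 1 - 1e-6)
    yTest <- long$y[isTest]
    -sum(yTest * log(haz) + (1 - yTest) * log(1 - haz))
  }))
})

# Candidate node size with the best cross-validated criterion
bestSize <- sizes[which.min(cvLoss)]
```

A final tree would then be refitted on all rows of long with minbucket set to bestSize.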

Value

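A list containing two items: the optimal minimal node size, and a tree with that minimal node size fitted on the entire data set (see Details).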

Examples

library(discSurv)
library(pec)    # provides the cost data set
library(caret)  # provides createFolds
data(cost)
# Convert time from days to years, then take a subsample
cost$time <- ceiling(cost$time / 365)
costSub <- cost[1:50, ]
# Specify column names for data augmentation
timeColumn <- "time"
eventColumn <- "status"
# Create data partition for cross-validation, stratified by status and time
indexList <- createFolds(costSub$status * max(costSub$time) + costSub$time, k = 5)
# Specify function arguments and perform node size pruning
formula <- y ~ timeInt + prevStroke + age + sex
sizes <- 1:10
optiTree <- minNodePruning(formula, costSub, treetype = "rpart", sizes = sizes,
                           indexList = indexList, timeColumn = timeColumn,
                           eventColumn = eventColumn, lambda = 1, logOut = TRUE)

[Package discSurv version 2.0.0 Index]