
create.classifier.multivariate {SIMMS}	R Documentation

Trains and tests a multivariate survival model

Description

Trains a model on the training datasets and predicts the risk score for each training and validation dataset independently. The function also predicts risk scores for the combined training cohort and the combined validation cohort. Risk scores are estimated by multivariate models fitted with fit.survivalmodel. In addition, the function predicts risk scores for each of the top.n.features independently.
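The train-then-score workflow described above can be illustrated with a minimal sketch. It does not call SIMMS or fit.survivalmodel; as an assumption for illustration, it fits a Cox proportional hazards model with survival::coxph on the lung dataset shipped with the survival package, uses a hypothetical random split into "training" and "validation" cohorts, and treats age, sex and ph.ecog as stand-in features. The risk score is taken to be the model's linear predictor, and each feature is also fitted on its own to mimic the per-feature (univariate) scores.

```r
library(survival)

# Stand-in data: the lung cancer cohort shipped with the survival package.
# Rows with missing values in the chosen columns are dropped.
features <- c("age", "sex", "ph.ecog")
lung.data <- na.omit(survival::lung[, c("time", "status", features)])

# Hypothetical training/validation split (not how SIMMS defines cohorts).
set.seed(51214)
train.idx <- sample(nrow(lung.data), size = floor(0.7 * nrow(lung.data)))
training <- lung.data[train.idx, ]
validation <- lung.data[-train.idx, ]

# Multivariate model: all features together, fitted on the training cohort only.
multi.fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = training)

# Risk score = linear predictor, computed independently for each cohort.
risk.training <- predict(multi.fit, newdata = training, type = "lp")
risk.validation <- predict(multi.fit, newdata = validation, type = "lp")

# Univariate models: one Cox model per feature, each scored on the validation cohort.
univariate.risk <- sapply(features, function(f) {
  uni.fit <- coxph(as.formula(paste("Surv(time, status) ~", f)), data = training)
  predict(uni.fit, newdata = validation, type = "lp")
})

head(risk.validation)
dim(univariate.risk)
```

Fitting only on the training cohort and then scoring each cohort separately mirrors the "independently" in the description: the validation cohort never influences the fitted coefficients.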

Usage

create.classifier.multivariate(
  data.directory = ".",
  output.directory = ".",
  feature.selection.datasets = NULL,
  feature.selection.p.threshold = 0.05,
  training.datasets = NULL,
  validation.datasets = NULL,
  top.n.features = 25,
  models = c("1", "2", "3"),
  learning.algorithms = c("backward", "forward"),
  alpha.glm = c(1),
  k.fold.glm = 10,
  seed.value = 51214,
  cores.glm = 1,
  rf.ntree = 1000,
  rf.mtry = NULL,
  rf.nodesize = 15,
  rf.samptype = "swor",
  rf.sampsize = function(x) { x * 0.66 },
  ...
)

Arguments

`data.directory`	Path to the directory containing datasets as specified by `feature.selection.datasets`, `training.datasets`, `validation.datasets`
`output.directory`	Path to the output directory where intermediate and result files will be saved
`feature.selection.datasets`	A vector containing names of datasets used for feature selection in function `derive.network.features()`
`feature.selection.p.threshold`	One of the P values that were used for feature selection in function `derive.network.features()`. Unlike `derive.network.features()`, this function does not accept a vector of P values, for performance reasons
`training.datasets`	A vector containing names of training datasets
`validation.datasets`	A vector containing names of validation datasets
`top.n.features`	A numeric value specifying how many top ranked features will be used for univariate survival modelling
`models`	A character vector specifying which of the models ('1' = N+E, '2' = N, '3' = E) to run
`learning.algorithms`	A character vector specifying which learning algorithms to use for model fitting and feature selection. Defaults to c('backward', 'forward'). Available options are: c('backward', 'forward', 'glm', 'randomforest')
`alpha.glm`	A numeric vector specifying the elastic-net mixing parameter alpha, in the range [0, 1]: 1 for LASSO (default) and 0 for ridge. If multiple values of alpha are given, the optimal value is selected by cross-validation on the training set
`k.fold.glm`	A numeric value specifying the number of folds (k) for k-fold cross-validation, if glm was chosen in `learning.algorithms`
`seed.value`	A numeric value specifying the random seed for glm k-fold cross-validation or for the random forest, if either was chosen in `learning.algorithms`
`cores.glm`	An integer value specifying the number of cores to be used for glm, if it was chosen in `learning.algorithms`
`rf.ntree`	An integer value specifying the number of trees in the random forest. Defaults to 1000. This should be tuned: start with a large forest such as 1000 trees in the initial run, assess the results in output/OOB_error__TRAINING_* to see where the OOB error rate stabilises, and then rerun with that stabilised rf.ntree value
`rf.mtry`	An integer value specifying the number of variables randomly selected as candidates for splitting a node. Defaults to sqrt(features), which matches the default of the underlying random survival forest implementation, `randomForestSRC::rfsrc`
`rf.nodesize`	An integer value specifying the number of unique cases in a terminal node. Defaults to 15, which matches the default of the underlying random survival forest implementation, `randomForestSRC::rfsrc`
`rf.samptype`	A character string specifying the type of sampling. Defaults to sampling without replacement ('swor'). Available options are: c('swor', 'swr'), where 'swr' is sampling with replacement
`rf.sampsize`	A function specifying the sample size when `rf.samptype` is set to sampling without replacement ('swor'). Defaults to 66%: `function(x){x * .66}`
`...`	Other parameters to be passed on to the underlying random survival forest call, `randomForestSRC::rfsrc`
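The glm-related arguments can be made concrete with two base-R sketches. The first evaluates the elastic-net penalty in the parameterisation used by the glmnet package, lambda * ((1 - alpha)/2 * sum(beta^2) + alpha * sum(abs(beta))), showing why alpha.glm = 1 gives the LASSO and alpha.glm = 0 gives ridge; that SIMMS delegates to glmnet is an assumption not stated on this page. The second shows how a fixed seed, such as seed.value, makes a k-fold split reproducible. The helpers elastic.net.penalty and assign.folds, and the coefficient values, are hypothetical.

```r
# Hypothetical helper: glmnet-style elastic-net penalty for coefficients beta.
# alpha = 1 -> LASSO (L1 term only); alpha = 0 -> ridge (L2 term only).
elastic.net.penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -1, 0, 2)
elastic.net.penalty(beta, lambda = 0.1, alpha = 1)   # LASSO: 0.35
elastic.net.penalty(beta, lambda = 0.1, alpha = 0)   # ridge: 0.2625

# Hypothetical helper: assign n samples to k folds of near-equal size.
# Setting the seed (cf. seed.value) makes the assignment reproducible;
# note that set.seed() also changes the global RNG state.
assign.folds <- function(n, k, seed) {
  set.seed(seed)
  sample(rep(seq_len(k), length.out = n))
}

folds.a <- assign.folds(n = 53, k = 10, seed = 51214)
folds.b <- assign.folds(n = 53, k = 10, seed = 51214)
identical(folds.a, folds.b)   # TRUE
table(folds.a)
```

With several alpha values, the same fold assignment would typically be reused for each candidate so that their cross-validated errors are compared on identical splits.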
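The random forest defaults reduce to simple arithmetic, sketched below in base R. The sketch assumes the randomForestSRC convention of rounding the default mtry up, ceiling(sqrt(p)), and uses a hypothetical cohort of 120 training samples with 30 features. It also evaluates the default rf.sampsize function and contrasts 'swor' with 'swr' when drawing an in-bag sample. How SIMMS or rfsrc rounds a non-integer sample size is not stated here, so the sketch rounds down explicitly as an assumption, and the 'swr' draw of size n assumes a bootstrap-style sample.

```r
n.samples <- 120   # hypothetical number of training samples
n.features <- 30   # hypothetical number of candidate features

# Default mtry: square root of the number of features, rounded up
# (assumed to match randomForestSRC's default for survival forests).
default.mtry <- ceiling(sqrt(n.features))
default.mtry   # 6

# Default rf.sampsize from the Usage section: 66% of the samples.
rf.sampsize <- function(x) { x * 0.66 }
in.bag.size <- floor(rf.sampsize(n.samples))   # rounding down is an assumption
in.bag.size    # 79

set.seed(51214)
# 'swor': sampling without replacement, so every in-bag index is unique.
swor.idx <- sample(n.samples, size = in.bag.size, replace = FALSE)
# 'swr': sampling with replacement (bootstrap-style, size n), so indices can repeat.
swr.idx <- sample(n.samples, size = n.samples, replace = TRUE)

length(unique(swor.idx))   # equals in.bag.size
length(unique(swr.idx))    # usually well below n.samples
```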

Value

The output files are stored under output.directory/output/
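The rf.ntree advice (rerun with the number of trees at which the OOB error stabilises) can be sketched as a small base-R helper. The layout of the output/OOB_error__TRAINING_* files is not described on this page, so the sketch only locates those files under output.directory/output/ and leaves parsing them to the reader. The helper ntree.at.plateau, its tol and window settings, and the synthetic error curve are all hypothetical.

```r
# Locate OOB error files written under output.directory/output/
# (file name prefix taken from the rf.ntree description).
output.directory <- "."
oob.files <- list.files(file.path(output.directory, "output"),
                        pattern = "^OOB_error__TRAINING_", full.names = TRUE)

# Hypothetical helper: smallest tree count i such that every consecutive
# change in OOB error over the next `window` steps is below `tol`.
ntree.at.plateau <- function(oob.error, tol = 0.001, window = 50) {
  steps <- abs(diff(oob.error))
  if (length(steps) < window) return(NA_integer_)
  for (i in seq_len(length(steps) - window + 1)) {
    if (all(steps[i:(i + window - 1)] < tol)) return(i)
  }
  NA_integer_
}

# Synthetic OOB error curve for 1000 trees, decaying towards 0.30.
ntrees <- 1:1000
oob.error <- 0.30 + 0.20 * exp(-ntrees / 40)
ntree.at.plateau(oob.error)
```

Real OOB curves are noisy rather than smooth, so smoothing the curve first or widening the window is likely to give a more stable choice than a raw step threshold.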

Author(s)

Syed Haider & Vincent Stimper

Examples


# see package's main documentation

[Package SIMMS version 1.3.2 Index]