create.classifier.multivariate {SIMMS}    R Documentation

Trains and tests a multivariate survival model

Description

Trains a model on the training datasets, then predicts the risk score for each training and validation dataset independently. This function also predicts the risk score for the combined training cohort and the combined validation cohort. Risk scores are estimated by multivariate models fit by fit.survivalmodel. The function also predicts risk scores for each of the top.n.features independently.

Usage

create.classifier.multivariate(
  data.directory = ".",
  output.directory = ".",
  feature.selection.datasets = NULL,
  feature.selection.p.threshold = 0.05,
  training.datasets = NULL,
  validation.datasets = NULL,
  top.n.features = 25,
  models = c("1", "2", "3"),
  learning.algorithms = c("backward", "forward"),
  alpha.glm = c(1),
  k.fold.glm = 10,
  seed.value = 51214,
  cores.glm = 1,
  rf.ntree = 1000,
  rf.mtry = NULL,
  rf.nodesize = 15,
  rf.samptype = "swor",
  rf.sampsize = function(x) { x * 0.66 },
  ...
)

Arguments

data.directory

Path to the directory containing datasets as specified by feature.selection.datasets, training.datasets, validation.datasets

output.directory

Path to the output folder where intermediate and results files will be saved

feature.selection.datasets

A vector containing names of datasets used for feature selection in function derive.network.features()

feature.selection.p.threshold

One of the P values that were used for feature selection in function derive.network.features(). For performance reasons, this function does not support a vector of P values as used in derive.network.features()

training.datasets

A vector containing names of training datasets

validation.datasets

A vector containing names of validation datasets

top.n.features

A numeric value specifying how many top ranked features will be used for univariate survival modelling

models

A character vector specifying which of the models ('1' = N+E, '2' = N, '3' = E) to run

learning.algorithms

A character vector specifying which learning algorithms are to be used for model fitting and feature selection. Defaults to c('backward', 'forward'). Available options are: c('backward', 'forward', 'glm', 'randomforest')

alpha.glm

A numeric vector specifying the elastic-net mixing parameter alpha, which ranges over [0,1]: 1 for LASSO (default) and 0 for ridge. If multiple values of alpha are given, the optimal value is selected through cross validation on the training set

k.fold.glm

A numeric value specifying k-fold cross validation if glm was chosen in learning.algorithms

seed.value

A numeric value specifying the seed for glm k-fold cross validation or for random forest, if either was chosen in learning.algorithms

cores.glm

An integer value specifying the number of cores to be used for glm if it was chosen in learning.algorithms

rf.ntree

An integer value specifying the number of trees in the random forest. Defaults to 1000. To tune it, start with a large forest such as 1000 in the initial run, inspect the results in output/OOB_error__TRAINING_* to see where the OOB error rate stabilises, and then rerun with rf.ntree set to that stabilised value

rf.mtry

An integer value specifying the number of variables randomly selected for splitting a node. Defaults to sqrt(features), which is the same as in the underlying random survival forest function randomForestSRC::rfsrc

rf.nodesize

An integer value specifying the number of unique cases in a terminal node. Defaults to 15, which is the same as in the underlying random survival forest function randomForestSRC::rfsrc

rf.samptype

A character string specifying the type of sampling. Defaults to sampling without replacement ('swor'). Available options are: c('swor', 'swr')

rf.sampsize

A function specifying sampling size when rf.samptype is set to sampling without replacement ('swor'). Defaults to 66%: function(x){x * .66}

...

Other parameters to be passed on to the underlying random survival forest call, randomForestSRC::rfsrc
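
Two of the arguments above can be illustrated in base R. This is a minimal sketch, not the package's internal code: n.samples and beta are hypothetical values, and the penalty formula assumes the glmnet-style elastic-net convention, which matches the "1 for LASSO, 0 for ridge" description of alpha.glm.

```r
# rf.sampsize: the documented default maps a sample count to 66% of it
rf.sampsize <- function(x) { x * 0.66 }
n.samples <- 200                            # hypothetical cohort size
subsample.size <- rf.sampsize(n.samples)    # about 132 cases

# alpha.glm: elastic-net penalty, assuming the glmnet-style convention
# lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
elastic.net.penalty <- function(beta, alpha, lambda = 1) {
  stopifnot(alpha >= 0, alpha <= 1)
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -1, 0)                       # hypothetical coefficients
lasso.penalty <- elastic.net.penalty(beta, alpha = 1)  # sum of |beta|
ridge.penalty <- elastic.net.penalty(beta, alpha = 0)  # half the sum of beta^2
```

Intermediate values of alpha blend the two penalties, which is why passing several values in alpha.glm lets cross validation choose between them.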

Value

The output files are stored under output.directory/output/
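
A short sketch of how the results might be inspected after a run, using base R only. It assumes that output.directory holds the same value that was passed to the function, and that the OOB error files mentioned under rf.ntree sit in the same output/ folder; adjust the pattern if your layout differs.

```r
# Same value as passed to create.classifier.multivariate (assumed here)
output.directory <- "."
results.dir <- file.path(output.directory, "output")

# All result files written under output.directory/output/
all.files <- list.files(results.dir, full.names = TRUE)

# Random forest OOB error files, useful when tuning rf.ntree
oob.files <- list.files(
  results.dir,
  pattern = "^OOB_error__TRAINING_",
  full.names = TRUE
)
```

Both vectors are empty if the folder does not exist yet, so the snippet is safe to run before the first fit.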

Author(s)

Syed Haider & Vincent Stimper

Examples


# see package's main documentation
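
In the meantime, the following sketch shows how a call could be assembled from the arguments documented above. The dataset names and directory paths are hypothetical placeholders, and the sketch assumes the earlier feature selection step (derive.network.features()) has already been run on the same data. The call is guarded so that nothing is executed unless SIMMS is installed and the data directory exists.

```r
# Hypothetical paths and dataset names: replace with your own
args <- list(
  data.directory = "path/to/data",
  output.directory = "path/to/results",
  feature.selection.datasets = c("DatasetA"),
  feature.selection.p.threshold = 0.05,
  training.datasets = c("DatasetA"),
  validation.datasets = c("DatasetB"),
  top.n.features = 25,
  models = c("1", "2", "3"),
  learning.algorithms = c("backward", "forward", "glm"),
  alpha.glm = c(0, 0.5, 1),   # candidate values, chosen by cross validation
  k.fold.glm = 10,
  seed.value = 51214
)

# Run only when the package and the data directory are available
if (requireNamespace("SIMMS", quietly = TRUE) && dir.exists(args$data.directory)) {
  do.call(SIMMS::create.classifier.multivariate, args)
}
```

Building the argument list first keeps the settings in one place, which makes it easier to rerun with a different rf.ntree or alpha.glm after inspecting the first results.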


[Package SIMMS version 1.3.2 Index]