create.classifier.multivariate {SIMMS} | R Documentation |
Trains and tests a multivariate survival model
Description
Trains a model on the training datasets, then predicts risk scores for each
training and validation dataset independently. The function also predicts risk
scores for the combined training cohort and the combined validation cohort.
Risk scores are estimated with multivariate models fit by fit.survivalmodel.
In addition, the function predicts risk scores for each of the top.n.features
independently.
Usage
create.classifier.multivariate(
data.directory = ".",
output.directory = ".",
feature.selection.datasets = NULL,
feature.selection.p.threshold = 0.05,
training.datasets = NULL,
validation.datasets = NULL,
top.n.features = 25,
models = c("1", "2", "3"),
learning.algorithms = c("backward", "forward"),
alpha.glm = c(1),
k.fold.glm = 10,
seed.value = 51214,
cores.glm = 1,
rf.ntree = 1000,
rf.mtry = NULL,
rf.nodesize = 15,
rf.samptype = "swor",
rf.sampsize = function(x) { x * 0.66 },
...
)
Arguments
data.directory |
Path to the directory containing datasets as specified
by |
output.directory |
Path to the output folder where intermediate and results files will be saved |
feature.selection.datasets |
A vector containing names of datasets used
for feature selection in function |
feature.selection.p.threshold |
One of the P values that were used for
feature selection in function |
training.datasets |
A vector containing names of training datasets |
validation.datasets |
A vector containing names of validation datasets |
top.n.features |
A numeric value specifying how many top ranked features will be used for univariate survival modelling |
models |
A character vector specifying which of the models ('1' = N+E, '2' = N, '3' = E) to run |
learning.algorithms |
A character vector specifying which learning algorithms to use for model fitting and feature selection. Defaults to c('backward', 'forward'). Available options are: c('backward', 'forward', 'glm', 'randomforest') |
alpha.glm |
A numeric vector specifying the elastic-net mixing parameter alpha, in the range [0,1]: 1 for LASSO (default) and 0 for ridge. If multiple values of alpha are given, the optimal value is selected through cross validation on the training set |
k.fold.glm |
A numeric value specifying the number of folds (k) for k-fold cross validation if glm
was chosen in |
seed.value |
A numeric value specifying the seed for glm k-fold cross validation or
random forest, if glm was chosen in |
cores.glm |
An integer value specifying number of cores to be used for
glm if it was chosen in |
rf.ntree |
An integer value specifying the number of trees in the random forest. Defaults to 1000. To tune it, start with a large forest such as 1000 in the initial run, inspect the results in output/OOB_error__TRAINING_* to see where the OOB error rate stabilises, and then rerun with rf.ntree set to that stabilised value |
rf.mtry |
An integer value specifying the number of variables randomly selected
for splitting a node. Defaults to sqrt(features), which is the same as in the
underlying R package random survival forest |
rf.nodesize |
An integer value specifying number of unique cases in a terminal
node. Defaults to 15, which is the same as in the underlying R package random survival
forest |
rf.samptype |
A character string specifying the sampling type. Defaults to sampling without replacement 'swor'. Available options are: c('swor', 'swr') |
rf.sampsize |
A function specifying sampling size when |
... |
Other parameters to be passed on to the random forest call to the underlying
R package random survival forest |
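As a rough illustration of the random forest sampling defaults listed above, the following base R sketch evaluates the default rf.sampsize function and the sqrt(features) fallback for rf.mtry. It assumes that the argument x of rf.sampsize is the number of samples in the training cohort. The cohort size of 200 and the feature count of 25 are made-up numbers, and how the underlying package rounds these values is not covered here.

```r
# Default rf.sampsize from the Usage section: 66% of its input.
# Assumption: the input x is the number of samples in the training cohort.
rf.sampsize <- function(x) { x * 0.66 }

# Hypothetical cohort size (illustrative only, not from the package).
n.patients <- 200
samples.per.tree <- rf.sampsize(n.patients)  # about 132 of 200

# rf.mtry = NULL falls back to sqrt(features); 25 is an assumed feature count.
n.features <- 25
default.mtry <- sqrt(n.features)

# A custom sampling function with the same one-argument shape, here 80%.
rf.sampsize.80 <- function(x) { x * 0.8 }
samples.per.tree.80 <- rf.sampsize.80(n.patients)
```

Any function of one numeric argument could be passed as rf.sampsize in the same way; the 80% variant is only an example of the shape, not a recommended setting.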
Value
The output files are stored under output.directory/output/
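To inspect results after a run, something like the following base R sketch could be used. It relies only on the output folder location described here and on the OOB_error__TRAINING_* file name prefix mentioned under rf.ntree. The helper name list.oob.files is hypothetical and not part of SIMMS, and the dummy file (including its .txt extension) exists only to make the demonstration self-contained.

```r
# Hypothetical helper (not part of SIMMS): list OOB error files of a run.
list.oob.files <- function(output.directory = ".") {
    results.dir <- file.path(output.directory, "output")
    if (!dir.exists(results.dir)) {
        return(character(0))
    }
    list.files(
        path = results.dir,
        pattern = "^OOB_error__TRAINING_",
        full.names = TRUE
    )
}

# Self-contained demonstration in a temporary directory with a dummy file.
demo.dir <- tempfile("simms_demo_")
dir.create(file.path(demo.dir, "output"), recursive = TRUE)
file.create(file.path(demo.dir, "output", "OOB_error__TRAINING_dummy.txt"))
oob.files <- list.oob.files(demo.dir)
print(basename(oob.files))
```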
Author(s)
Syed Haider & Vincent Stimper
Examples
# see package's main documentation
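The sketch below shows one way a call might be assembled from the arguments in the Usage section. It is not taken from the package: the dataset names and paths are placeholders, check.classifier.args is a hypothetical helper that only tests the option values documented above, and the call itself is skipped unless SIMMS is installed and the placeholder data directory exists. The package's main documentation remains the reference for the full workflow.

```r
# Hypothetical pre-flight check (not part of SIMMS) for documented options.
check.classifier.args <- function(args) {
    stopifnot(
        all(args$models %in% c("1", "2", "3")),
        all(args$learning.algorithms %in%
            c("backward", "forward", "glm", "randomforest")),
        all(args$alpha.glm >= 0 & args$alpha.glm <= 1),
        args$rf.samptype %in% c("swor", "swr")
    )
    invisible(TRUE)
}

# Dataset names and paths below are placeholders, not real SIMMS data.
classifier.args <- list(
    data.directory = "path/to/data",
    output.directory = "path/to/results",
    feature.selection.datasets = c("DatasetA"),
    feature.selection.p.threshold = 0.05,
    training.datasets = c("DatasetA"),
    validation.datasets = c("DatasetB"),
    top.n.features = 25,
    models = c("1", "2", "3"),
    learning.algorithms = c("backward", "forward", "glm"),
    alpha.glm = c(0, 0.5, 1),
    k.fold.glm = 10,
    seed.value = 51214,
    rf.samptype = "swor"
)
args.ok <- check.classifier.args(classifier.args)

# Run only if SIMMS is installed and the placeholder directory really exists.
if (requireNamespace("SIMMS", quietly = TRUE) &&
    dir.exists(classifier.args$data.directory)) {
    do.call(SIMMS::create.classifier.multivariate, classifier.args)
}
```

Passing several values in alpha.glm, as done here, corresponds to the documented behaviour of selecting among them by cross validation on the training set.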