| create.classifier.multivariate {SIMMS} | R Documentation |
Trains and tests a multivariate survival model
Description
Trains a model on training datasets. Predicts the risk score for all the
training and validation datasets, independently. This function also predicts the
risk score for the combined training datasets cohort and the combined validation datasets cohort.
The risk score estimation is done by multivariate models fit by
fit.survivalmodel. The function also predicts risk scores for each of
the top.n.features independently.
Usage
create.classifier.multivariate(
data.directory = ".",
output.directory = ".",
feature.selection.datasets = NULL,
feature.selection.p.threshold = 0.05,
training.datasets = NULL,
validation.datasets = NULL,
top.n.features = 25,
models = c("1", "2", "3"),
learning.algorithms = c("backward", "forward"),
alpha.glm = c(1),
k.fold.glm = 10,
seed.value = 51214,
cores.glm = 1,
rf.ntree = 1000,
rf.mtry = NULL,
rf.nodesize = 15,
rf.samptype = "swor",
rf.sampsize = function(x) { x * 0.66 },
...
)
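The default for rf.sampsize in the usage above is a function rather than a fixed number. The minimal sketch below shows what that default evaluates to. It assumes that x is the number of cases in the training cohort, which this page does not state; n.cases and half.sampsize are hypothetical names used only for illustration.

```r
# Default sampling-size function, copied from the usage above
rf.sampsize <- function(x) { x * 0.66 }

# Assumption: x is the number of cases available for training
n.cases <- 200
default.size <- rf.sampsize(n.cases)
print(default.size)

# Hypothetical alternative: draw half of the cases for each tree
half.sampsize <- function(x) { x * 0.5 }
half.size <- half.sampsize(n.cases)
print(half.size)
```

Any function of a single numeric argument can be supplied in the same way, so the sampled fraction can be changed without hard-coding a cohort size.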
Arguments
data.directory |
Path to the directory containing datasets as specified
by |
output.directory |
Path to the output folder where intermediate and results files will be saved |
feature.selection.datasets |
A vector containing names of datasets used
for feature selection in function |
feature.selection.p.threshold |
One of the P values that were used for
feature selection in function |
training.datasets |
A vector containing names of training datasets |
validation.datasets |
A vector containing names of validation datasets |
top.n.features |
A numeric value specifying how many top ranked features will be used for univariate survival modelling |
models |
A character vector specifying which of the models ('1' = N+E, '2' = N, '3' = E) to run |
learning.algorithms |
A character vector specifying which learning algorithms are to be used for model fitting and feature selection. Defaults to c('backward', 'forward'). Available options are: c('backward', 'forward', 'glm', 'randomforest') |
alpha.glm |
A numeric vector specifying the elastic-net mixing parameter alpha, with alpha in the range [0,1]. 1 for LASSO (default) and 0 for ridge. For multiple values of alpha, the optimal value is selected through cross validation on the training set |
k.fold.glm |
A numeric value specifying k-fold cross validation if glm
was chosen in |
seed.value |
A numeric value specifying the seed for glm k-fold cross validation
or random forest if glm was chosen in |
cores.glm |
An integer value specifying number of cores to be used for
glm if it was chosen in |
rf.ntree |
An integer value specifying the number of trees in random forest. Defaults to 1000. This should be tuned by starting with a large forest such as 1000 in the initial run, assessing the results in output/OOB_error__TRAINING_* to see where the OOB error rate stabilises, and then rerunning with the stabilised rf.ntree parameter |
rf.mtry |
An integer value specifying the number of variables randomly selected
for splitting a node. Defaults to sqrt(features), which is the same as in the
underlying R package random survival forest |
rf.nodesize |
An integer value specifying number of unique cases in a terminal
node. Defaults to 15, which is the same as in the underlying R package random survival
forest |
rf.samptype |
A character string specifying the sampling type. Defaults to sampling without replacement 'swor'. Available options are: c('swor', 'swr') |
rf.sampsize |
A function specifying sampling size when |
... |
Other parameters to be passed on to the random forest call to the underlying
R package random survival forest |
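As an illustration of what alpha.glm controls, the sketch below computes an elastic-net penalty for a coefficient vector at several values of alpha. It assumes the common glmnet-style parameterisation, alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2), which this page does not confirm; elastic.net.penalty and the coefficient values are hypothetical and used only for illustration.

```r
# Elastic-net penalty under the assumed glmnet-style parameterisation:
#   alpha * sum(|beta|) + (1 - alpha) / 2 * sum(beta^2)
# alpha = 1 gives the LASSO (L1) penalty, alpha = 0 gives the ridge (L2) penalty.
elastic.net.penalty <- function(beta, alpha) {
  stopifnot(is.numeric(alpha), length(alpha) == 1, alpha >= 0, alpha <= 1)
  alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2)
}

# Illustrative coefficients only, not taken from any fitted model
beta <- c(0.5, -1, 0)

# Penalty at the ridge end, at an even mix, and at the LASSO end
alpha.grid <- c(0, 0.5, 1)
penalties <- sapply(alpha.grid, function(a) elastic.net.penalty(beta, a))
print(data.frame(alpha = alpha.grid, penalty = penalties))
```

When several values are passed in alpha.glm, one value is selected by cross validation on the training set, as described for that argument.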
Value
The output files are stored under output.directory/output/
Author(s)
Syed Haider & Vincent Stimper
Examples
# see package's main documentation
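A minimal sketch of how the arguments might be assembled for a call is given below. The dataset names "CohortA" and "CohortB" are hypothetical placeholders, and the sketch assumes the input files have already been prepared in data.directory as the package's main documentation describes. The call itself is left commented out because it needs SIMMS and those input files.

```r
# Option values listed under Arguments
valid.algorithms <- c("backward", "forward", "glm", "randomforest")
valid.models <- c("1", "2", "3")

# Hypothetical dataset names; replace with the names of your own datasets
classifier.args <- list(
  data.directory = ".",
  output.directory = tempdir(),
  feature.selection.datasets = c("CohortA"),
  feature.selection.p.threshold = 0.05,
  training.datasets = c("CohortA"),
  validation.datasets = c("CohortB"),
  top.n.features = 10,
  models = c("1"),
  learning.algorithms = c("glm"),
  alpha.glm = c(0, 0.5, 1),
  k.fold.glm = 5,
  seed.value = 51214
)

# Basic sanity checks before running a potentially long model fit
stopifnot(all(classifier.args$learning.algorithms %in% valid.algorithms))
stopifnot(all(classifier.args$models %in% valid.models))
stopifnot(all(classifier.args$alpha.glm >= 0 & classifier.args$alpha.glm <= 1))

# Not run here: requires SIMMS and prepared input files in data.directory
# library(SIMMS)
# do.call(create.classifier.multivariate, classifier.args)
```

Keeping the arguments in a list makes it easy to rerun the same configuration with a different learning algorithm or a tuned rf.ntree value.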