ph_train {pheble} | R Documentation |
Generate predictions for phenotype ensemble.
Description
The ph_train function automatically trains a set of binary or multi-class classification models and uses them to build a new dataset of predictions. Data preprocessing and hyperparameter tuning are handled internally to minimize user input and simplify training.
Usage
ph_train(
  train_df,
  vali_df,
  test_df,
  class_col,
  ctrl,
  train_seed = 123,
  n_cores = 2,
  task = "multi",
  methods = "all",
  metric = ifelse(task == "multi", "Kappa", "ROC"),
  tune_length = 10,
  quiet = FALSE
)
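The default for metric depends on task. The following is a minimal sketch of how that default resolves, using only base R; default_metric is a hypothetical helper written for illustration, and "binary" is an assumed label for the two-class task (the Usage block only shows "multi").

```r
# Hypothetical helper that mirrors the default expression shown in Usage.
# It is not part of pheble; it only illustrates the ifelse() logic.
default_metric <- function(task) {
  ifelse(task == "multi", "Kappa", "ROC")
}

# Multi-class tasks default to Kappa.
metric_multi <- default_metric("multi")

# Any other task label falls through to ROC.
# "binary" is an assumed label for the two-class case.
metric_binary <- default_metric("binary")

print(metric_multi)
print(metric_binary)
```

Passing metric explicitly overrides this default, so the expression only matters when the argument is omitted.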
Arguments
train_df |
A |
vali_df |
A |
test_df |
A |
class_col |
A |
ctrl |
A |
train_seed |
A |
n_cores |
An |
task |
A |
methods |
A |
metric |
A |
tune_length |
If |
quiet |
A |
Value
A list containing the following components:
train_models | The trained models for the ensemble. |
train_df | The training data frame. |
vali_df | The validation data frame. |
test_df | The test data frame. |
task | The type of classification task. |
ctrl | A list of resampling parameters used in trainControl. |
methods | The names of the classification methods to ensemble. |
search | The hyperparameter search strategy. |
n_cores | The number of cores for parallel processing. |
metric | The summary metric used to select the optimal model. |
tune_length | The maximum number of hyperparameter combinations ("random") or individual hyperparameter depth ("grid"). |
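The two readings of tune_length can be sketched in base R. This is an illustration only, not the package's internal code: the hyperparameter names size and decay and their ranges are assumptions chosen for the example, and real models may define their own grids or drop duplicate candidates.

```r
tune_length <- 3

# "grid": tune_length is the depth per hyperparameter, so two
# hyperparameters yield tune_length^2 candidate combinations.
# size and decay are hypothetical hyperparameters for illustration.
grid_candidates <- expand.grid(
  size  = seq_len(tune_length),
  decay = 10^seq(-3, -1, length.out = tune_length)
)
n_grid <- nrow(grid_candidates)

# "random": tune_length caps the total number of sampled combinations.
set.seed(123)
random_candidates <- data.frame(
  size  = sample(1:10, tune_length, replace = TRUE),
  decay = 10^runif(tune_length, min = -3, max = -1)
)
n_random <- nrow(random_candidates)

print(n_grid)
print(n_random)
```

Under these assumptions the grid strategy grows multiplicatively with the number of hyperparameters, while the random strategy stays bounded by tune_length.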
Examples
## Import data.
data(ph_crocs)
## Remove anomalies with autoencoder.
rm_outs <- ph_anomaly(df = ph_crocs, ids_col = "Biosample",
                      class_col = "Species", method = "ae")
## Preprocess anomaly-free data frame into train, validation, and test sets
## with PCs as predictors.
pc_dfs <- ph_prep(df = rm_outs$df, ids_col = "Biosample",
                  class_col = "Species", vali_pct = 0.15,
                  test_pct = 0.15, method = "pca")
## Create the control object for the train function.
ctrl <- ph_ctrl(ph_crocs$Species, resample_method = "boot")
## Train all models for ensemble.
## Note: Increasing n_cores will dramatically reduce train time.
train_models <- ph_train(train_df = pc_dfs$train_df,
                         vali_df = pc_dfs$vali_df,
                         test_df = pc_dfs$test_df,
                         class_col = "Species",
                         ctrl = ctrl,
                         task = "multi",
                         methods = "all",
                         tune_length = 5,
                         quiet = FALSE)
## You can also train just a few models, although more are preferable.
## Note: Increasing n_cores will dramatically reduce train time.
train_models <- ph_train(train_df = pc_dfs$train_df,
                         vali_df = pc_dfs$vali_df,
                         test_df = pc_dfs$test_df,
                         class_col = "Species",
                         ctrl = ctrl,
                         task = "multi",
                         methods = c("lda", "mda",
                                     "nnet", "pda", "sparseLDA"),
                         tune_length = 5,
                         quiet = FALSE)
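The notes in the examples recommend increasing n_cores to shorten training. One possible way to pick a value is sketched below with the base parallel package; leaving one core free is an assumed convention for keeping the machine responsive, not a recommendation taken from the package.

```r
library(parallel)

# Number of cores reported by the operating system.
available <- detectCores()

# detectCores() can return NA on some platforms; fall back to one core.
if (is.na(available)) {
  available <- 1L
}

# Assumed convention: leave one core free, but never go below one.
n_cores <- max(1L, available - 1L)

print(n_cores)
```

The resulting n_cores value could then be passed to the n_cores argument in place of the default of 2.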