ph_ensemble {pheble}	R Documentation
Classify phenotypes via ensemble learning.
Description
The ph_ensemble function uses classification predictions from a list of algorithms to train an ensemble model. This can be a list of manually trained algorithms from train or, more conveniently, the output from ph_train. The hyperparameter tuning and model evaluations are handled internally to simplify the ensembling process. This function assumes some preprocessing has been performed, hence the training, validation, and test set requirements.
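As a minimal sketch (not part of the package documentation), manually trained caret models can be collected into a named list and supplied as train_models. This assumes the pc_dfs and ctrl objects created in the Examples below, and that the list names match the caret method names.

library(caret)
## Two hypothetical manually trained models on the preprocessed training set.
manual_models <- list(
  lda  = train(Species ~ ., data = pc_dfs$train_df, method = "lda",
               trControl = ctrl, metric = "Kappa"),
  nnet = train(Species ~ ., data = pc_dfs$train_df, method = "nnet",
               trControl = ctrl, metric = "Kappa", trace = FALSE)
)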
Usage
ph_ensemble(
  train_models,
  train_df,
  vali_df,
  test_df,
  class_col,
  ctrl,
  train_seed = 123,
  n_cores = 2,
  task = "multi",
  metric = ifelse(task == "multi", "Kappa", "ROC"),
  top_models = 3,
  metalearner = ifelse(task == "multi", "glmnet", "rf"),
  tune_length = 10,
  quiet = FALSE
)
Arguments
train_models | A list of trained models, either trained manually with train or taken from the train_models component of the ph_train output. |
train_df | A data frame containing the training set. |
vali_df | A data frame containing the validation set. |
test_df | A data frame containing the test set. |
class_col | A character value naming the column that contains the classes. |
ctrl | A trainControl object, such as the one produced by ph_ctrl. |
train_seed | A number used to set the seed for reproducible training. Default is 123. |
n_cores | An integer for the number of cores to use during training. Default is 2. |
task | A character value indicating whether the classification task is multi-class ("multi") or binary. Default is "multi". |
metric | A character value for the summary metric used to select the optimal model. Defaults to "Kappa" for multi-class tasks and "ROC" otherwise. |
top_models | A number of top models to include in the ensemble. Default is 3. |
metalearner | A character value for the algorithm used to train the ensemble. Defaults to "glmnet" for multi-class tasks and "rf" otherwise. |
tune_length | If the control object uses random search, the maximum number of hyperparameter combinations to evaluate; if grid search, the depth of the grid for each hyperparameter. Default is 10. |
quiet | A logical value indicating whether progress messages should be suppressed. Default is FALSE. |
Value
A list containing the following components:
ensemble_test_preds | The ensemble predictions for the test set. |
vali_preds | The validation predictions for the top models. |
test_preds | The test predictions for the top models. |
all_test_preds | The test predictions for every successfully trained model. |
all_test_results | The confusion matrix results obtained from comparing the model test predictions (i.e., original models and ensemble) against the actual test classes. |
ensemble_model | The ensemble train object. |
var_imps | The ensemble variable importances obtained via weighted averaging. The original train importances are multiplied by the model's importance in the ensemble, then averaged across models and normalized. |
train_df | The training data frame. |
vali_df | The validation data frame. |
test_df | The test data frame. |
train_models | The train models for the ensemble. |
ctrl | A trainControl object. |
metric | The summary metric used to select the optimal model. |
task | The type of classification task. |
tune_length | The maximum number of hyperparameter combinations ("random") or individual hyperparameter depth ("grid"). |
top_models | The number of top methods selected for the ensemble. |
metalearner | The algorithm used to train the ensemble. |
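The weighted averaging described for var_imps can be sketched as follows. This is only an illustration of the description above with made-up numbers, not the package's internal code, and the 0-100 normalization scale is an assumption.

## Hypothetical per-model importances (rows = predictors, columns = models).
imp_mat <- cbind(lda  = c(PC1 = 80, PC2 = 55, PC3 = 20),
                 nnet = c(PC1 = 60, PC2 = 70, PC3 = 35))
## Hypothetical importance of each model within the ensemble.
ens_wts <- c(lda = 0.7, nnet = 0.3)
weighted <- sweep(imp_mat, 2, ens_wts, `*`)  # multiply by the model's importance
avg <- rowMeans(weighted)                    # average across models
100 * avg / max(avg)                         # normalize (scale assumed)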
Examples
## Import data.
data(ph_crocs)
## Remove anomalies with autoencoder.
rm_outs <- ph_anomaly(df = ph_crocs, ids_col = "Biosample",
                      class_col = "Species", method = "ae")
## Preprocess anomaly-free data frame into train, validation, and test sets
## with PCs as predictors.
pc_dfs <- ph_prep(df = rm_outs$df, ids_col = "Biosample",
                  class_col = "Species", vali_pct = 0.15,
                  test_pct = 0.15, method = "pca")
## Echo control object for train function.
ctrl <- ph_ctrl(ph_crocs$Species, resample_method = "boot")
## Train all models for ensemble.
## Note: Increasing n_cores will dramatically reduce train time.
train_models <- ph_train(train_df = pc_dfs$train_df,
                         vali_df = pc_dfs$vali_df,
                         test_df = pc_dfs$test_df,
                         class_col = "Species",
                         ctrl = ctrl,
                         task = "multi",
                         methods = "all",
                         tune_length = 5,
                         quiet = FALSE)
## You can also train just a few, although more is preferable.
## Note: Increasing n_cores will dramatically reduce train time.
train_models <- ph_train(train_df = pc_dfs$train_df,
                         vali_df = pc_dfs$vali_df,
                         test_df = pc_dfs$test_df,
                         class_col = "Species",
                         ctrl = ctrl,
                         task = "multi",
                         methods = c("lda", "mda",
                                     "nnet", "pda", "sparseLDA"),
                         tune_length = 5,
                         quiet = FALSE)
## Train the ensemble.
## Note: Increasing n_cores will dramatically reduce train time.
ensemble_model <- ph_ensemble(train_models = train_models$train_models,
                              train_df = pc_dfs$train_df,
                              vali_df = pc_dfs$vali_df,
                              test_df = pc_dfs$test_df,
                              class_col = "Species",
                              ctrl = ctrl,
                              task = "multi",
                              top_models = 3,
                              metalearner = "glmnet",
                              tune_length = 25,
                              quiet = FALSE)
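## Inspect the returned components (not part of the original example;
## component names follow the Value section above).
## Confusion matrix results for the original models and the ensemble:
ensemble_model$all_test_results
## Ensemble predictions for the test set:
ensemble_model$ensemble_test_preds
## Weighted-average variable importances:
ensemble_model$var_imps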