ph_prep {pheble} | R Documentation |
Preprocessing for phenotype classification via ensemble learning.
Description
The ph_prep function splits a data frame into training, validation, and test sets, all while ensuring that
every class is represented in each dataset. By default, it performs a Principal Component Analysis on the training
set data and projects the validation and test data into that space. If a non-linear dimensionality reduction
strategy is preferred instead, an autoencoder can be used to extract deep features. Note that the parameters
max_mem_size, activation, hidden, dropout_ratio, rate, search, and tune_length are NULL unless an autoencoder,
method = "ae", is used. In this case, lists or vectors can be supplied to these parameters (see parameter details)
to perform a grid search for the optimal hyperparameter combination. The autoencoder with the lowest
reconstruction error is selected as the best model.
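The class-preserving split described above can be illustrated with a short base R sketch. This is not pheble's implementation: the helper name stratified_split is hypothetical, the built-in iris data stands in for a phenotype data frame, and the sketch assumes (following the Value section) that test_pct is taken from all rows and vali_pct from the rows that remain after the test rows are set aside.

```r
# Illustrative stratified three-way split in base R (hypothetical helper,
# not part of pheble). Every class contributes at least one row to each set.
stratified_split <- function(classes, vali_pct = 0.15, test_pct = 0.15,
                             seed = 123) {
  set.seed(seed)
  # Group row indices by class, ignoring unused factor levels.
  by_class <- split(seq_along(classes), classes, drop = TRUE)
  parts <- lapply(by_class, function(idx) {
    n <- length(idx)
    idx <- idx[sample.int(n)]                          # shuffle within class
    n_test <- max(1, round(n * test_pct))              # share of all rows
    n_vali <- max(1, round((n - n_test) * vali_pct))   # share of remaining rows
    if (n - n_test - n_vali < 1) {
      stop("A class is too small to appear in all three sets.")
    }
    list(test  = idx[seq_len(n_test)],
         vali  = idx[n_test + seq_len(n_vali)],
         train = idx[(n_test + n_vali + 1):n])
  })
  # Collect the per-class pieces into one index vector per set.
  collect <- function(name) {
    sort(unlist(lapply(parts, `[[`, name), use.names = FALSE))
  }
  list(train = collect("train"), vali = collect("vali"), test = collect("test"))
}

splits <- stratified_split(iris$Species)

# Class counts per set: every cell should be non-zero.
class_counts <- sapply(splits, function(i) table(iris$Species[i]))
print(class_counts)
```

Sampling within each class, rather than across the whole data frame, is what guarantees that rare classes are not left out of the smaller validation and test sets.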
Usage
ph_prep(
  df,
  ids_col,
  class_col,
  vali_pct = 0.15,
  test_pct = 0.15,
  scale = FALSE,
  center = NULL,
  sd = NULL,
  split_seed = 123,
  method = "pca",
  pca_pct = 0.95,
  max_mem_size = "15g",
  port = 54321,
  train_seed = 123,
  hyper_params = list(),
  search = "random",
  tune_length = 100
)
Arguments
df |
A |
ids_col |
A |
class_col |
A |
vali_pct |
A |
test_pct |
A |
scale |
A |
center |
Either a |
sd |
Either a |
split_seed |
A |
method |
A |
pca_pct |
If |
max_mem_size |
If |
port |
A |
train_seed |
A |
hyper_params |
A |
search |
If |
tune_length |
If |
Value
A list containing the following components:
train_df | The training set data frame. |
vali_df | The validation set data frame. |
test_df | The test set data frame. |
train_split | The training set indices from the original data frame. |
vali_split | The validation set indices from the original data frame. |
test_split | The test set indices from the original data frame. |
vali_pct | The percentage of training data used as validation data. |
test_pct | The percentage of total data used as test data. |
method | The dimensionality reduction method. |
Examples
## Import data.
data(ph_crocs)
## Remove anomalies with autoencoder.
rm_outs <- ph_anomaly(df = ph_crocs, ids_col = "Biosample",
                      class_col = "Species", method = "ae")
## Preprocess anomaly-free data frame into train, validation, and test sets
## with PCs as predictors.
pc_dfs <- ph_prep(df = rm_outs$df, ids_col = "Biosample",
                  class_col = "Species", vali_pct = 0.15,
                  test_pct = 0.15, method = "pca")
## Alternatively, preprocess data frame into train, validation, and test
## sets with latent variables as predictors. Notice that port is defined,
## because running H2O sessions one after another can cause connection
## errors.
ae_dfs <- ph_prep(df = rm_outs$df, ids_col = "Biosample", class_col = "Species",
                  vali_pct = 0.15, test_pct = 0.15, method = "ae", port = 50001)
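
The PCA path from the Description (fit on the training set, then project held-out data into the same space) can also be approximated outside of pheble with base R's prcomp and predict. The sketch below is only an illustration under stated assumptions: it uses the built-in iris data with a simple random hold-out, it centers and scales the predictors (a choice made here, not necessarily what ph_prep does), and it assumes that pca_pct denotes the cumulative proportion of variance the retained components must explain.

```r
# Illustrative only: fit PCA on training rows and project held-out rows.
set.seed(123)
x <- iris[, 1:4]
test_idx <- sample(nrow(x), 30)      # simple random hold-out for the sketch
train_x <- x[-test_idx, ]
test_x  <- x[test_idx, ]

# Fit the PCA on the training rows only, so no held-out information leaks in.
pca <- prcomp(train_x, center = TRUE, scale. = TRUE)

# Assumption: pca_pct is the cumulative variance threshold for keeping PCs.
pca_pct <- 0.95
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pcs <- which(cum_var >= pca_pct)[1]

# Training scores come from the fit; held-out scores reuse the training
# centering, scaling, and rotation through predict().
train_pcs <- pca$x[, seq_len(n_pcs), drop = FALSE]
test_pcs  <- predict(pca, newdata = test_x)[, seq_len(n_pcs), drop = FALSE]

dim(train_pcs)
dim(test_pcs)
```

Because predict() applies the centering, scaling, and rotation learned from the training rows, the held-out rows end up in the same component space without influencing how that space was defined.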