trans_classifier {microeco}    R Documentation
Create trans_classifier object for machine-learning-based model prediction.
Description
This class is a wrapper for machine-learning-based classification and regression methods, covering data pre-processing, feature selection, data splitting, model training, prediction, confusion matrix computation, and ROC (Receiver Operator Characteristic) or PR (Precision-Recall) curve analysis.
Author(s): Felipe Mansoldo and Chi Liu
Methods
Public methods
Method new()
Create a trans_classifier object.
Usage
trans_classifier$new( dataset, x.predictors = "Genus", y.response = NULL, n.cores = 1 )
Arguments
dataset: an object of microtable class.
x.predictors: default "Genus"; character string or data.frame; a character string selects the corresponding table from microtable$taxa_abund; a data.frame denotes other customized input. Available options:
- 'Genus': use the Genus level table in microtable$taxa_abund, or another taxonomic rank, e.g., 'Phylum'. If the input level (e.g., ASV) is not found in the names of the taxa_abund list, the function uses otu_table to calculate the relative abundance of features.
- 'all': use all the levels stored in microtable$taxa_abund.
- other input: must be a data.frame object with the same format as the tables in microtable$taxa_abund, i.e. rows are features and columns are samples with the same names as in sample_table.
y.response: default NULL; the response variable in sample_table of the input microtable object.
n.cores: default 1; the number of CPU threads to use.
Returns
data_feature and data_response stored in the object.
Examples
\donttest{
data(dataset)
t1 <- trans_classifier$new(
dataset = dataset,
x.predictors = "Genus",
y.response = "Group")
}
Method cal_preProcess()
Pre-process the feature data (centering, scaling, etc.) based on the caret::preProcess function. See https://topepo.github.io/caret/pre-processing.html for more details.
Usage
trans_classifier$cal_preProcess(...)
Arguments
...: parameters passed to the preProcess function of the caret package.
Returns
preprocessed data_feature in the object.
Examples
\dontrun{
# "nzv" removes near zero variance predictors
t1$cal_preProcess(method = c("center", "scale", "nzv"))
}
Method cal_feature_sel()
Perform feature selection. See https://topepo.github.io/caret/feature-selection-overview.html for more details.
Usage
trans_classifier$cal_feature_sel( boruta.maxRuns = 300, boruta.pValue = 0.01, boruta.repetitions = 4, ... )
Arguments
boruta.maxRuns: default 300; maximal number of importance source runs; passed to the maxRuns parameter in the Boruta function of the Boruta package.
boruta.pValue: default 0.01; p value; passed to the pValue parameter in the Boruta function of the Boruta package.
boruta.repetitions: default 4; number of repetition runs for the feature selection.
...: other parameters passed to the Boruta function of the Boruta package.
Returns
optimized data_feature in the object.
Examples
\dontrun{
t1$cal_feature_sel(boruta.maxRuns = 300, boruta.pValue = 0.01)
}
Method cal_split()
Split data for training and testing.
Usage
trans_classifier$cal_split(prop.train = 3/4)
Arguments
prop.train: default 3/4; the proportion of the data used for training.
Returns
data_train and data_test in the object.
Examples
\dontrun{
t1$cal_split(prop.train = 3/4)
}
Method set_trainControl()
Set control parameters for the subsequent model training. Please see the trainControl function of the caret package for details.
Usage
trans_classifier$set_trainControl( method = "repeatedcv", classProbs = TRUE, savePredictions = TRUE, ... )
Arguments
method: default 'repeatedcv'; 'repeatedcv' means repeated k-fold cross-validation; see the method parameter in the trainControl function of the caret package for available options.
classProbs: default TRUE; whether class probabilities should be computed for classification models; see the classProbs parameter in the caret::trainControl function.
savePredictions: default TRUE; see the savePredictions parameter in the caret::trainControl function.
...: other parameters passed to the trainControl function of the caret package.
Returns
trainControl in the object.
Examples
\dontrun{
t1$set_trainControl(method = 'repeatedcv')
}
Method cal_train()
Run the model training. Please see https://topepo.github.io/caret/available-models.html for available models.
Usage
trans_classifier$cal_train(method = "rf", max.mtry = 2, ntree = 500, ...)
Arguments
method: default "rf"; "rf" means random forest; see the method parameter in the train function of the caret package for other options. For method = "rf", the tuneGrid is set to expand.grid(mtry = seq(from = 1, to = max.mtry)).
max.mtry: default 2; for method = "rf"; the maximum mtry used in the tuneGrid for hyperparameter tuning to optimize the model.
ntree: default 500; for method = "rf"; the number of trees to grow. The default 500 is the same as the ntree parameter in the randomForest function of the randomForest package. When a vector with more than one element is provided, such as c(100, 500, 1000), the function tries each value and selects the best model.
...: other parameters passed to the caret::train function.
Returns
res_train in the object.
Examples
\dontrun{
# random forest
t1$cal_train(method = "rf")
# Support Vector Machines with Radial Basis Function Kernel
t1$cal_train(method = "svmRadial", tuneLength = 15)
}
Method cal_feature_imp()
Get feature importance from the training model.
Usage
trans_classifier$cal_feature_imp(rf_feature_sig = FALSE, ...)
Arguments
rf_feature_sig: default FALSE; whether to calculate feature significance in the 'rf' model using the rfPermute package; only available for method = "rf" in the cal_train function.
...: parameters passed to the varImp function of the caret package. If rf_feature_sig is TRUE and train_method is "rf", the parameters are passed to the rfPermute function of the rfPermute package.
Returns
res_feature_imp in the object. One row for each predictor variable. The column(s) are different importance measures.
For the method 'rf', it is MeanDecreaseGini (classification) or IncNodePurity (regression) when rf_feature_sig = FALSE.
Examples
\dontrun{
t1$cal_feature_imp()
}
Method plot_feature_imp()
Bar plot for feature importance.
Usage
trans_classifier$plot_feature_imp( rf_sig_show = NULL, show_sig_group = FALSE, ... )
Arguments
rf_sig_show: default NULL; "MeanDecreaseAccuracy" (default) or "MeanDecreaseGini" for random forest classification; "%IncMSE" (default) or "IncNodePurity" for random forest regression; only available when rf_feature_sig = TRUE in the cal_feature_imp function, which generates "MeanDecreaseGini" (and "MeanDecreaseAccuracy") or "%IncMSE" (and "IncNodePurity") in the column names of res_feature_imp. The function can also generate "Significance" according to the p value.
show_sig_group: default FALSE; whether to show the features with different significance groups; only available when "Significance" is found in the data.
...: parameters passed to the plot_diff_bar function of the trans_diff class.
Returns
ggplot2 object.
Examples
\dontrun{
t1$plot_feature_imp(use_number = 1:20, coord_flip = FALSE)
}
Method cal_predict()
Run the prediction.
Usage
trans_classifier$cal_predict(positive_class = NULL)
Arguments
positive_class: default NULL; see the positive parameter in the confusionMatrix function of the caret package. If positive_class is NULL, the first group in the data is used as the positive class automatically.
Returns
res_predict, res_confusion_fit and res_confusion_stats stored in the object.
The res_predict is the predicted result for data_test.
Several evaluation metrics in res_confusion_fit are defined as follows:
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
Sensitivity = Recall = TPR = \frac{TP}{TP + FN}
Specificity = TNR = 1 - FPR = \frac{TN}{TN + FP}
Precision = \frac{TP}{TP + FP}
Prevalence = \frac{TP + FN}{TP + TN + FP + FN}
F1-Score = \frac{2 * Precision * Recall}{Precision + Recall}
Kappa = \frac{Accuracy - Pe}{1 - Pe}
where TP is true positive; TN is true negative; FP is false positive; FN is false negative; FPR is the false positive rate; TPR is the true positive rate; TNR is the true negative rate; and Pe is the hypothetical probability of chance agreement between the reference and prediction classes in the confusion matrix. Accuracy is the ratio of correct predictions. Precision measures how accurately the model predicted the positive class. Recall (sensitivity) measures the ratio of actual positives correctly identified by the model. The F1-score is the harmonic mean of precision and recall; a value of 1 is the best performance and 0 the worst. Prevalence represents how often positive events occur. Kappa measures how well the model predicts beyond chance agreement.
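As an illustration only (not part of the package), the metrics above can be computed from hypothetical confusion-matrix counts in base R:

```r
# Hypothetical binary confusion-matrix counts, chosen for illustration
TP <- 40; TN <- 45; FP <- 5; FN <- 10
N  <- TP + TN + FP + FN

accuracy    <- (TP + TN) / N            # 0.85
sensitivity <- TP / (TP + FN)           # recall / TPR: 0.8
specificity <- TN / (TN + FP)           # TNR: 0.9
precision   <- TP / (TP + FP)           # ~0.889
f1          <- 2 * precision * sensitivity / (precision + sensitivity)

# Pe: chance agreement computed from the row and column marginals
pe    <- ((TP + FP) * (TP + FN) + (FN + TN) * (FP + TN)) / N^2
kappa <- (accuracy - pe) / (1 - pe)     # 0.7 for these counts
```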
Examples
\dontrun{
t1$cal_predict()
}
Method plot_confusionMatrix()
Plot the cross-tabulation of observed and predicted classes with associated statistics based on the results of function cal_predict.
Usage
trans_classifier$plot_confusionMatrix( plot_confusion = TRUE, plot_statistics = TRUE )
Arguments
plot_confusiondefault TRUE; whether plot the confusion matrix.
plot_statisticsdefault TRUE; whether plot the statistics.
Returns
ggplot object.
Examples
\dontrun{
t1$plot_confusionMatrix()
}
Method cal_ROC()
Get ROC (Receiver Operator Characteristic) curve data and the performance data.
Usage
trans_classifier$cal_ROC(input = "pred")
Arguments
input: default "pred"; 'pred' or 'train'; 'pred' uses the prediction results; 'train' uses the training results.
Returns
a list res_ROC stored in the object. It has two tables: res_roc and res_pr. AUC: Area Under the ROC Curve.
For the definition of metrics, please refer to the return part of function cal_predict.
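As a conceptual sketch (independent of the package's implementation), the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which can be computed directly in base R:

```r
# Minimal AUC sketch via the rank identity; for illustration only
auc <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  # fraction of positive/negative pairs ranked correctly (ties count 0.5)
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

auc(c(0.9, 0.4, 0.6, 0.2), c(1, 1, 0, 0))  # 0.75
```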
Examples
\dontrun{
t1$cal_ROC()
}
Method plot_ROC()
Plot ROC curve.
Usage
trans_classifier$plot_ROC( plot_type = c("ROC", "PR")[1], plot_group = "all", color_values = RColorBrewer::brewer.pal(8, "Dark2"), add_AUC = TRUE, plot_method = FALSE, ... )
Arguments
plot_type: default c("ROC", "PR")[1]; 'ROC' means the ROC (Receiver Operator Characteristic) curve; 'PR' means the PR (Precision-Recall) curve.
plot_group: default "all"; 'all' means all the classes in the model; 'add' means all classes plus micro-average and macro-average results, see https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html; other options should be one or more class names, matching the names in the Group column of res_ROC$res_roc from the cal_ROC function.
color_values: default RColorBrewer::brewer.pal(8, "Dark2"); colors used in the plot.
add_AUC: default TRUE; whether to add the AUC to the legend.
plot_method: default FALSE; if TRUE, show the method in the legend even when only one method is found.
...: parameters passed to the geom_path function of the ggplot2 package.
Returns
ggplot2 object.
Examples
\dontrun{
t1$plot_ROC(size = 1, alpha = 0.7)
}
Method cal_caretList()
Use caretList function of caretEnsemble package to run multiple models. For the available models, please run names(getModelInfo()).
Usage
trans_classifier$cal_caretList(...)
Arguments
...: parameters passed to the caretList function of the caretEnsemble package.
Returns
res_caretList_models in the object.
Examples
\dontrun{
t1$cal_caretList(methodList = c('rf', 'svmRadial'))
}
Method cal_caretList_resamples()
Use resamples function of caret package to collect the metric values based on the res_caretList_models data.
Usage
trans_classifier$cal_caretList_resamples(...)
Arguments
...: parameters passed to the resamples function of the caret package.
Returns
res_caretList_resamples list and res_caretList_resamples_reshaped table in the object.
Examples
\dontrun{
t1$cal_caretList_resamples()
}
Method plot_caretList_resamples()
Visualize the metric values based on the res_caretList_resamples_reshaped data.
Usage
trans_classifier$plot_caretList_resamples( color_values = RColorBrewer::brewer.pal(8, "Dark2"), ... )
Arguments
color_values: default RColorBrewer::brewer.pal(8, "Dark2"); color palette for the boxes.
...: parameters passed to the geom_boxplot function of the ggplot2 package.
Returns
ggplot object.
Examples
\dontrun{
t1$plot_caretList_resamples()
}
Method clone()
The objects of this class are cloneable with this method.
Usage
trans_classifier$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
Examples
## ------------------------------------------------
## Method `trans_classifier$new`
## ------------------------------------------------
data(dataset)
t1 <- trans_classifier$new(
dataset = dataset,
x.predictors = "Genus",
y.response = "Group")
## ------------------------------------------------
## Method `trans_classifier$cal_preProcess`
## ------------------------------------------------
## Not run:
# "nzv" removes near zero variance predictors
t1$cal_preProcess(method = c("center", "scale", "nzv"))
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_feature_sel`
## ------------------------------------------------
## Not run:
t1$cal_feature_sel(boruta.maxRuns = 300, boruta.pValue = 0.01)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_split`
## ------------------------------------------------
## Not run:
t1$cal_split(prop.train = 3/4)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$set_trainControl`
## ------------------------------------------------
## Not run:
t1$set_trainControl(method = 'repeatedcv')
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_train`
## ------------------------------------------------
## Not run:
# random forest
t1$cal_train(method = "rf")
# Support Vector Machines with Radial Basis Function Kernel
t1$cal_train(method = "svmRadial", tuneLength = 15)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_feature_imp`
## ------------------------------------------------
## Not run:
t1$cal_feature_imp()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$plot_feature_imp`
## ------------------------------------------------
## Not run:
t1$plot_feature_imp(use_number = 1:20, coord_flip = FALSE)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_predict`
## ------------------------------------------------
## Not run:
t1$cal_predict()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$plot_confusionMatrix`
## ------------------------------------------------
## Not run:
t1$plot_confusionMatrix()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_ROC`
## ------------------------------------------------
## Not run:
t1$cal_ROC()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$plot_ROC`
## ------------------------------------------------
## Not run:
t1$plot_ROC(size = 1, alpha = 0.7)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_caretList`
## ------------------------------------------------
## Not run:
t1$cal_caretList(methodList = c('rf', 'svmRadial'))
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_caretList_resamples`
## ------------------------------------------------
## Not run:
t1$cal_caretList_resamples()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$plot_caretList_resamples`
## ------------------------------------------------
## Not run:
t1$plot_caretList_resamples()
## End(Not run)