trans_classifier {microeco} | R Documentation |
Create trans_classifier object for machine-learning-based model prediction.
Description
This class is a wrapper for methods of machine-learning-based classification or regression models, including data pre-processing, feature selection, data split, model training, prediction, confusionMatrix and ROC (Receiver Operator Characteristic) or PR (Precision-Recall) curve.
Author(s): Felipe Mansoldo and Chi Liu
Methods
Public methods
Method new()
Create the trans_classifier object.
Usage
trans_classifier$new( dataset, x.predictors = "Genus", y.response = NULL, n.cores = 1 )
Arguments
dataset
the object of
microtable
Class.x.predictors
default "Genus"; character string or data.frame; a character string represents selecting the corresponding data from microtable$taxa_abund; data.frame represents other customized input. See the following available options:
- 'Genus'
use Genus level table in microtable$taxa_abund, or other specific taxonomic rank, e.g. 'Phylum'
- 'all'
use all the taxa stored in microtable$taxa_abund
- other input
must be a data.frame; It should have the same format with the data.frame in microtable$taxa_abund, i.e. rows are features; cols are samples with same names in sample_table
y.response
default NULL; the response variable in
sample_table
of inputmicrotable
object.n.cores
default 1; the CPU thread used.
Returns
data_feature and data_response in the object.
Examples
\donttest{ data(dataset) t1 <- trans_classifier$new( dataset = dataset, x.predictors = "Genus", y.response = "Group") }
Method cal_preProcess()
Pre-process (centering, scaling etc.) of the feature data based on the caret::preProcess function. See https://topepo.github.io/caret/pre-processing.html for more details.
Usage
trans_classifier$cal_preProcess(...)
Arguments
...
parameters pass to
preProcess
function of caret package.
Returns
converted data_feature in the object.
Examples
\dontrun{ t1$cal_preProcess(method = c("center", "scale", "nzv")) }
Method cal_feature_sel()
Perform feature selection. See https://topepo.github.io/caret/feature-selection-overview.html for more details.
Usage
trans_classifier$cal_feature_sel( boruta.maxRuns = 300, boruta.pValue = 0.01, boruta.repetitions = 4, ... )
Arguments
boruta.maxRuns
default 300; maximal number of importance source runs; passed to the maxRuns parameter in Boruta function of Boruta package.
boruta.pValue
default 0.01; p value passed to the pValue parameter in Boruta function of Boruta package.
boruta.repetitions
default 4; repetition runs for the feature selection.
...
parameters pass to
Boruta
function of Boruta package.
Returns
optimized data_feature in the object.
Examples
\dontrun{ t1$cal_feature_sel(boruta.maxRuns = 300, boruta.pValue = 0.01) }
Method cal_split()
Split data for training and testing.
Usage
trans_classifier$cal_split(prop.train = 3/4)
Arguments
prop.train
default 3/4; the ratio of the data used for the training.
Returns
data_train and data_test in the object.
Examples
\dontrun{ t1$cal_split(prop.train = 3/4) }
Method set_trainControl()
Control parameters for the following training. See trainControl function of caret package for details.
Usage
trans_classifier$set_trainControl( method = "repeatedcv", classProbs = TRUE, savePredictions = TRUE, ... )
Arguments
method
default 'repeatedcv'; 'repeatedcv': Repeated k-Fold cross validation; see method parameter in
trainControl
function ofcaret
package for available options.classProbs
default TRUE; should class probabilities be computed for classification models?; see classProbs parameter in
caret::trainControl
function.savePredictions
default TRUE; see
savePredictions
parameter incaret::trainControl
function....
parameters pass to trainControl function of caret package.
Returns
trainControl in the object.
Examples
\dontrun{ t1$set_trainControl(method = 'repeatedcv') }
Method cal_train()
Run the model training.
Usage
trans_classifier$cal_train(method = "rf", max.mtry = 2, max.ntree = 200, ...)
Arguments
method
default "rf"; "rf": random forest; see method in caret::train function for other options.
max.mtry
default 2; for method = "rf"; maximum mtry used for the tunegrid to do hyperparameter tuning to optimize the model.
max.ntree
default 200; for method = "rf"; maximum number of trees used to optimize the model.
...
parameters pass to
caret::train
function.
Returns
res_train in the object.
Examples
\dontrun{ # random forest t1$cal_train(method = "rf") # Support Vector Machines with Radial Basis Function Kernel t1$cal_train(method = "svmRadial", tuneLength = 15) }
Method cal_feature_imp()
Get feature importance from the training model.
Usage
trans_classifier$cal_feature_imp(...)
Arguments
...
parameters pass to varImp function of caret package.
Returns
res_feature_imp in the object. One row for each predictor variable. The column(s) are different importance measures. For the method 'rf', it is MeanDecreaseGini (classification) or IncNodePurity (regression).
Examples
\dontrun{ t1$cal_feature_imp() }
Method plot_feature_imp()
Bar plot for feature importance.
Usage
trans_classifier$plot_feature_imp(...)
Arguments
...
parameters pass to
plot_diff_bar
function oftrans_diff
package.
Returns
ggplot2 object.
Examples
\dontrun{ t1$plot_feature_imp(use_number = 1:20, coord_flip = FALSE) }
Method cal_predict()
Run the prediction.
Usage
trans_classifier$cal_predict(positive_class = NULL)
Arguments
positive_class
default NULL; see positive parameter in confusionMatrix function of caret package; If positive_class is NULL, use the first group in data as the positive class automatically.
Returns
res_predict, res_confusion_fit and res_confusion_stats stored in the object.
Examples
\dontrun{ t1$cal_predict() }
Method plot_confusionMatrix()
Plot the cross-tabulation of observed and predicted classes with associated statistics.
Usage
trans_classifier$plot_confusionMatrix( plot_confusion = TRUE, plot_statistics = TRUE )
Arguments
plot_confusion
default TRUE; whether plot the confusion matrix.
plot_statistics
default TRUE; whether plot the statistics.
Returns
ggplot object.
Examples
\dontrun{ t1$plot_confusionMatrix() }
Method cal_ROC()
Get ROC (Receiver Operator Characteristic) curve data and the performance data.
Usage
trans_classifier$cal_ROC(input = "pred")
Arguments
input
default "pred"; 'pred' or 'train'; 'pred' represents using prediction results; 'train' represents using training results.
Returns
a list res_ROC stored in the object.
Examples
\dontrun{ t1$cal_ROC() }
Method plot_ROC()
Plot ROC curve.
Usage
trans_classifier$plot_ROC( plot_type = c("ROC", "PR")[1], plot_group = "all", color_values = RColorBrewer::brewer.pal(8, "Dark2"), add_AUC = TRUE, plot_method = FALSE, ... )
Arguments
plot_type
default c("ROC", "PR")[1]; 'ROC' represents ROC (Receiver Operator Characteristic) curve; 'PR' represents PR (Precision-Recall) curve.
plot_group
default "all"; 'all' represents all the classes in the model; 'add' represents all adding micro-average and macro-average results, see https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html; other options should be one or more class names, same with the names in Group column of res_ROC$res_roc from cal_ROC function.
color_values
default RColorBrewer::brewer.pal(8, "Dark2"); colors used in the plot.
add_AUC
default TRUE; whether add AUC in the legend.
plot_method
default FALSE; If TRUE, show the method in the legend though only one method is found.
...
parameters pass to geom_path function of ggplot2 package.
Returns
ggplot2 object.
Examples
\dontrun{ t1$plot_ROC(size = 1, alpha = 0.7) }
Method cal_caretList()
Use caretList function of caretEnsemble package to run multiple models. For the available models, please run names(getModelInfo())
.
Usage
trans_classifier$cal_caretList(...)
Arguments
...
parameters pass to
caretList
function of caretEnsemble package.
Returns
res_caretList_models object.
Examples
\dontrun{ t1$cal_caretList(methodList = c('rf', 'svmRadial')) }
Method clone()
The objects of this class are cloneable with this method.
Usage
trans_classifier$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Examples
## ------------------------------------------------
## Method `trans_classifier$new`
## ------------------------------------------------
data(dataset)
t1 <- trans_classifier$new(
dataset = dataset,
x.predictors = "Genus",
y.response = "Group")
## ------------------------------------------------
## Method `trans_classifier$cal_preProcess`
## ------------------------------------------------
## Not run:
t1$cal_preProcess(method = c("center", "scale", "nzv"))
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_feature_sel`
## ------------------------------------------------
## Not run:
t1$cal_feature_sel(boruta.maxRuns = 300, boruta.pValue = 0.01)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_split`
## ------------------------------------------------
## Not run:
t1$cal_split(prop.train = 3/4)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$set_trainControl`
## ------------------------------------------------
## Not run:
t1$set_trainControl(method = 'repeatedcv')
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_train`
## ------------------------------------------------
## Not run:
# random forest
t1$cal_train(method = "rf")
# Support Vector Machines with Radial Basis Function Kernel
t1$cal_train(method = "svmRadial", tuneLength = 15)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_feature_imp`
## ------------------------------------------------
## Not run:
t1$cal_feature_imp()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$plot_feature_imp`
## ------------------------------------------------
## Not run:
t1$plot_feature_imp(use_number = 1:20, coord_flip = FALSE)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_predict`
## ------------------------------------------------
## Not run:
t1$cal_predict()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$plot_confusionMatrix`
## ------------------------------------------------
## Not run:
t1$plot_confusionMatrix()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_ROC`
## ------------------------------------------------
## Not run:
t1$cal_ROC()
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$plot_ROC`
## ------------------------------------------------
## Not run:
t1$plot_ROC(size = 1, alpha = 0.7)
## End(Not run)
## ------------------------------------------------
## Method `trans_classifier$cal_caretList`
## ------------------------------------------------
## Not run:
t1$cal_caretList(methodList = c('rf', 'svmRadial'))
## End(Not run)