tuneTrain {icardaFIGSr}    R Documentation

Tuning and Training the Data

Description

tuneTrain splits the data into training and test sets, then automatically tunes and trains a model and makes predictions. It returns a list containing a model object, a data frame, and a plot.

Usage

tuneTrain(
  data,
  y,
  p = 0.7,
  method = method,
  parallelComputing = FALSE,
  length = 10,
  control = "repeatedcv",
  number = 10,
  repeats = 10,
  process = c("center", "scale"),
  summary = multiClassSummary,
  positive,
  ...
)

Arguments

data

object of class "data.frame" with target variable and predictor variables.

y

character. Target variable.

p

numeric. Proportion of data to be used for training. Default: 0.7

method

character. Type of model to use for classification or regression.

parallelComputing

logical. Indicates whether to use parallel processing. Default: FALSE.

length

integer. Number of values to output for each tuning parameter. If search = "random" is passed to trainControl through ..., this becomes the maximum number of tuning parameter combinations that are generated by the random search. Default: 10.

control

character. Resampling method to use. Choices include: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV", "none", "oob", "timeslice", "adaptive_cv", "adaptive_boot", or "adaptive_LGOCV". Default: "repeatedcv". See train for specific details on the resampling methods.

number

integer. Number of cross-validation folds or number of resampling iterations. Default: 10.

repeats

integer. Number of complete sets of folds to compute (that is, how many times the k-fold cross-validation is repeated) if "repeatedcv" is chosen as the resampling method in control. Default: 10.

process

character. Defines the pre-processing transformation of predictor variables to be done. Options are: "BoxCox", "YeoJohnson", "expoTrans", "center", "scale", "range", "knnImpute", "bagImpute", "medianImpute", "pca", "ica", or "spatialSign". See preProcess for specific details on each pre-processing transformation. Default: c('center', 'scale').

summary

expression. Computes performance metrics across resamples. For numeric y, the root mean squared error (RMSE) and R-squared are calculated. For factor y, the overall accuracy and Kappa are calculated. See trainControl and defaultSummary for details on specification and summary options. Default: multiClassSummary.

positive

character. The positive class for the target variable if y is a factor. Usually, it is the first level of the factor.

...

additional arguments to be passed to createDataPartition, trainControl and train functions in the package caret.
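
To make the arguments above more concrete, the following is a minimal sketch of a plain caret workflow that uses the same kinds of settings. It is not the tuneTrain source code: the mapping of p, control, number, repeats, length and process onto createDataPartition, trainControl and train is an assumption based on the argument descriptions, the built-in iris data stands in for real data, and number and repeats are set lower than the defaults only to keep the run short.

```r
# Sketch only: a plain caret workflow mirroring the tuneTrain arguments.
# The argument mapping noted in the comments is an assumption.
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)
  set.seed(42)

  # p = 0.7: stratified 70/30 split on the target variable
  idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
  train_set <- iris[idx, ]
  test_set  <- iris[-idx, ]

  # control, number, repeats: resampling scheme
  # (smaller than the documented defaults so the sketch runs quickly)
  ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2)

  # method, length, process: model type, tuning grid size, pre-processing
  fit <- train(Species ~ ., data = train_set, method = "knn",
               tuneLength = 5, preProcess = c("center", "scale"),
               trControl = ctrl)

  # Predict on the held-out rows and report the share of correct labels
  pred <- predict(fit, newdata = test_set)
  print(mean(pred == test_set$Species))
}
```

Writing the steps out this way shows where each setting takes effect; tuneTrain bundles them into a single call.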

Details

Types of classification and regression models available for use with tuneTrain can be found using names(getModelInfo()). The results given depend on the type of model used.
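
As a short illustration of the lookup mentioned above, the sketch below lists the model codes known to caret and checks whether a given code is available. The variable names (info, model_codes, class_models) are chosen here for illustration, and the filter on the type field assumes that each model entry records whether it supports classification or regression.

```r
# Sketch: look up the model codes that can be passed as `method`
if (requireNamespace("caret", quietly = TRUE)) {
  info <- caret::getModelInfo()
  model_codes <- names(info)

  # First few codes, and a membership check for two specific models
  print(head(model_codes))
  print(c("knn", "nnet") %in% model_codes)

  # Codes of models that support classification
  class_models <- names(Filter(function(m) "Classification" %in% m$type, info))
  print(length(class_models))
}
```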

For classification models, class probabilities and a ROC curve are included in the results. For regression models, predictions and a plot of residuals versus predicted values are included. Before calling tuneTrain, convert y to a factor for classification or to numeric for regression.
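
The type conversion described above can be done with base R before calling tuneTrain. The sketch below uses a small hypothetical data frame (toy, with columns status and score) purely for illustration; the column names and values are not from the package.

```r
# Hypothetical toy data; names and values are for illustration only
toy <- data.frame(
  status = c("R", "S", "R", "S"),
  score  = c("1.5", "2.0", "3.5", "4.0"),
  stringsAsFactors = FALSE
)

# Classification: make the target a factor, listing the positive class first
toy$status <- factor(toy$status, levels = c("R", "S"))

# Regression: make the target numeric
toy$score <- as.numeric(toy$score)

# Check the resulting column types
str(toy)
```

Placing the positive class first in levels matches the note on the positive argument, which says the positive class is usually the first factor level.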

Value

A list object with results from tuning and training the model selected in method, together with predictions and class probabilities. The training and test data sets obtained from splitting the data are also returned.

If y is a factor, class probabilities are calculated for each class and a ROC curve is created.

If y is numeric, predicted values are calculated and a plot of residuals versus predicted values is created.

tuneTrain relies on packages caret, ggplot2 and plotROC to perform the modelling and plotting.

Author(s)

Zakaria Kehel, Bancy Ngatia, Khadija Aziz

See Also

createDataPartition, trainControl, train, predict.train, ggplot, geom_roc, calc_auc

Examples

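if (interactive()) {
  library(icardaFIGSr)

  # Load the septoriaDurumWC example data
  data(septoriaDurumWC)

  # k-nearest neighbours classifier with 'R' as the positive class
  knn.mod <- tuneTrain(data = septoriaDurumWC, y = 'ST_S', method = 'knn', positive = 'R')

  # Neural network classifier on the same target
  nnet.mod <- tuneTrain(data = septoriaDurumWC, y = 'ST_S', method = 'nnet', positive = 'R')
}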

[Package icardaFIGSr version 1.0.2 Index]