R: Variable Importance and Predictions

varimpPred {icardaFIGSr}

R Documentation

Variable Importance and Predictions

Description

varimpPred calculates variable importance and makes predictions. It returns a list containing a data frame of variable importance scores, predictions or class probabilities, and the corresponding plots.

Usage

varimpPred(
  newdata,
  y,
  positive,
  model,
  scale = FALSE,
  auc = FALSE,
  predict = FALSE,
  ...
)

Arguments

`newdata`	object of class "data.frame" containing the test data.
`y`	character. Target variable.
`positive`	character. The positive class for the target variable if `y` is a factor. Usually, it is the first level of the factor.
`model`	expression. The model object returned after training a model on training data.
`scale`	boolean. If `TRUE`, scales the variable importance values to the range 0 to 100. Default: FALSE.
`auc`	boolean. If `TRUE`, calculates the area under the ROC curve and returns the value. Default: FALSE.
`predict`	boolean. If `TRUE`, calculates class probabilities and returns them as a data frame. Default: FALSE.
`...`	additional arguments to be passed to `varImp` function in the package `caret`.
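
To illustrate what `scale = TRUE` does to the scores, the following is a minimal sketch of min-max rescaling to the range 0 to 100 in base R. The helper name `rescale_0_100` and the raw scores are hypothetical. The actual computation is delegated to `varImp` in `caret`, so this shows the idea rather than the package's own code.

```r
# Hypothetical helper: min-max rescaling of importance scores to 0-100.
rescale_0_100 <- function(imp) {
  rng <- range(imp, na.rm = TRUE)
  # Shift so the smallest score is 0, then stretch so the largest is 100.
  (imp - rng[1]) / (rng[2] - rng[1]) * 100
}

# Made-up raw importance scores for three variables.
raw <- c(x1 = 2.5, x2 = 0.5, x3 = 1.5)
rescale_0_100(raw)
# x1 = 100, x2 = 0, x3 = 50
```

After rescaling, the scores only carry relative information: the least important variable is always 0 and the most important is always 100.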

Details

The importance measure for each variable is calculated based on the type of model.

For example, for linear models, the absolute value of the t-statistic of each model parameter is used in the importance calculation.
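
The linear-model case can be reproduced by hand in base R. The sketch below uses the built-in `mtcars` data, which is an assumption made purely for illustration and is unrelated to the package's own data sets. It reads the t-statistics from the coefficient table and takes their absolute values.

```r
# Fit a linear model on a built-in data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit))

# Importance as |t|, dropping the intercept row.
keep <- rownames(coef_table) != "(Intercept)"
imp <- abs(coef_table[keep, "t value"])

# Most important variable first.
sort(imp, decreasing = TRUE)
```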

For classification models, with the exception of classification trees, bagged trees and boosted trees, a variable importance score is calculated for each class. See varImp for details on model-specific metrics.

varimpPred can be used to obtain either variable importance metrics, predictions, class probabilities, or a combination of these.

For classification models with predict = TRUE, class probabilities and a ROC curve are included in the results.
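
As a rough illustration of how class probabilities relate to the area under the ROC curve, the sketch below computes the AUC with the rank-based (Mann-Whitney) formula in base R. The function name `auc_rank`, the probabilities and the class labels are all hypothetical, and varimpPred itself may compute the value differently.

```r
# Hypothetical inputs: predicted probability of the positive class "R"
# and the observed classes for six test samples.
prob_R <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
truth  <- factor(c("R", "R", "R", "S", "S", "S"), levels = c("R", "S"))

# Rank-based AUC: the share of (positive, negative) pairs in which the
# positive sample receives the higher predicted probability.
auc_rank <- function(prob, truth, positive) {
  is_pos <- truth == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(prob)  # average ranks, so ties count as half
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(prob_R, truth, positive = "R")
# 8 of the 9 pairs are ordered correctly, so the result is 8/9
```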

For regression models with predict = TRUE, predictions and a plot of residuals versus predicted values are included.
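
The regression output can be pictured with a short base R sketch. It assumes the built-in `mtcars` data and a plain `lm` fit, and it uses base graphics instead of ggplot2 to stay dependency-free, so it only approximates the kind of plot described here.

```r
set.seed(1)

# Hold out 10 rows as test data; the rest is used for training.
test_rows <- sample(nrow(mtcars), 10)
train <- mtcars[-test_rows, ]
test  <- mtcars[test_rows, ]

fit <- lm(mpg ~ wt + hp, data = train)

# Predictions on the test data and the corresponding residuals.
res <- data.frame(predicted = predict(fit, newdata = test))
res$residual <- test$mpg - res$predicted

# Residuals versus predicted values, with a reference line at zero.
plot(res$predicted, res$residual,
     xlab = "Predicted", ylab = "Residual")
abline(h = 0, lty = 2)
```

Points scattered evenly around the zero line suggest no systematic bias in the predictions.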

Value

A list object with importance measures for variables in newdata, predictions for regression models, class probabilities for classification models, and corresponding plots.

newdata should be either the test data that remains after splitting whole data into training and test sets, or a new data set different from the one used to train the model.
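
When the split is done by hand instead of being taken from a training helper, a simple hold-out split might look like the sketch below. The built-in `iris` data and the 70/30 ratio are assumptions chosen for illustration.

```r
set.seed(42)

# Randomly assign 70% of the rows to training; the rest become test data.
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train_data <- iris[train_idx, ]
test_data  <- iris[-train_idx, ]

# The two sets must not share any rows.
length(intersect(rownames(train_data), rownames(test_data)))
# 0
```

Here `test_data` plays the role of `newdata`: the model never sees these rows during training.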

If y is a factor, class probabilities are calculated for each class. If y is numeric, predicted values are calculated.

A ROC curve is created if predict = TRUE and y is a factor. If y is numeric, a plot of residuals versus predicted values is created instead.

varimpPred relies on packages caret, ggplot2 and plotROC to perform the calculations and plotting.

Author(s)

Zakaria Kehel, Bancy Ngatia, Khadija Aziz, Zainab Azough

Examples

if(interactive()){
 # Calculate variable importance for a classification model
 data("septoriaDurumWC")
 knn.mod <- tuneTrain(data = septoriaDurumWC, y = 'ST_S', method = 'knn')
 testdata <- knn.mod$`Test Data`
 knn.varimp <- varimpPred(newdata = testdata, y = 'ST_S',
                          positive = 'R', model = knn.mod$Model)
 knn.varimp
 
 # Calculate variable importance and obtain class probabilities
 data("septoriaDurumWC")
 svm.mod <- tuneTrain(data = septoriaDurumWC, y = 'ST_S', method = 'svmLinear2',
                      predict = TRUE, positive = 'R',
                      summary = caret::twoClassSummary)
 testdata <- svm.mod$`Test Data`
 svm.varimp <- varimpPred(newdata = testdata, y = 'ST_S',
                          positive = 'R', model = svm.mod$Model,
                          auc = TRUE, predict = TRUE)
 svm.varimp

 # Obtain the variable importance plot for only the 20 variables
 # with the highest importance
 svm.varimp <- varimpPred(newdata = testdata, y = 'ST_S',
                          positive = 'R', model = svm.mod$Model,
                          auc = TRUE, predict = TRUE, top = 20)
 svm.varimp
 }

[Package icardaFIGSr version 1.0.2 Index]