VarImp {ModTools}    R Documentation
Variable Importance for Regression and Classification Models
Description
Variable importance expresses the wish to know how important a single variable is within a group of predictors for a particular model. In general, however, it is not a well-defined concept, that is, there is no theoretically founded variable importance metric. Nevertheless, some approaches have become established in practice for a number of regression and classification algorithms.
The present function provides an interface for calculating variable importance for some of the models produced by FitMod, comprising linear models, classification trees, random forests, C5 trees and neural networks. The intention here is to provide reasonably homogeneous output and plot routines.
Usage
VarImp(x, scale = FALSE, sort = TRUE, ...)
## S3 method for class 'FitMod'
VarImp(x, scale = FALSE, sort = TRUE, type = NULL, ...)
## Default S3 method:
VarImp(x, scale = FALSE, sort = TRUE, ...)
## S3 method for class 'VarImp'
plot(x, sort = TRUE, maxrows = NULL,
main = "Variable importance", ...)
## S3 method for class 'VarImp'
print(x, digits = 3, ...)
Arguments
x: the fitted model
scale: logical, should the importance values be scaled to a range of 0 to 100?
...: further parameters to be passed to the specific methods
sort: the name of the column by which the importance table should be ordered
maxrows: the maximum number of rows to be reported
main: the main title for the plot
type: some models offer more than one type of variable importance; for linear models the accepted types correspond to the importance metrics of the relaimpo package (see Details)
digits: the number of digits for printing the "VarImp" table
Details
Linear Models:
For linear models there's the fine package relaimpo available on CRAN, containing several interesting approaches for quantifying variable importance. See the original documentation for details.
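As an illustration, relaimpo can be called directly; the following sketch uses relaimpo's own calc.relimp() with the "lmg" metric on a standard data set (how exactly VarImp wraps this internally is not shown here):

library(relaimpo)

# ordinary least squares model on a built-in data set
fm <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)

# decompose R^2 into per-predictor contributions (LMG metric);
# rela = TRUE rescales the shares so that they sum to 1
ri <- calc.relimp(fm, type = "lmg", rela = TRUE)
sort(ri@lmg, decreasing = TRUE)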
rpart, Random Forest:
VarImp.rpart and VarImp.randomForest are wrappers around the importance functions of the rpart and randomForest packages, respectively.
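The wrapped quantities can also be inspected directly in the underlying packages; a small sketch (the iris data and the importance = TRUE setting for the random forest are illustrative choices, not part of VarImp's interface):

library(rpart)
library(randomForest)

# rpart keeps its importance values in the fitted object
r.rp <- rpart(Species ~ ., data = iris)
r.rp$variable.importance

# randomForest offers the importance() extractor;
# importance = TRUE additionally stores the permutation measure
r.rf <- randomForest(Species ~ ., data = iris, importance = TRUE)
importance(r.rf)       # MeanDecreaseAccuracy / MeanDecreaseGini per predictor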
C5.0:
C5.0 measures predictor importance by determining the
percentage of training set samples that fall into all the terminal
nodes after the split. For example, the predictor in the first split
automatically has an importance measurement of 100 percent since all
samples are affected by this split. Other predictors may be used
frequently in splits, but if the terminal nodes cover only a handful
of training set samples, the importance scores may be close to
zero. The same strategy is applied to rule-based models and boosted
versions of the model. The underlying function can also return the
number of times each predictor was involved in a split by using the
option metric = "splits".
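A minimal sketch of the corresponding calls in the C50 package (C5imp() and its metric argument belong to C50 itself; the iris example is chosen only for illustration):

library(C50)

r.c5 <- C5.0(Species ~ ., data = iris)

# default: percentage of training samples covered by splits on each predictor
C5imp(r.c5, metric = "usage")

# alternative: how often each predictor was used in a split
C5imp(r.c5, metric = "splits")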
Neural Networks:
The method used here is "Garson weights".
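For a model fitted with nnet, the Garson decomposition can be reproduced, for example, with the NeuralNetTools package; a sketch under the assumption of a single-response network (VarImp's own implementation may differ in details such as scaling):

library(nnet)
library(NeuralNetTools)

set.seed(42)
# small regression network with one output node
r.nn <- nnet(mpg ~ wt + hp + disp + drat, data = mtcars,
             size = 4, linout = TRUE, decay = 0.01, trace = FALSE)

# Garson's algorithm: relative importance derived from the products of the
# input-hidden and hidden-output weights
garson(r.nn, bar_plot = FALSE)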
SVM, GLM, Multinom:
There are no implementations for these models so far.
Value
A data frame with class c("VarImp.train", "data.frame") for VarImp.train, or a matrix for other models.
Author(s)
Andri Signorell <andri@signorell.net>
References
Quinlan, J. (1992). Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence, 343-348.
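Examples

A hedged end-to-end sketch of the intended workflow (the fitfn value "randomForest" and the iris data are assumptions chosen for illustration):

library(ModTools)

r.rf <- FitMod(Species ~ ., data = iris, fitfn = "randomForest")

vi <- VarImp(r.rf, scale = TRUE)
vi                       # print the importance table
plot(vi, maxrows = 10)   # plot at most the 10 most important predictors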