R: Variable importance.

varImp {modEvA}

R Documentation

Variable importance.

Description

This function gets, and optionally plots, variable importance for an input model object of an implemented class.

Usage

  varImp(model, imp.type = "each", relative = TRUE, reorder = TRUE,
  group.cats = FALSE, plot = TRUE, plot.type = "lollipop", error.bars = "sd",
  ylim = "auto", col = c("#4477aa", "#ee6677"), plot.points = TRUE,
  legend = TRUE, grid = TRUE, verbosity = 2, ...)

Arguments

`model`	a (binary-response) model object of class "glm" (of package stats), "gbm" (of package gbm), "randomForest" (of package randomForest), "bart" (of package dbarts), "pbart" or "lbart" (of package BART), or a list produced by function "flexBART" or "probit_flexBART" of package flexBART.
`imp.type`	character value indicating the type of variable importance to output, i.e. the metric with which importance is measured. Currently the only option is "each", which extracts the measure provided within each model object. Note that this measure differs across model classes (see Details), so values are not directly comparable between them.
`relative`	logical value indicating whether to divide the absolute importance values by their total sum, to get a measure of relative variable importance. The default is TRUE. Applies to GLM and BART models.
`reorder`	logical value indicating whether to sort the variables in decreasing order of importance. The default is TRUE. If set to FALSE, the variables retain their input order.
`group.cats`	logical value indicating whether to aggregate all factor levels of each (one-hot encoded) categorical variable into a single variable, by summing up their proportions of branches used. Used if 'model' is of class 'bart', 'pbart' or 'lbart', in whose outputs the contributions of categorical variables are split by their factor levels. The default is FALSE. Note that this may incorrectly group distinct variables that share the same name with a different numeric suffix (e.g. "soil_type_1", "soil_type_2"), so check your results if you set this to TRUE.
`plot`	logical value indicating whether to produce a plot with the results. The default is TRUE.
`plot.type`	character value indicating the type of plot to produce (if plot=TRUE). Can be "`lollipop`" (the default), "`barplot`", or "`boxplot`". Note that the latter is only useful when 'model' contains several importance values per variable (i.e. for models of class "bart").
`error.bars`	value indicating the type of error metric to compute (and plot, if plot=TRUE) if the input contains the necessary information (i.e., for Bayesian models like BART) and if the 'plot.type' is appropriate (i.e. "lollipop" or "barplot"). Can be "sd" (the default) for the standard deviation; "range" for the minimum and maximum of the available values; a numeric value between 0 and 1 for the corresponding confidence interval (e.g. 0.95 for 95%), computed with `quantile`; or NA for no error bars.
`ylim`	either a numeric vector of length 2 specifying the limits (minimum, maximum) for the y axis, or "auto" (the default) to fit the axis to the existing values.
`col`	character or integer vector of length 1 or 2 specifying the plotting colours (if plot=TRUE) for the variables with positive and negative effect on the response, when this info is available (e.g. for models of class "glm").
`plot.points`	logical, whether or not to add to the plot the individual importance points (rather than just the mean importance value, and the error bar if 'error.bars' is not NA) for each variable. The default is TRUE (following Weissgerber et al. 2015), but it only applies to model objects that include several importance values per variable (i.e. BART models).
`legend`	logical, whether or not to draw a legend. Used only if plot=TRUE and if the output includes negative values (i.e., if 'model' is of class 'glm' and has variables with positive and negative coefficients).
`grid`	logical, whether or not to add a grid to the plot. The default is TRUE.
`verbosity`	integer specifying the number of messages to display. Defaults to 2, the maximum implemented; lower numbers (down to 0) decrease the number of messages.
`...`	additional arguments that can be used for the plot (depending on 'plot.type'), e.g. 'main', 'cex.axis' (for `lollipop` or `boxplot`) or 'cex.names' (for `barplot`).
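
The caveat under 'group.cats' can be illustrated with a minimal base R sketch. The grouping rule used here (stripping a trailing "_<digits>" from each name) is an assumption made for illustration, not necessarily the rule that 'varImp' applies, and the importance values and variable names are hypothetical.

```r
# Hypothetical relative importances; 'landcover_1' and 'landcover_2' stand for
# one-hot encoded levels of one factor, while 'soil_type_1' and 'soil_type_2'
# stand for two separate variables that merely share a name stem.
imp <- c(elevation = 0.30, soil_type_1 = 0.20, soil_type_2 = 0.10,
         landcover_1 = 0.25, landcover_2 = 0.15)

# Assumed grouping rule (illustration only): drop a trailing "_<digits>".
base_name <- sub("_[0-9]+$", "", names(imp))

# Sum the importances that share a base name.
grouped <- vapply(split(imp, base_name), sum, numeric(1))

# 'landcover' is aggregated as intended, but 'soil_type' is merged as well,
# even though its two columns were meant to be distinct variables.
print(grouped)
```

Under a rule of this kind the two cases cannot be told apart from the names alone, which is why the grouped output should be checked against the original predictors.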

Details

Variable importance in a model of class "glm" obtained with the glm function can be measured by the magnitude of the absolute z-value test statistic, which is provided with summary(model). The 'varImp' function outputs the absolute z value of each variable (or, if relative=TRUE, the relative z value after dividing by the sum of absolute z values), and in the plot (by default) it uses different colours for variables with positive and negative relationships with the response.
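
The calculation described above can be sketched in base R as follows. This is an illustration of the idea rather than the internal code of 'varImp': the simulated data, the object names, and the choice to drop the intercept row are assumptions.

```r
# Simulate a small binary-response data set (hypothetical variables x1..x3).
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(1.2 * dat$x1 - 0.8 * dat$x2))

mod <- glm(y ~ x1 + x2 + x3, data = dat, family = binomial)

# z statistics from the model summary; the intercept row is dropped here
# (an assumption of this sketch).
z <- summary(mod)$coefficients[-1, "z value"]

abs_imp <- abs(z)                  # absolute importance
rel_imp <- abs_imp / sum(abs_imp)  # relative importance, sums to 1
effect_sign <- sign(z)             # direction of each relationship

print(sort(rel_imp, decreasing = TRUE))
print(effect_sign)
```

The sign vector is what would drive a two-colour plot: variables with a positive z value get one colour and those with a negative z value get the other.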

If the input model is of class "gbm" of the gbm package, variable importance is obtained from summary.gbm(model) and divided by 100 to get the result as a proportion rather than a percentage (for consistency).

If the input model is of class "randomForest" of the randomForest package, variable importance is obtained with model$importance.

If the input model is of class "bart" of the dbarts package, or of class "pbart" or "lbart" of the BART package, or a list produced by function "probit_flexBART" of the flexBART package, variable importance is obtained as the mean (if relative=TRUE, the default) or the total number (if relative=FALSE) of regression tree splits where each variable is used. If 'error.bars' is not NA, the error is also computed according to the specified metric ("sd", the standard deviation, by default).
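
As a rough sketch of how such summaries can be derived, the base R code below starts from a matrix of split counts with one row per posterior draw and one column per variable. The counts are simulated, the helper 'imp_summary' and its column names are hypothetical, and centring the "sd" bars on the mean (mean minus and plus one standard deviation) is an assumption; the actual output of 'varImp' may be laid out differently.

```r
set.seed(1)
# Hypothetical split counts: 100 posterior draws (rows) x 3 variables (columns).
counts <- matrix(rpois(300, lambda = rep(c(20, 10, 5), each = 100)),
                 nrow = 100,
                 dimnames = list(NULL, c("temp", "rain", "slope")))

# Relative use per draw: each row is divided by its own total.
props <- counts / rowSums(counts)

# Hypothetical helper: mean importance plus lower/upper error bounds.
imp_summary <- function(x, error.bars = "sd") {
  m <- colMeans(x)
  if (identical(error.bars, "sd")) {
    s <- apply(x, 2, sd)
    lower <- m - s
    upper <- m + s
  } else if (identical(error.bars, "range")) {
    lower <- apply(x, 2, min)
    upper <- apply(x, 2, max)
  } else if (is.numeric(error.bars)) {
    a <- (1 - error.bars) / 2  # e.g. 0.025 for a 95% interval
    lower <- apply(x, 2, quantile, probs = a)
    upper <- apply(x, 2, quantile, probs = 1 - a)
  } else {
    stop("unsupported 'error.bars' value in this sketch")
  }
  data.frame(Mean = m, Lower = lower, Upper = upper)
}

res_sd <- imp_summary(props, "sd")
res_range <- imp_summary(props, "range")
res_ci <- imp_summary(props, 0.95)
print(res_ci)
```

Passing 'counts' instead of 'props' to the helper gives the same summaries on the scale of raw split counts.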

Value

This function outputs, and optionally plots, a named numeric vector of variable importance, as measured by 'imp.type' (see Details). If the model is Bayesian (BART) and 'error.bars' is not NA, the output is a row-named data frame with the mean and the lower and upper bounds of the error bars of variable importance.

Note that variable names cannot currently be provided for 'flexBART' models, as the object does not include them. The variables are returned in the order of c(colnames(X_cont_train), colnames(X_cat_train)) in the inputs that were provided to 'flexBART'.

Author(s)

A. Marcia Barbosa

References

Weissgerber T.L., Milic N.M., Winham S.J. & Garovic V.D. (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLOS Biol 13:e1002128. https://doi.org/10.1371/JOURNAL.PBIO.1002128

Examples

  # load the package and the sample models:
  library(modEvA)
  data(rotif.mods)

  # choose a particular model to play with:
  mod <- rotif.mods$models[[1]]

  # get (and plot) variable importance for this model:
  varImp(model = mod,
    cex.axis = 0.6)

  # change some parameters
  # (the bottom margin is widened to fit the bar labels):
  oldpar <- par(mar = c(10, 4, 2, 1))
  varImp(model = mod,
    col = c("darkgreen", "orange"),
    plot.type = "barplot",
    cex.names = 0.8,
    ylim = c(0, 5),
    main = "Variable importance in \n my model")

  # restore the previous graphical parameters:
  par(oldpar)

[Package modEvA version 3.17 Index]