var_stability {nestedcv} | R Documentation |
Variable stability
Description
Uses variable importance from the models fitted in each outer CV fold to
assess how stable that importance is. For glmnet, variable importance is
measured as the absolute value of the model coefficients, optionally scaled
as a percentage. The frequency with which each variable is selected in the
outer folds, as well as in the final model, is also returned. This is helpful
for sparse models or with filters to determine how often variables end up in
the model in each fold. For glmnet, the direction of effect is taken directly
from the sign of the model coefficients. For caret models, direction of
effect is not readily available, so as a substitute, the directionality of
each predictor is determined by the function var_direction() using the sign
of a t-test for binary classification or the sign of the regression
coefficient for continuous outcomes (not available for multiclass caret
models). To better understand the direction of effect of each predictor
within the final model, we recommend using SHAP values; see the vignette
"Explaining nestedcv models with Shapley values". See pred_train() for an
example.
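The substitute direction measure described above for caret models can be illustrated in base R. This is a minimal sketch, not the nestedcv implementation of var_direction(): the helper name `direction_sketch()`, the simulated data, and the convention that a positive sign means "higher in the second factor level" are all assumptions made for illustration.

```r
# Hypothetical helper: sign of a t-test (binary outcome) or of a simple
# regression slope (continuous outcome), computed one predictor at a time.
direction_sketch <- function(y, x) {
  x <- as.matrix(x)
  out <- vapply(seq_len(ncol(x)), function(j) {
    if (is.factor(y)) {
      stopifnot(nlevels(y) == 2)
      grp <- levels(y)
      # Assumed convention: second level compared against first level
      tt <- t.test(x[y == grp[2], j], x[y == grp[1], j])
      sign(unname(tt$statistic))
    } else {
      # Slope of a univariate linear regression of y on this predictor
      sign(unname(coef(lm(y ~ x[, j]))[2]))
    }
  }, numeric(1))
  setNames(out, colnames(x))
}

# Simulated data: "up" is higher in class B, "down" is lower in class B
set.seed(1)
y <- factor(rep(c("A", "B"), each = 25))
x <- cbind(up   = rnorm(50) + 3 * (y == "B"),
           down = rnorm(50) - 3 * (y == "B"))
dir_binary <- direction_sketch(y, x)

# Continuous outcome driven positively by "up"
y_cont <- 2 * x[, "up"] + rnorm(50)
dir_cont <- direction_sketch(y_cont, x)
```

Because each predictor is tested on its own, this kind of univariate sign can differ from the sign a predictor would take inside a multivariable model, which is one reason SHAP values are recommended for the final model.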
Usage
var_stability(x, ...)
## S3 method for class 'nestcv.glmnet'
var_stability(x, percent = TRUE, level = 1, sort = TRUE, ...)
## S3 method for class 'nestcv.train'
var_stability(x, sort = TRUE, ...)
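A minimal usage sketch follows, assuming the nestedcv package is installed. The simulated data, the variable names, and the choices `alphaSet = 1` and `n_outer_folds = 5` are illustrative assumptions rather than recommended settings.

```r
# Simulated binary classification data (hypothetical)
set.seed(123)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("var", seq_len(p))))
y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(n) > 0, "case", "control"))

# Only run the nested CV if nestedcv is available
if (requireNamespace("nestedcv", quietly = TRUE)) {
  library(nestedcv)
  fit <- nestcv.glmnet(y, x, family = "binomial",
                       alphaSet = 1, n_outer_folds = 5)

  # Importance scaled as a percentage (the default, percent = TRUE)
  vs <- var_stability(fit)
  print(head(vs))

  # Importance as absolute coefficients, without percentage scaling
  vs_raw <- var_stability(fit, percent = FALSE)
}
```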
Arguments
x |
a |
... |
Optional arguments for compatibility |
percent |
Logical for |
level |
For multinomial |
sort |
Logical whether to sort variables by mean importance |
Details
Note that for caret models, caret::varImp() may require the model package to
be fully loaded in order to function. During the fitting process caret
often only loads the package by namespace.
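The difference between a namespace that is merely loaded and a package that is attached can be seen in base R. The sketch below uses the `tools` package purely as a stand-in. The commented lines show the same pattern for a caret model, where `method = "rf"` (and therefore the randomForest package) and the object `fit` are hypothetical.

```r
# Loading a namespace makes pkg::fun() calls work, but does not attach
# the package to the search path
loadNamespace("tools")
ns_loaded <- "tools" %in% loadedNamespaces()

# library() attaches the package, which is what "fully loaded" means here
library(tools)
attached <- "package:tools" %in% search()

# Same pattern before calling var_stability() on a caret-based fit:
# library(randomForest)      # attach the model package first (assumed method "rf")
# vs <- var_stability(fit)   # 'fit' is a hypothetical nestcv.train object
```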
Value
Data frame containing the mean, standard deviation (sd) and standard error of the mean (sem) of variable importance, and the frequency with which each variable is selected in outer folds.
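The summary statistics named above can be illustrated in base R on a small matrix of per-fold coefficients. This is a sketch under stated assumptions, not the package's internal code: the coefficient values are made up, the percentage scaling (each fold's importances sum to 100) is an assumed convention, and the column names of the real return value may differ.

```r
# Hypothetical coefficients: rows = variables, columns = outer folds
coefs <- matrix(c( 0.8,  0.6,  0.7,
                  -0.4,  0.0, -0.2,
                   0.0,  0.0,  0.1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("v1", "v2", "v3"), paste0("fold", 1:3)))

# Importance as absolute coefficients
imp <- abs(coefs)

# Assumed percentage scaling: importances within each fold sum to 100
imp_pct <- sweep(imp, 2, colSums(imp), "/") * 100

n_folds <- ncol(imp_pct)
stab <- data.frame(
  mean      = rowMeans(imp_pct),
  sd        = apply(imp_pct, 1, sd),
  sem       = apply(imp_pct, 1, sd) / sqrt(n_folds),
  frequency = rowSums(coefs != 0)  # folds in which the variable was selected
)

# Sort variables by mean importance, largest first
stab <- stab[order(stab$mean, decreasing = TRUE), ]
```

A variable with a high mean but a low frequency is one that matters a great deal in the few folds where it is selected, which is the kind of instability this summary is meant to expose.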
See Also
cv_coef()
cv_varImp()
pred_train()