xgb.importance {xgboost}R Documentation

Importance of features in a model.

Description

Creates a data.table of feature importances in a model.

Usage

xgb.importance(
  feature_names = NULL,
  model = NULL,
  trees = NULL,
  data = NULL,
  label = NULL,
  target = NULL
)

Arguments

feature_names

character vector of feature names. If the model already contains feature names, those would be used when feature_names=NULL (default value). Non-null feature_names could be provided to override those in the model.

model

object of class xgb.Booster.

trees

(only for the gbtree booster) an integer vector of tree indices that should be included into the importance calculation. If set to NULL, all trees of the model are parsed. It could be useful, e.g., in multiclass classification to get feature importances for each class separately. IMPORTANT: the tree index in xgboost models is zero-based (e.g., use trees = 0:4 for first 5 trees).

data

deprecated.

label

deprecated.

target

deprecated.

Details

This function works for both linear and tree models.

For linear models, the importance is the absolute magnitude of linear coefficients. For that reason, in order to obtain a meaningful ranking by importance for a linear model, the features need to be on the same scale (which you also would want to do when using either L1 or L2 regularization).

Value

For a tree model, a data.table with the following columns:

A linear model's importance data.table has the following columns:

If feature_names is not provided and model doesn't have feature_names, index of the features will be used instead. Because the index is extracted from the model dump (based on C++ code), it starts at 0 (as in C/C++ or Python) instead of 1 (usual in R).

Examples


# binomial classification using gbtree:
data(agaricus.train, package='xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
xgb.importance(model = bst)

# binomial classification using gblinear:
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, booster = "gblinear",
               eta = 0.3, nthread = 1, nrounds = 20, objective = "binary:logistic")
xgb.importance(model = bst)

# multiclass classification using gbtree:
nclass <- 3
nrounds <- 10
mbst <- xgboost(data = as.matrix(iris[, -5]), label = as.numeric(iris$Species) - 1,
               max_depth = 3, eta = 0.2, nthread = 2, nrounds = nrounds,
               objective = "multi:softprob", num_class = nclass)
# all classes clumped together:
xgb.importance(model = mbst)
# inspect importances separately for each class:
xgb.importance(model = mbst, trees = seq(from=0, by=nclass, length.out=nrounds))
xgb.importance(model = mbst, trees = seq(from=1, by=nclass, length.out=nrounds))
xgb.importance(model = mbst, trees = seq(from=2, by=nclass, length.out=nrounds))

# multiclass classification using gblinear:
mbst <- xgboost(data = scale(as.matrix(iris[, -5])), label = as.numeric(iris$Species) - 1,
               booster = "gblinear", eta = 0.2, nthread = 1, nrounds = 15,
               objective = "multi:softprob", num_class = nclass)
xgb.importance(model = mbst)


[Package xgboost version 1.7.8.1 Index]