importance {EIX} | R Documentation
This function calculates a table with selected measures of importance for variables and interactions.
importance(xgb_model, data, option = "both", digits = 4)
xgb_model | an xgboost or lightgbm model.
data | a data table with the data used to train the model.
option | if "variables", the table includes only single variables; if "interactions", only interactions; if "both", both single variables and interactions. Default "both".
digits | number of significant digits to be returned. Passed to the signif() function.
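Because digits is passed to signif(), it counts significant digits rather than decimal places. A minimal sketch of the difference, using made-up values rather than output from a real model:

```r
# Toy importance values (made up for illustration only).
gain <- c(1234.5678, 0.00123456, 45.678901)

# signif() keeps four significant digits, whatever the magnitude.
by_signif <- signif(gain, digits = 4)

# round() with the same argument keeps four decimal places instead.
by_round <- round(gain, digits = 4)

print(by_signif)
print(by_round)
```

Small values therefore keep their leading digits instead of collapsing towards zero, which matters when measures such as Gain and Cover differ by orders of magnitude.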
Available measures:
"sumGain" - sum of the Gain values in all nodes in which the given variable occurs,
"sumCover" - sum of the Cover values in all nodes in which the given variable occurs; for LightGBM models: the number of observations that pass through the node,
"mean5Gain" - mean gain of the 5 occurrences of the given variable with the highest gain,
"meanGain" - mean Gain value in all nodes in which the given variable occurs,
"meanCover" - mean Cover value in all nodes in which the given variable occurs; for LightGBM models: the mean number of observations that pass through the node,
"frequency" - number of occurrences of the given variable in the nodes.
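The measures above are per-variable aggregates over the split nodes of all trees. The sketch below illustrates the definitions in base R on a small hypothetical node table; the table, its column names and the helper mean_top5() are assumptions made for illustration, not the package's internal code.

```r
# Hypothetical node table: one row per split node (all values made up).
nodes <- data.frame(
  Feature = c("a", "a", "b", "a", "b", "a", "a", "a"),
  Gain    = c(10, 8, 6, 4, 2, 12, 1, 3),
  Cover   = c(100, 80, 60, 40, 20, 120, 10, 30)
)

# Mean of the (up to) five largest gains of one variable.
mean_top5 <- function(x) mean(head(sort(x, decreasing = TRUE), 5))

# Aggregate the nodes of each variable into one row of measures.
measures <- do.call(rbind, lapply(split(nodes, nodes$Feature), function(d) {
  data.frame(
    Feature   = d$Feature[1],
    sumGain   = sum(d$Gain),
    sumCover  = sum(d$Cover),
    mean5Gain = mean_top5(d$Gain),
    meanGain  = mean(d$Gain),
    meanCover = mean(d$Cover),
    frequency = nrow(d)
  )
}))

print(measures)
```

For a variable with fewer than five occurrences, this sketch averages all of its gains; whether the package handles that case the same way is not stated here.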
Additionally, for the table with single variables:
"meanDepth" - mean depth weighted by gain,
"numberOfRoots" - number of occurrences in the root,
"weightedRoot" - mean number of occurrences in the root, weighted by gain.
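The depth-based measures can be illustrated the same way. This is a sketch under assumptions: the node table and its Depth column (0 for a root) are hypothetical, and "mean depth weighted by gain" is read as a gain-weighted average of node depths. "weightedRoot" is left out because the description does not say exactly how the weighting is applied.

```r
# Hypothetical node table; Depth == 0 marks the root of a tree.
nodes <- data.frame(
  Feature = c("a", "b", "a", "a", "b"),
  Depth   = c(0, 1, 1, 0, 0),
  Gain    = c(10, 4, 6, 8, 2)
)

single <- do.call(rbind, lapply(split(nodes, nodes$Feature), function(d) {
  data.frame(
    Feature       = d$Feature[1],
    # Depths averaged with gains as weights (assumed reading of meanDepth).
    meanDepth     = weighted.mean(d$Depth, w = d$Gain),
    # Number of trees in which the variable is used for the root split.
    numberOfRoots = sum(d$Depth == 0)
  )
}))

print(single)
```

Under this reading, a variable that splits mostly near the root with high gain gets a small meanDepth.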
a data table
library("EIX")
library("Matrix")

# Build a sparse design matrix of the predictors (no intercept); `left` is the response.
sm <- sparse.model.matrix(left ~ . - 1, data = HR_data)

library("xgboost")
param <- list(objective = "binary:logistic", max_depth = 2)
xgb_model <- xgboost(sm, params = param, label = HR_data[, left] == 1,
                     nrounds = 25, verbose = 0)

# Importance of both single variables and interactions.
imp <- importance(xgb_model, sm, option = "both")
imp
plot(imp, top = 10)

# Single variables only.
imp <- importance(xgb_model, sm, option = "variables")
imp
plot(imp, top = nrow(imp))

# Interactions only.
imp <- importance(xgb_model, sm, option = "interactions")
imp
plot(imp, top = nrow(imp))

# Single variables, plotted with sumCover against sumGain instead of a radar plot.
imp <- importance(xgb_model, sm, option = "variables")
imp
plot(imp, top = NULL, radar = FALSE, xmeasure = "sumCover", ymeasure = "sumGain")