R: Variable Importance Measures (VIMs)

vim {logicDT}

R Documentation

Description

Calculate variable importance measures (VIMs) based on different approaches.

Usage

vim(
  model,
  scoring_rule = "auc",
  vim_type = "logic",
  adjust = TRUE,
  interaction_order = 3,
  nodesize = NULL,
  alpha = 0.05,
  X_oob = NULL,
  y_oob = NULL,
  Z_oob = NULL,
  leaves = "4pl",
  ...
)

Arguments

`model`	The fitted `logicDT` or `logic.bagged` model
`scoring_rule`	The scoring rule for assessing the model performance. As in `logicDT`, `"auc"`, `"nce"`, `"deviance"` and `"brier"` are possible for binary outcomes. For regression, the mean squared error is used.
`vim_type`	The type of VIM to be calculated. This can either be `"logic"`, `"remove"` or `"permutation"`. See below for details.
`adjust`	Should adjusted interaction VIMs be computed in addition to the VIMs of identified terms? See below for details.
`interaction_order`	If `adjust = TRUE`, up to which interaction order shall adjusted interaction VIMs be computed?
`nodesize`	If `adjust = TRUE`, how many observations need to be discriminated by an interaction in order for it to be considered? Similar to `conjsize` in `logicDT` and `nodesize` in `tree.control`.
`alpha`	If `adjust = TRUE`, a further adjustment can be performed trying to identify the specific conjunctions responsible for the interaction of the considered binary predictors. `alpha` specifies the significance level for statistical tests testing the alternative of a difference in the response for specific conjunctions. `alpha = 0` leads to no further adjustment. See below for details.
`X_oob`	The predictor data which should be used for calculating the VIMs. Preferably some type of validation data independent of the training data.
`y_oob`	The outcome data for computing the VIMs. Preferably some type of validation data independent of the training data.
`Z_oob`	The optional covariable data for computing the VIMs. Preferably some type of validation data independent of the training data.
`leaves`	The prediction mode if regression models (such as 4pL models) were fitted in the leaves. As in `predict.logicDT`, `"4pl"` and `"constant"` are the possible settings.
`...`	Parameters passed to the different VIM type functions. For `vim_type = "logic"`, the argument `average` can be specified as `"before"` or `"after"`. For `vim_type = "permutation"`, `n.perm` can be set to the number of random permutations. For `vim_type = "remove"`, `empty.model` can be specified as either `"none"` ignoring empty models with all predictive terms removed or `"mean"` using the response mean as prediction in the case of an empty model. See below for details.

Details

Three different VIM methods are implemented:

Permutation VIMs: Random permutations of the respective identified logic terms
Removal VIMs: Removing single logic terms
Logic VIMs: Prediction with both possible outcomes of a logic term

Details on the calculation of these VIMs are given below.

Here, variable importance refers to the importance of identified logic terms. In the spirit of this software package, these terms can be single predictors or conjunctions of predictors.

Value

A data frame with two columns:

`var`	Short descriptions of the terms for which the importance was measured. For example `-X1^X2` for `X_1^c \land X_2`.
`vim`	The actual calculated VIM values.

The rows of this data frame are sorted in decreasing order of the VIM values.
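The following is a minimal usage sketch, not an official example. It assumes that the logicDT package is installed and that `logicDT()` can be called with a binary predictor matrix and an outcome vector using its default settings; the toy data and the names `X`, `y`, `train` and `vims` are made up for illustration. The call to `vim()` only uses arguments listed under Usage above. If the package is not available, the sketch skips the model fit.

```r
# Toy data: four binary predictors; the outcome depends on X1 AND (NOT X2).
set.seed(1)
n <- 400
X <- matrix(rbinom(n * 4, 1, 0.5), ncol = 4,
            dimnames = list(NULL, paste0("X", 1:4)))
y <- rbinom(n, 1, ifelse(X[, "X1"] == 1 & X[, "X2"] == 0, 0.8, 0.2))

# Hold out the second half of the data for computing the VIMs.
train <- seq_len(n) <= n / 2

vims <- NULL
if (requireNamespace("logicDT", quietly = TRUE)) {
  # Assumption: logicDT() with default settings accepts (X, y) like this.
  model <- logicDT::logicDT(X[train, ], y[train])
  # Validation data are passed through the _oob arguments.
  vims <- logicDT::vim(model, scoring_rule = "auc", vim_type = "logic",
                       adjust = FALSE,
                       X_oob = X[!train, ], y_oob = y[!train])
  print(vims)
}
```

Setting `adjust = FALSE` keeps the sketch to the VIMs of identified terms only.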

Permutation VIMs (Breiman & Cutler, 2003)

Permutation VIMs are computed by comparing the model's performance on the original data with its performance on data in which single terms are randomly permuted.
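The idea can be sketched in base R without the package. The sketch below is an illustration under stated assumptions, not the package's implementation: a linear model stands in for the fitted tree, the mean squared error is the scoring rule, and `permutation_vim()` and its argument `n_perm` are hypothetical names.

```r
set.seed(42)
n <- 1000
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
x3 <- rbinom(n, 1, 0.5)           # pure noise predictor
term <- x1 * x2                    # the conjunction X1 ^ X2 as a 0/1 variable
y <- 2 * term + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, term = term, x3 = x3)

train <- dat[1:500, ]
valid <- dat[501:1000, ]
fit <- lm(y ~ term + x3, data = train)   # stand-in for the fitted model

mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

# Increase in validation error after shuffling one term, averaged over
# n_perm random permutations.
permutation_vim <- function(model, data, column, n_perm = 20) {
  base <- mse(model, data)
  permuted <- replicate(n_perm, {
    shuffled <- data
    shuffled[[column]] <- sample(shuffled[[column]])
    mse(model, shuffled)
  })
  mean(permuted) - base
}

vim_term <- permutation_vim(fit, valid, "term")
vim_noise <- permutation_vim(fit, valid, "x3")
```

In this toy setting, `vim_term` should be clearly positive, while `vim_noise` should be close to zero.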

Removal VIMs

Removal VIMs are constructed by removing specific logic terms from the set of predictors, refitting the decision tree and comparing the performance to the original model. Thus, this approach requires that at least two terms were found by the algorithm. If only one term was found, removing it leaves an empty model, and no VIM will be calculated if `empty.model = "none"` was specified. Alternatively, `empty.model = "mean"` can be set to use the constant mean response model for approximating the empty model.
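A rough base R sketch of the removal idea follows. It assumes a linear model as a stand-in for refitting the decision tree and the mean squared error as the scoring rule; `fit_and_score()` and `removal_vim()` are hypothetical helpers. The empty model is approximated by the training response mean, mirroring `empty.model = "mean"`.

```r
set.seed(7)
n <- 1000
t1 <- rbinom(n, 1, 0.5)            # first identified term (0/1)
t2 <- rbinom(n, 1, 0.3)            # second identified term (0/1)
y <- 1.5 * t1 + 0.5 * t2 + rnorm(n, sd = 0.5)
dat <- data.frame(y, t1, t2)
train <- dat[1:500, ]
valid <- dat[501:1000, ]

# Refit on the given terms and return the validation MSE.
fit_and_score <- function(terms) {
  if (length(terms) == 0) {
    # Empty model: predict the training response mean.
    pred <- rep(mean(train$y), nrow(valid))
  } else {
    fit <- lm(reformulate(terms, response = "y"), data = train)
    pred <- predict(fit, newdata = valid)
  }
  mean((valid$y - pred)^2)
}

# Loss in performance when one term is removed and the model is refitted.
removal_vim <- function(terms, drop) {
  fit_and_score(setdiff(terms, drop)) - fit_and_score(terms)
}

all_terms <- c("t1", "t2")
vim_t1 <- removal_vim(all_terms, "t1")
vim_t2 <- removal_vim(all_terms, "t2")
vim_single <- removal_vim("t1", "t1")   # only term removed: empty model
```

Here `vim_single` is only defined because the empty model falls back to the mean prediction.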

Logic VIMs (Lau et al., 2024)

Logic VIMs use the fact that Boolean conjunctions are Boolean variables themselves and therefore are equal to 0 or 1. To compute the VIM for a specific term, predictions are performed once for this term fixed to 0 and once for this term fixed to 1. Then, the arithmetic mean of these two (risk or regression) predictions is used for calculating the performance. This performance is then compared to the original one as in the other VIM approaches (average = "before"). Alternatively, predictions for each fixed 0-1 scenario of the considered term can be performed leading to individual performances which then are averaged and compared to the original performance (average = "after").
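The difference between the two averaging modes can be illustrated with a small base R sketch. It is not the package's code: a logistic regression stands in for the fitted model, the Brier score is used as the scoring rule, and `logic_vim()` is a hypothetical helper.

```r
set.seed(3)
n <- 2000
term <- rbinom(n, 1, 0.5)          # an identified logic term as a 0/1 variable
other <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-1 + 2 * term + 0.5 * other))
dat <- data.frame(y, term, other)
train <- dat[1:1000, ]
valid <- dat[1001:2000, ]
fit <- glm(y ~ term + other, family = binomial, data = train)

brier <- function(p, obs) mean((obs - p)^2)
risk <- function(data) predict(fit, newdata = data, type = "response")

logic_vim <- function(data, column, average = c("before", "after")) {
  average <- match.arg(average)
  fixed0 <- data
  fixed0[[column]] <- 0            # term fixed to 0 for all observations
  fixed1 <- data
  fixed1[[column]] <- 1            # term fixed to 1 for all observations
  p0 <- risk(fixed0)
  p1 <- risk(fixed1)
  base <- brier(risk(data), data$y)
  if (average == "before") {
    # Average the two predictions first, then score once.
    brier((p0 + p1) / 2, data$y) - base
  } else {
    # Score each scenario separately, then average the scores.
    (brier(p0, data$y) + brier(p1, data$y)) / 2 - base
  }
}

vim_before <- logic_vim(valid, "term", "before")
vim_after <- logic_vim(valid, "term", "after")
```

With a convex score such as the Brier score, averaging the scores can never yield a smaller value than scoring the averaged prediction, so `vim_after` is at least as large as `vim_before` in this sketch.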

Validation

Validation data sets that were not used for fitting the model are preferred, since they prevent overfitting of the VIMs themselves. They should be specified via the `_oob` arguments if neither bagging nor inner validation was used for fitting the model.

Bagging

For the bagging version, out-of-bag (OOB) data are naturally used for the calculation of VIMs.

VIM Adjustment for Interactions (Lau et al., 2024)

Since decision trees can naturally include interactions between single predictors (especially when strong marginal effects are present as well), logicDT models might, e.g., include the single input variables X_1 and X_2 but not their interaction X_1 \land X_2 although an interaction effect is present. We, therefore, developed and implemented an adjustment approach for calculating VIMs for such unidentified interactions nonetheless. For predictors X_{i_1}, \ldots, X_{i_k} =: Z, this interaction importance is given by

\mathrm{VIM}(X_{i_1} \land \ldots \land X_{i_k}) = \mathrm{VIM}(X_{i_1}, \ldots, X_{i_k} \mid X \setminus Z) - \sum_{\lbrace j_1, \ldots, j_l \rbrace {\subset \atop \neq} \lbrace i_1, \ldots, i_k \rbrace} \mathrm{VIM}(X_{j_1} \land \ldots \land X_{j_l} \mid X \setminus Z)

and can, in principle, be applied to any black-box model. Here, \mathrm{VIM}(A \mid X \setminus Z) denotes the VIM of A with respect to the predictor set excluding the variables in Z, i.e., the improvement gained by additionally considering A when otherwise only the predictors in X \setminus Z are regarded. The proposed interaction VIM can be recursively calculated through

\mathrm{VIM}(X_{i_1} \land X_{i_2}) = \mathrm{VIM}(X_{i_1}, X_{i_2} \mid X \setminus Z) - \mathrm{VIM}(X_{i_1} \mid X \setminus Z) - \mathrm{VIM}(X_{i_2} \mid X \setminus Z)

for Z = X_{i_1}, X_{i_2}. This leads to the relationship

\mathrm{VIM}(X_{i_1} \land \ldots \land X_{i_k}) = \sum_{\lbrace j_1, \ldots, j_l \rbrace \subseteq \lbrace i_1, \ldots, i_k \rbrace} (-1)^{k-l} \cdot \mathrm{VIM}(X_{j_1}, \ldots, X_{j_l} \mid X \setminus Z).
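The alternating sum can be checked numerically with a short base R sketch. The function `joint_vim()` below is a made-up stand-in for \mathrm{VIM}(X_{j_1}, \ldots, X_{j_l} \mid X \setminus Z) with hard-coded values, and `interaction_vim()` is a hypothetical helper. The empty subset is skipped because it contributes a VIM of zero.

```r
# Made-up joint VIMs: additive main effects plus one X1-X2 interaction.
joint_vim <- function(S) {
  main <- c(X1 = 0.10, X2 = 0.05, X3 = 0.02)
  value <- sum(main[S])
  if (all(c("X1", "X2") %in% S)) value <- value + 0.04
  value
}

# Alternating sum over all non-empty subsets S of vars with sign
# (-1)^(k - |S|), where k is the number of variables in vars.
interaction_vim <- function(vars, joint_vim) {
  k <- length(vars)
  total <- 0
  for (l in seq_len(k)) {
    for (S in combn(vars, l, simplify = FALSE)) {
      total <- total + (-1)^(k - l) * joint_vim(S)
    }
  }
  total
}

vim_12 <- interaction_vim(c("X1", "X2"), joint_vim)
vim_13 <- interaction_vim(c("X1", "X3"), joint_vim)
vim_123 <- interaction_vim(c("X1", "X2", "X3"), joint_vim)
```

In this constructed example, only the pair X1, X2 receives a nonzero interaction VIM (0.04), since the main effects cancel out and no three-way effect was built in.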

Identification of Specific Conjunctions (Lau et al., 2024)

The aforementioned VIM adjustment approach only captures the importance of a general definition of interactions, i.e., it just considers the question of whether some variables interact in any way. Since logicDT is aimed at identifying specific conjunctions (and also assigns them VIMs if they were identified by logicDT), a further adjustment approach is implemented which tries to identify the specific conjunction leading to an interaction effect. The idea of this method is to consider the response for each possible scenario of the interacting variables. For example, for X_1 \land (X_2^c \land X_3), where the second term X_2^c \land X_3 was identified by logicDT and, thus, two interacting terms are regarded, the 2^2 = 4 possible scenarios \lbrace (i, j) \mid i, j \in \lbrace 0, 1 \rbrace \rbrace are considered.

For each scenario, the corresponding response is compared with the outcome values of the complementary set. For continuous outcomes, a two-sample t-test (with Welch correction for potentially unequal variances) is performed comparing the means between these two groups. For binary outcomes, Fisher's exact test is performed testing for different underlying case probabilities. If at least one test rejects the null hypothesis of equal outcomes at level `alpha` (without adjusting for multiple testing), the combination with the lowest p-value is chosen as the explanatory term for the interaction effect. For example, if the most significant deviation results from X_1 = 0 and (X_2^c \land X_3) = 1 in the example above, the term X_1^c \land (X_2^c \land X_3) is chosen.
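For a binary outcome, the scenario-wise testing described above can be sketched in base R as follows. This is an illustration on simulated data, not the package's internal code; `term2` stands for an already identified term such as X_2^c \land X_3, and the label `T2` is a made-up shorthand for it.

```r
set.seed(11)
n <- 1000
x1 <- rbinom(n, 1, 0.5)
term2 <- rbinom(n, 1, 0.5)         # stands for an identified term (0/1)
# Elevated case probability only for x1 = 0 and term2 = 1.
y <- rbinom(n, 1, ifelse(x1 == 0 & term2 == 1, 0.7, 0.2))

# All 2^2 = 4 scenarios of the two interacting terms.
scenarios <- expand.grid(x1 = 0:1, term2 = 0:1)

# Fisher's exact test: observations in the scenario vs. the complement.
scenarios$p_value <- apply(scenarios, 1, function(s) {
  in_scenario <- x1 == s["x1"] & term2 == s["term2"]
  fisher.test(table(in_scenario, y))$p.value
})

alpha <- 0.05
best <- scenarios[which.min(scenarios$p_value), ]

# Pick the scenario with the lowest p-value if any test rejects at alpha.
label <- NA_character_
if (best$p_value < alpha) {
  label <- paste0(ifelse(best$x1 == 1, "X1", "-X1"), "^",
                  ifelse(best$term2 == 1, "T2", "-T2"))
}
```

For a continuous outcome, `t.test()` could take the place of `fisher.test()`, since its default setting already applies the Welch correction.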

References

Lau, M., Schikowski, T. & Schwender, H. (2024). logicDT: A procedure for identifying response-associated interactions between binary predictors. Machine Learning 113(2):933–992. doi: 10.1007/s10994-023-06488-6
Breiman, L. (2001). Random Forests. Machine Learning 45(1):5–32. doi: 10.1023/A:1010933404324
Breiman, L. & Cutler, A. (2003). Manual on Setting Up, Using, and Understanding Random Forests V4.0. University of California, Berkeley, Department of Statistics. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf

[Package logicDT version 1.0.4 Index]