vim {logicDT} | R Documentation |
Variable Importance Measures (VIMs)
Description
Calculate variable importance measures (VIMs) based on different approaches.
Usage
vim(
model,
scoring_rule = "auc",
vim_type = "logic",
adjust = TRUE,
interaction_order = 3,
nodesize = NULL,
alpha = 0.05,
X_oob = NULL,
y_oob = NULL,
Z_oob = NULL,
leaves = "4pl",
...
)
Arguments
model |
The fitted |
scoring_rule |
The scoring rule for assessing the model
performance. As in |
vim_type |
The type of VIM to be calculated. This can
either be |
adjust |
Should adjusted interaction VIMs be computed in addition to the VIMs of the identified terms? See below for details. |
interaction_order |
If |
nodesize |
If |
alpha |
If |
X_oob |
The predictor data which should be used for calculating the VIMs. Preferably some type of validation data independent of the training data. |
y_oob |
The outcome data for computing the VIMs. Preferably some type of validation data independent of the training data. |
Z_oob |
The optional covariable data for computing the VIMs. Preferably some type of validation data independent of the training data. |
leaves |
The prediction mode if regression models (such as 4pL models)
were fitted in the leaves. As in |
... |
Parameters passed to the different VIM type functions.
For |
Details
Three different VIM methods are implemented:
Permutation VIMs: Random permutations of the respective identified logic terms
Removal VIMs: Removing single logic terms
Logic VIMs: Prediction with both possible outcomes of a logic term
Details on the calculation of these VIMs are given below.
Here, variable importance refers to the importance of identified logic terms. These terms can be single predictors or conjunctions of predictors, in the spirit of this software package.
Value
A data frame with two columns:
var |
Short descriptions of the terms for which the
importance was measured. For example |
vim |
The actual calculated VIM values. |
The rows of such a data frame are sorted decreasingly by the VIM values.
Permutation VIMs (Breiman & Cutler, 2003)
Permutation VIMs are computed by comparing the model's performance on the original data with its performance on data in which single terms have been randomly permuted.
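The permutation idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the toy prediction rule `predict_toy`, the score `mse`, the helper `perm_vim`, and the simulated data are all hypothetical, and the importance is taken here as the increase in error after permutation.

```r
# Toy illustration of a permutation VIM (not logicDT's internal code).
set.seed(1)
n <- 500
# Two binary terms; only term1 influences the outcome.
terms <- data.frame(term1 = rbinom(n, 1, 0.5), term2 = rbinom(n, 1, 0.5))
y <- 2 * terms$term1 + rnorm(n, sd = 0.1)

# Hypothetical fitted model: predicts from term1 only.
predict_toy <- function(d) 2 * d$term1
mse <- function(y, pred) mean((y - pred)^2)

perm_vim <- function(term, n_perm = 20) {
  base <- mse(y, predict_toy(terms))
  perm_scores <- replicate(n_perm, {
    d <- terms
    d[[term]] <- sample(d[[term]])  # randomly permute a single term
    mse(y, predict_toy(d))
  })
  mean(perm_scores) - base  # increase in error caused by the permutation
}

vims <- c(term1 = perm_vim("term1"), term2 = perm_vim("term2"))
print(vims)
```

In this sketch, permuting `term2` leaves the predictions unchanged, so its importance is zero up to floating-point error, whereas permuting `term1` clearly degrades the performance.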
Removal VIMs
Removal VIMs are constructed by removing specific logic
terms from the set of predictors, refitting the decision
tree and comparing the performance to the original model.
Thus, this approach requires that at least two terms were
found by the algorithm. Therefore, no VIM will be
calculated if empty.model = "none" was specified.
Alternatively, empty.model = "mean" can be set to
use the constant mean response model for approximating
the empty model.
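The remove-and-refit idea might look like this in base R. The sketch makes several assumptions that are not part of the package: `lm()` stands in for the decision tree purely for brevity, the terms `t1` and `t2` and the data are simulated, and the importance is taken as the increase in validation error after removal.

```r
# Toy illustration of removal VIMs; lm() is a stand-in for the refitted tree.
set.seed(2)
n <- 400
d <- data.frame(t1 = rbinom(n, 1, 0.5), t2 = rbinom(n, 1, 0.5))
d$y <- 1.5 * d$t1 + 0.5 * d$t2 + rnorm(n, sd = 0.2)
train <- d[1:200, ]
valid <- d[201:400, ]

# Validation error of a fitted model.
mse <- function(fit) mean((valid$y - predict(fit, newdata = valid))^2)

full <- lm(y ~ t1 + t2, data = train)
removal_vim <- sapply(c("t1", "t2"), function(term) {
  # Refit without the term and compare to the full model.
  reduced <- update(full, as.formula(paste(". ~ . -", term)))
  mse(reduced) - mse(full)
})
print(removal_vim)
```

If the model contained only one term, removing it would leave an intercept-only fit, which in this sketch plays the role of the constant mean response model.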
Logic VIMs (Lau et al., 2024)
Logic VIMs use the fact that Boolean conjunctions are
Boolean variables themselves and therefore are equal to
0 or 1. To compute the VIM for a specific term,
predictions are performed once for this term fixed to
0 and once for this term fixed to 1. Then, the arithmetic
mean of these two (risk or regression) predictions is
used for calculating the performance. This performance
is then compared to the original one as in the other
VIM approaches (average = "before"). Alternatively,
predictions for each fixed 0-1 scenario of the considered
term can be performed leading to individual performances
which then are averaged and compared to the original
performance (average = "after").
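The difference between the two averaging modes can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: the toy model `predict_toy`, the helper `logic_vim`, the simulated data, and the use of the mean squared error with importance defined as the increase in error.

```r
# Toy illustration of logic VIMs with averaging before or after scoring.
set.seed(3)
n <- 500
terms <- data.frame(t1 = rbinom(n, 1, 0.5), t2 = rbinom(n, 1, 0.5))
y <- 2 * terms$t1 + terms$t2 + rnorm(n, sd = 0.1)

predict_toy <- function(d) 2 * d$t1 + d$t2  # hypothetical fitted model
mse <- function(pred) mean((y - pred)^2)

logic_vim <- function(term, average = c("before", "after")) {
  average <- match.arg(average)
  d0 <- d1 <- terms
  d0[[term]] <- 0  # term fixed to 0
  d1[[term]] <- 1  # term fixed to 1
  p0 <- predict_toy(d0)
  p1 <- predict_toy(d1)
  base <- mse(predict_toy(terms))
  if (average == "before") {
    # Average the two predictions first, then score once.
    mse((p0 + p1) / 2) - base
  } else {
    # Score each scenario separately, then average the two scores.
    (mse(p0) + mse(p1)) / 2 - base
  }
}

print(c(before = logic_vim("t1", "before"), after = logic_vim("t1", "after")))
```

For a convex score such as the squared error used in this sketch, averaging after scoring can never yield a smaller value than averaging before scoring.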
Validation
Validation data sets which were not used in the fitting
of the model are preferred, since they prevent
an overfitting of the VIMs themselves.
These should be specified by the _oob arguments,
if neither bagging nor inner validation was used for fitting
the model.
Bagging
For the bagging version, out-of-bag (OOB) data are naturally used for the calculation of VIMs.
VIM Adjustment for Interactions (Lau et al., 2024)
Since decision trees can naturally include interactions
between single predictors (especially when strong marginal
effects are present as well), logicDT models might, e.g.,
include the single input variables X_1 and X_2 but
not their interaction X_1 \land X_2 although an interaction
effect is present. We, therefore, developed and implemented an
adjustment approach for calculating VIMs for such
unidentified interactions nonetheless.
For predictors X_{i_1}, \ldots, X_{i_k} =: Z, this interaction
importance is given by
\mathrm{VIM}(X_{i_1} \land \ldots \land X_{i_k}) =
\mathrm{VIM}(X_{i_1}, \ldots, X_{i_k} \mid X \setminus Z) -
\sum_{\lbrace j_1, \ldots, j_l \rbrace {\subset \atop \neq}
\lbrace i_1, \ldots, i_k \rbrace}
\mathrm{VIM}(X_{j_1} \land \ldots \land X_{j_l} \mid X \setminus Z)
and can, in principle, be applied to any black-box model.
By \mathrm{VIM}(A \mid X \setminus Z), the VIM of A
considering the predictor set excluding the variables in Z
is meant, i.e., the improvement of additionally considering A
while regarding only the predictors in X \setminus Z.
The proposed interaction VIM can be recursively calculated through
\mathrm{VIM}(X_{i_1} \land X_{i_2}) =
\mathrm{VIM}(X_{i_1}, X_{i_2} \mid X \setminus Z) -
\mathrm{VIM}(X_{i_1} \mid X \setminus Z) -
\mathrm{VIM}(X_{i_2} \mid X \setminus Z)
for Z = X_{i_1}, X_{i_2}.
This leads to the relationship
\mathrm{VIM}(X_{i_1} \land \ldots \land X_{i_k}) =
\sum_{\lbrace j_1, \ldots, j_l \rbrace \subseteq \lbrace i_1, \ldots, i_k \rbrace}
(-1)^{k-l} \cdot \mathrm{VIM}(X_{j_1}, \ldots, X_{j_l} \mid X \setminus Z).
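The alternating sum above can be evaluated directly once the joint VIMs are available. The following base R sketch assumes a hypothetical named vector `joint_vim` holding \mathrm{VIM}(X_{j_1}, \ldots, X_{j_l} \mid X \setminus Z) for every non-empty subset, with made-up numbers; the contribution of the empty subset is taken to be 0. The helper `interaction_vim` and the naming scheme for the subsets are assumptions for illustration only.

```r
# Evaluate the inclusion-exclusion formula for an interaction VIM.
# vars:      character vector of variable names (order must match the keys).
# joint_vim: named numeric vector of joint VIMs, keyed by comma-joined names.
interaction_vim <- function(vars, joint_vim) {
  k <- length(vars)
  total <- 0
  for (l in seq_len(k)) {
    # All subsets of size l; combn() keeps the order given in vars.
    for (s in combn(vars, l, simplify = FALSE)) {
      key <- paste(s, collapse = ",")
      total <- total + (-1)^(k - l) * joint_vim[[key]]
    }
  }
  total
}

# Made-up joint VIMs for two predictors.
joint_vim <- c("X1" = 0.10, "X2" = 0.05, "X1,X2" = 0.25)
print(interaction_vim(c("X1", "X2"), joint_vim))
```

For two predictors this reduces to the recursive formula: the joint VIM minus the two single VIMs.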
Identification of Specific Conjunctions (Lau et al., 2024)
The aforementioned VIM adjustment approach only captures the importance
of a general definition of interactions, i.e., it just considers
the question whether some variables do interact in any way.
Since logicDT is aimed at identifying specific conjunctions (and also assigns
them VIMs if they were identified by logicDT), a further
adjustment approach is implemented which tries to identify the specific
conjunction leading to an interaction effect.
The idea of this method is to consider the response for each possible
scenario of the interacting variables, e.g., for X_1 \land (X_2^c \land X_3)
where the second term X_2^c \land X_3 was identified by logicDT
and, thus, two interacting terms are regarded, the 2^2 = 4 possible scenarios
\lbrace (i, j) \mid i, j \in \lbrace 0, 1 \rbrace \rbrace
are considered. For each setting, the corresponding response is compared with
outcome values of the complementary set. For continuous outcomes, a two sample
t-test (with Welch correction for potentially unequal variances) is performed
comparing the means between these two groups. For binary outcomes, Fisher's exact
test is performed testing different underlying case probabilities.
If at least one test rejects the null hypothesis of equal outcomes (without adjusting
for multiple testing), the combination with the lowest p-value is chosen as the
explanatory term for the interaction effect. For example, if the most significant
deviation results from X_1 = 0 and (X_2^c \land X_3) = 1 from the example
above, the term X_1^c \land (X_2^c \land X_3) is chosen.
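The scenario-wise testing for a binary outcome can be sketched in base R using `fisher.test()`. The simulated data and the variable names `a` (standing for X_1) and `b` (standing for X_2^c \land X_3) are assumptions for illustration; this is not the package's internal code.

```r
# Toy illustration: find the 0-1 scenario of two terms that best explains
# an interaction effect on a binary outcome.
set.seed(4)
n <- 400
a <- rbinom(n, 1, 0.5)  # first interacting term, e.g. X_1
b <- rbinom(n, 1, 0.5)  # second interacting term, e.g. X_2^c AND X_3
# The risk is elevated only in the scenario a = 0, b = 1.
y <- rbinom(n, 1, ifelse(a == 0 & b == 1, 0.8, 0.2))

scenarios <- expand.grid(a = 0:1, b = 0:1)  # the 2^2 = 4 scenarios
scenarios$p <- apply(scenarios, 1, function(s) {
  in_scenario <- a == s[["a"]] & b == s[["b"]]
  # 2 x 2 table: scenario membership versus outcome.
  tab <- table(factor(in_scenario, c(FALSE, TRUE)), factor(y, 0:1))
  fisher.test(tab)$p.value
})

alpha <- 0.05
best <- scenarios[which.min(scenarios$p), ]
if (best$p < alpha) print(best)  # scenario chosen as the explanatory term
```

For a continuous outcome, the call `t.test(y[in_scenario], y[!in_scenario])` could take the place of `fisher.test()`, since `t.test()` applies the Welch correction by default.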
References
Lau, M., Schikowski, T. & Schwender, H. (2024). logicDT: A procedure for identifying response-associated interactions between binary predictors. Machine Learning 113(2):933–992. doi: 10.1007/s10994-023-06488-6
Breiman, L. (2001). Random Forests. Machine Learning 45(1):5–32. doi: 10.1023/A:1010933404324
Breiman, L. & Cutler, A. (2003). Manual on Setting Up, Using, and Understanding Random Forests V4.0. University of California, Berkeley, Department of Statistics. https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf