h2_pairwise {hstats} | R Documentation |
Pairwise Interaction Strength
Description
Friedman and Popescu's statistic of pairwise interaction strength, see Details.
Use plot() to get a barplot.
Usage
h2_pairwise(object, ...)
## Default S3 method:
h2_pairwise(object, ...)
## S3 method for class 'hstats'
h2_pairwise(
object,
normalize = TRUE,
squared = TRUE,
sort = TRUE,
zero = TRUE,
...
)
Arguments
object | Object of class "hstats". |
... | Currently unused. |
normalize | Should statistics be normalized? Default is TRUE. |
squared | Should squared statistics be returned? Default is TRUE. |
sort | Should results be sorted? Default is TRUE. |
zero | Should rows with all 0 be shown? Default is TRUE. |
Details
Following Friedman and Popescu (2008), if there are no interaction effects between
features x_j and x_k, their two-dimensional (centered) partial dependence
function F_{jk} can be written as the sum of the (centered) univariate partial
dependencies F_j and F_k, i.e.,
F_{jk}(x_j, x_k) = F_j(x_j) + F_k(x_k).
Correspondingly, Friedman and Popescu's statistic of pairwise
interaction strength between x_j and x_k is defined as
H_{jk}^2 = \frac{A_{jk}}{\frac{1}{n} \sum_{i = 1}^n \big[\hat F_{jk}(x_{ij}, x_{ik})\big]^2},
where
A_{jk} = \frac{1}{n} \sum_{i = 1}^n \big[\hat F_{jk}(x_{ij}, x_{ik}) - \hat F_j(x_{ij}) - \hat F_k(x_{ik})\big]^2
(check partial_dep() for all definitions).
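The two formulas above can be traced by hand with a small base R sketch. It does not use hstats itself: pd_centered() and h2_pair() are hypothetical helpers written only for this illustration, the toy prediction function is an assumption, and the partial dependence is computed by brute force at the observed rows, so the numbers need not match what hstats() returns on real models.

```r
# Hypothetical helper (not part of hstats): centered partial dependence of
# the features in `v`, evaluated at every observed row of X.
pd_centered <- function(pred_fun, X, v) {
  pd <- vapply(seq_len(nrow(X)), function(i) {
    X_mod <- X
    # Fix the features in v at their values in row i ...
    for (nm in v) X_mod[[nm]] <- rep(X[[nm]][i], nrow(X))
    # ... and average the predictions over all other features
    mean(pred_fun(X_mod))
  }, numeric(1))
  pd - mean(pd)
}

# Hypothetical helper: H^2_{jk} = A_{jk} / mean(F_{jk}^2)
h2_pair <- function(pred_fun, X, j, k) {
  F_jk <- pd_centered(pred_fun, X, c(j, k))
  F_j <- pd_centered(pred_fun, X, j)
  F_k <- pd_centered(pred_fun, X, k)
  A_jk <- mean((F_jk - F_j - F_k)^2)
  A_jk / mean(F_jk^2)
}

# Toy data and toy "model" (assumptions for illustration):
# x1 and x2 interact, x3 is purely additive.
set.seed(1)
n <- 200
X <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
pred_fun <- function(X) X$x1 + X$x2 + X$x3 + 2 * X$x1 * X$x2

h2_12 <- h2_pair(pred_fun, X, "x1", "x2")  # clearly positive
h2_13 <- h2_pair(pred_fun, X, "x1", "x3")  # zero up to floating point error
print(c(h2_12 = h2_12, h2_13 = h2_13))
```

The brute-force helper needs n predictions on n rows per partial dependence function, which is fine for a toy example but is only meant to make the definitions concrete.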
Remarks:
- Remarks 1 to 5 of h2_overall() also apply here.
- H^2_{jk} = 0 means there are no interaction effects between x_j and x_k. The larger the value, the more of the joint effect of the two features comes from the interaction. Since the denominator differs between variable pairs, unlike H_j, this test statistic is difficult to compare between variable pairs. If both main effects are very weak, a negligible interaction can get a high H^2_{jk}. Therefore, Friedman and Popescu (2008) suggest calculating H^2_{jk} only for important variables (see "Modification" below).
Modification
To be better able to compare pairwise interaction strength across variable pairs,
and to overcome the problem mentioned in the last remark, we suggest as an alternative
the unnormalized test statistic on the scale of the predictions,
i.e., \sqrt{A_{jk}}. Set normalize = FALSE and squared = FALSE to obtain
this statistic.
Furthermore, instead of focusing on pairwise calculations for the most important
features, we can select features with the strongest overall interactions.
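The difference between the normalized and the unnormalized statistic can be illustrated with a small base R sketch. It is independent of hstats: pd_centered() and pair_stats() are hypothetical helpers, and the toy prediction function is an assumption chosen so that one pair has strong main effects and a relevant interaction, while another pair has almost no effect at all.

```r
# Hypothetical helper (not part of hstats): centered partial dependence of
# the features in `v`, evaluated at every observed row of X.
pd_centered <- function(pred_fun, X, v) {
  pd <- vapply(seq_len(nrow(X)), function(i) {
    X_mod <- X
    for (nm in v) X_mod[[nm]] <- rep(X[[nm]][i], nrow(X))
    mean(pred_fun(X_mod))
  }, numeric(1))
  pd - mean(pd)
}

# Hypothetical helper: returns H^2_{jk} and sqrt(A_{jk}) for one pair
pair_stats <- function(pred_fun, X, j, k) {
  F_jk <- pd_centered(pred_fun, X, c(j, k))
  F_j <- pd_centered(pred_fun, X, j)
  F_k <- pd_centered(pred_fun, X, k)
  A_jk <- mean((F_jk - F_j - F_k)^2)
  c(h2 = A_jk / mean(F_jk^2), abs = sqrt(A_jk))
}

# Toy setting (assumption for illustration):
# (x1, x2): strong main effects plus a relevant interaction
# (x3, x4): no main effects, only a negligible interaction
set.seed(2)
n <- 200
X <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), x4 = runif(n))
pred_fun <- function(X) {
  5 * X$x1 + 5 * X$x2 + 2 * X$x1 * X$x2 +
    0.01 * (X$x3 - 0.5) * (X$x4 - 0.5)
}

s12 <- pair_stats(pred_fun, X, "x1", "x2")
s34 <- pair_stats(pred_fun, X, "x3", "x4")
print(rbind(x1_x2 = s12, x3_x4 = s34))
```

In this sketch the normalized statistic ranks the negligible pair (x3, x4) above (x1, x2), because its denominator is tiny, whereas the unnormalized statistic on the prediction scale ranks the pairs by the actual size of the interaction.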
Value
An object of class "hstats_matrix" containing these elements:
- M: Matrix of statistics (one column per prediction dimension), or NULL.
- SE: Matrix with standard errors of M, or NULL. Multiply with sqrt(m_rep) to get standard deviations instead. Currently, supported only for perm_importance().
- m_rep: The number of repetitions behind standard errors SE, or NULL. Currently, supported only for perm_importance().
- statistic: Name of the function that generated the statistic.
- description: Description of the statistic.
Methods (by class)
- h2_pairwise(default): Default pairwise interaction strength.
- h2_pairwise(hstats): Pairwise interaction strength from "hstats" object.
References
Friedman, Jerome H., and Bogdan E. Popescu. "Predictive Learning via Rule Ensembles." The Annals of Applied Statistics 2, no. 3 (2008): 916-54.
See Also
hstats(), h2(), h2_overall(), h2_threeway()
Examples
# MODEL 1: Linear regression
fit <- lm(Sepal.Length ~ . + Petal.Width:Species, data = iris)
s <- hstats(fit, X = iris[, -1])
# Proportion of joint effect coming from pairwise interaction
# (for features with strongest overall interactions)
h2_pairwise(s)
h2_pairwise(s, zero = FALSE) # Drop 0
# Absolute measure as alternative
abs_h <- h2_pairwise(s, normalize = FALSE, squared = FALSE, zero = FALSE)
abs_h
abs_h$M
# MODEL 2: Multi-response linear regression
fit <- lm(as.matrix(iris[, 1:2]) ~ Petal.Length + Petal.Width * Species, data = iris)
s <- hstats(fit, X = iris[, 3:5], verbose = FALSE)
x <- h2_pairwise(s)
plot(x)