varimp {npreg}

R Documentation

Variable Importance Indices

Description

Computes variable importance indices for terms of a smooth model.

Usage

varimp(object, newdata = NULL, combine = TRUE)

Arguments

`object`	an object of class "sm" output by the `sm` function or an object of class "gsm" output by the `gsm` function.
`newdata`	the data used for the variable importance calculation (if `NULL`, the training data are used).
`combine`	a logical switch indicating whether the parametric and smooth components of the importance should be combined (default) or returned separately.

Details

Suppose that the fitted function \eta can be written as

\eta = \eta_0 + \eta_1 + \eta_2 + ... + \eta_p

where \eta_0 is a constant (intercept) term, and \eta_j denotes the j-th effect function, which is assumed to have mean zero. Each \eta_j (j = 1, ..., p) may be either a main effect or an interaction effect function.

The variable importance index for the j-th effect term is defined as

\pi_j = (\eta_j^\top \eta_*) / (\eta_*^\top \eta_*)

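where \eta_* = \eta_1 + \eta_2 + ... + \eta_p. Because the \eta_j sum to \eta_*, the indices satisfy \sum_{j = 1}^p \pi_j = 1. However, there is no guarantee that \pi_j \geq 0: an index is negative whenever \eta_j^\top \eta_* < 0.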

If all \pi_j are non-negative, then \pi_j gives the proportion of the model's R-squared that can be accounted for by the j-th effect term. Thus, values of \pi_j closer to 1 indicate that \eta_j is more important, whereas values of \pi_j closer to 0 (including negative values) indicate that \eta_j is less important.
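
The definition can be checked numerically without fitting a model. The following is a minimal sketch in base R, assuming three made-up effect functions (the names `eta1`, `eta2`, `eta3` and the predictors `x1`, `x2` are hypothetical and not part of npreg). The third effect is constructed to oppose the first, which illustrates how a negative index can arise while the indices still sum to one.

```r
# Illustrative effect functions evaluated at n points (all names hypothetical)
set.seed(1)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
eta <- cbind(eta1 = sin(2 * pi * x1),
             eta2 = 2 * x2,
             eta3 = -0.5 * sin(2 * pi * x1))  # works against eta1

# center each effect so that it has mean zero, as the definition assumes
eta <- scale(eta, scale = FALSE)

# eta_* is the sum of the effect functions (intercept excluded)
eta_star <- rowSums(eta)

# pi_j = (eta_j' eta_*) / (eta_*' eta_*)
pi_j <- colSums(eta * eta_star) / sum(eta_star^2)

pi_j        # index for each effect; eta3 is negative here
sum(pi_j)   # equals 1 up to floating-point error
```

This is the same calculation used for the "true importance indices" in the Examples, applied to effects chosen so that one inner product with \eta_* is negative.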

Value

If `combine = TRUE`, returns a named vector containing the importance indices for each effect function (in `object$terms`).

If `combine = FALSE`, returns a data frame where the first column gives the importance indices for the parametric components and the second column gives the importance indices for the smooth (nonparametric) components.

Note

When `combine = FALSE`, the importance index is zero for any component that a model term does not have. For example, a nominal effect has no parametric component, so the `$p` component of its importance index is zero.
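
One way to picture the separate components is sketched below in base R. It assumes, purely for illustration, that each effect splits additively into a parametric part and a smooth part and that each part is scored against the same \eta_*; the source does not state how npreg performs this split internally. The helper `center`, the matrices `par_part` and `smo_part`, and the column name `s` are hypothetical. Under this assumption the two columns add up to the combined index, and the nominal effect `z` has a zero in the `p` column.

```r
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(rep(letters[1:3], length.out = n))

# hypothetical helper: remove the mean so each part has mean zero
center <- function(v) v - mean(v)

# assumed split of each effect into parametric and smooth parts
# (the nominal effect z is given no parametric part)
par_part <- cbind(x = center(3 * x),
                  z = rep(0, n))
smo_part <- cbind(x = center(sin(2 * pi * x)),
                  z = center(c(-2, 0, 2)[as.integer(z)]))

eta <- par_part + smo_part
eta_star <- rowSums(eta)
denom <- sum(eta_star^2)

# separate indices (one column per component) and combined indices
separate <- data.frame(p = colSums(par_part * eta_star) / denom,
                       s = colSums(smo_part * eta_star) / denom)
combined <- colSums(eta * eta_star) / denom

separate
all.equal(unname(rowSums(separate)), unname(combined))
```

The additivity follows from the linearity of the inner product in the definition of \pi_j, so it holds for any additive split, not only the one assumed here.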

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi:10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi:10.4135/9781526421036885885

Examples


##########   EXAMPLE 1   ##########
### 1 continuous and 1 nominal predictor

# load npreg, which provides sm(), varimp(), and bin.sample()
library(npreg)

# generate data
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
z <- factor(sample(letters[1:3], size = n, replace = TRUE))
fun <- function(x, z){
  mu <- c(-2, 0, 2)
  zi <- as.integer(z)
  # return the value directly (an assignment as the last expression returns invisibly)
  mu[zi] + 3 * x + sin(2 * pi * x)
}
fx <- fun(x, z)
y <- fx + rnorm(n, sd = 0.5)

# define marginal knots
probs <- seq(0, 0.9, by = 0.1)
knots <- list(x = quantile(x, probs = probs),
              z = letters[1:3])

# fit correct (additive) model
sm.add <- sm(y ~ x + z, knots = knots)

# fit incorrect (interaction) model
sm.int <- sm(y ~ x * z, knots = knots)

# true importance indices
eff <- data.frame(x = 3 * x + sin(2 * pi * x), z = c(-2, 0, 2)[as.integer(z)])
eff <- scale(eff, scale = FALSE)
fstar <- rowSums(eff)
colSums(eff * fstar) / sum(fstar^2)

# estimated importance indices
varimp(sm.add)
varimp(sm.int)



##########   EXAMPLE 2   ##########
### 4 continuous predictors
### additive model

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# define SSA (smoothing spline ANOVA) knot indices
knots.indx <- c(bin.sample(data$x1v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x2v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x3v, nbin = 10, index.return = TRUE)$ix,
                bin.sample(data$x4v, nbin = 10, index.return = TRUE)$ix)

# fit correct (additive) model
sm.add <- sm(y ~ x1v + x2v + x3v + x4v, data = data, knots = knots.indx)

# fit incorrect (interaction) model
sm.int <- sm(y ~ x1v * x2v + x3v + x4v, data = data, knots = knots.indx)

# true importance indices
eff <- data.frame(x1v = sin(pi*data[,1]), x2v = sin(2*pi*data[,2]),
                  x3v = sin(3*pi*data[,3]), x4v = sin(4*pi*data[,4]))
eff <- scale(eff, scale = FALSE)
fstar <- rowSums(eff)
colSums(eff * fstar) / sum(fstar^2)

# estimated importance indices
varimp(sm.add)
varimp(sm.int)

[Package npreg version 1.1.0 Index]