R: Variance Inflation Factors

varinf {npreg}

R Documentation

Variance Inflation Factors

Description

Computes variance inflation factors for terms of a smooth model.

Usage

varinf(object, newdata = NULL)

Arguments

`object`	an object of class "sm" output by the `sm` function or an object of class "gsm" output by the `gsm` function.
`newdata`	the data used for variance inflation calculation (if `NULL` training data are used).

Details

Let \kappa_j^2 denote the VIF for the j-th model term.

Values of \kappa_j^2 close to 1 indicate no multicollinearity issues for the j-th term. Larger values of \kappa_j^2 indicate that \eta_j has more collinearity with other terms.

Thresholds of \kappa_j^2 > 5 or \kappa_j^2 > 10 are typically recommended for determining if multicollinearity is too much of an issue.

To understand these thresholds, note that

\kappa_j^2 = \frac{1}{1 - R_j^2}

where R_j^2 is the R-squared for the linear model predicting \eta_j from the remaining model terms.

Value

a named vector containing the variance inflation factors for each effect function (in object$terms).

Note

Suppose that the function can be written as

\eta = \eta_0 + \eta_1 + \eta_2 + ... + \eta_p

where \eta_0 is a constant (intercept) term, and \eta_j denotes the j-th effect function, which is assumed to have mean zero. Note that \eta_j could be a main or interaction effect function for all j = 1, ..., p.

Defining the p \times p matrix C with entries

C_{jk} = \cos(\eta_j, \eta_k)

where the cosine is defined with respect to the training data, i.e.,

\cos(\eta_j, \eta_k) = \frac{\sum_{i=1}^n \eta_j(x_i) \eta_k(x_i)}{\sqrt{\sum_{i=1}^n \eta_j^2(x_i)} \sqrt{\sum_{i=1}^n \eta_k^2(x_i)}}

The variane inflation factors are the diagonal elements of C^{-1}, i.e.,

\kappa_j^2 = C^{jj}

where \kappa_j^2 is the VIF for the j-th term, and C^{jj} denotes the j-th diagonal element of the matrix C^{-1}.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi:10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi:10.4135/9781526421036885885

Examples

##########   EXAMPLE 1   ##########
### 4 continuous predictors
### no multicollinearity

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)


##########   EXAMPLE 2   ##########
### 4 continuous predictors
### multicollinearity

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(3*pi*x[,4])
}
data <- as.data.frame(replicate(3, runif(n)))
data <- cbind(data, c(data[1,2], data[2:n,3]))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# check collinearity
cor(data)
cor(sin(3*pi*data[,3]), sin(3*pi*data[,4]))

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)

[Package npreg version 1.1.0 Index]