varinf {npreg}    R Documentation

Variance Inflation Factors

Description

Computes variance inflation factors for terms of a smooth model.

Usage

varinf(object, newdata = NULL)

Arguments

object

an object of class "sm" output by the sm function or an object of class "gsm" output by the gsm function.

newdata

the data used for the variance inflation calculation (if NULL, the training data are used).

Details

Let \kappa_j^2 denote the VIF for the j-th model term.

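The smallest possible value is \kappa_j^2 = 1, which occurs when the j-th term is orthogonal to all other terms; values of \kappa_j^2 close to 1 indicate no multicollinearity issues for the j-th term. Larger values of \kappa_j^2 indicate that the j-th effect function \eta_j is more collinear with the other model terms.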

Common rules of thumb flag multicollinearity as a problem for the j-th term when \kappa_j^2 > 5 or, more leniently, when \kappa_j^2 > 10.

To understand these thresholds, note that

\kappa_j^2 = \frac{1}{1 - R_j^2}

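where R_j^2 is the R-squared from the linear model predicting \eta_j from the remaining model terms. Thus \kappa_j^2 > 5 corresponds to R_j^2 > 0.8 and \kappa_j^2 > 10 corresponds to R_j^2 > 0.9, i.e., more than 80% or 90% of the variation in \eta_j is explained by the other terms.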
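Rearranging the formula gives R_j^2 = 1 - 1/\kappa_j^2, which translates any VIF cutoff into the R-squared it implies. The short R sketch below does this for the two rule-of-thumb thresholds; vif_to_r2 is a hypothetical helper written for illustration, not a function exported by npreg.

```r
# Hypothetical helper (not part of npreg): invert kappa^2 = 1 / (1 - R^2)
# to obtain the R-squared implied by a given VIF value.
vif_to_r2 <- function(kappa2) {
  stopifnot(all(kappa2 >= 1))   # a VIF can never be smaller than 1
  1 - 1 / kappa2
}

thresholds <- c(5, 10)
r2 <- vif_to_r2(thresholds)
r2   # 0.8 0.9
```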

Value

a named vector containing the variance inflation factors for each effect function (in object$terms).

Note

Suppose that the model function \eta can be written as

\eta = \eta_0 + \eta_1 + \eta_2 + \cdots + \eta_p

where \eta_0 is a constant (intercept) term and \eta_j denotes the j-th effect function, which is assumed to have mean zero. Each \eta_j (j = 1, \ldots, p) may be either a main effect or an interaction effect function.
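To make the decomposition concrete, the R sketch below builds \eta from two centered effect functions and checks that, because each \eta_j averages to zero over the data, the sample mean of \eta equals the intercept \eta_0. The particular effect functions and the value of \eta_0 are arbitrary choices for illustration; the sketch does not reproduce the fitting done by sm or gsm.

```r
# Illustrative additive decomposition eta = eta_0 + eta_1 + eta_2,
# using two arbitrary (hypothetical) effect functions on simulated inputs.
set.seed(3)
n <- 50
x <- matrix(runif(n * 2), n, 2)

# raw effect functions, then centered so each has mean zero over the sample
f <- cbind(exp(x[, 1]), x[, 2]^2)
eta_terms <- sweep(f, 2, colMeans(f))   # column j holds eta_j(x_i)
eta0 <- 1.5                             # constant (intercept) term

eta <- eta0 + rowSums(eta_terms)

# each eta_j averages to zero, so the sample mean of eta equals eta_0
# (up to floating-point error)
mean(eta)
```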

Define the p \times p matrix C with entries

C_{jk} = \cos(\eta_j, \eta_k)

where the cosine is computed over the n observations used for the calculation (the training data, or newdata if it is supplied), i.e.,

\cos(\eta_j, \eta_k) = \frac{\sum_{i=1}^n \eta_j(x_i) \eta_k(x_i)}{\sqrt{\sum_{i=1}^n \eta_j^2(x_i)} \sqrt{\sum_{i=1}^n \eta_k^2(x_i)}}
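The cosine matrix can be computed directly from an n \times p matrix whose (i, j) entry is \eta_j(x_i). The R sketch below assumes such a matrix is available (here it is simulated from two arbitrary effect functions); cosine_matrix is a hypothetical helper, not part of npreg. Because the columns have mean zero, C coincides with the ordinary correlation matrix of the term evaluations.

```r
# Hypothetical helper (not part of npreg): cosine matrix of the columns of H,
# where column j of H holds the evaluations eta_j(x_1), ..., eta_j(x_n).
cosine_matrix <- function(H) {
  H <- as.matrix(H)
  G <- crossprod(H)              # G[j, k] = sum_i eta_j(x_i) * eta_k(x_i)
  norms <- sqrt(diag(G))         # sqrt(sum_i eta_j(x_i)^2) for each term
  G / outer(norms, norms)        # divide entry (j, k) by norm_j * norm_k
}

# simulated term evaluations for two illustrative effect functions
set.seed(1)
n <- 100
x <- matrix(runif(n * 2), n, 2)
H <- cbind(eta1 = sin(pi * x[, 1]), eta2 = sin(2 * pi * x[, 2]))
H <- scale(H, center = TRUE, scale = FALSE)   # give each term mean zero

C <- cosine_matrix(H)
C
```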

The variance inflation factors are the diagonal elements of C^{-1}, i.e.,

\kappa_j^2 = C^{jj}

where \kappa_j^2 is the VIF for the j-th term, and C^{jj} denotes the j-th diagonal element of the matrix C^{-1}.
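The R sketch below ties the two characterizations together: it computes the VIFs as the diagonal of C^{-1} and checks them against \kappa_j^2 = 1/(1 - R_j^2) from the Details section, with R_j^2 obtained by regressing each term on the others. The term evaluations are simulated, with the third term made nearly collinear with the second; nothing here calls npreg, and the data-generating choices are illustrative assumptions.

```r
# Simulated mean-zero term evaluations: column j of H holds eta_j(x_i).
set.seed(2)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
x3 <- x2 + rnorm(n, sd = 0.05)               # nearly collinear with x2
H <- cbind(eta1 = sin(pi * x1), eta2 = sin(2 * pi * x2), eta3 = sin(2 * pi * x3))
H <- scale(H, center = TRUE, scale = FALSE)  # mean zero over the sample

# cosine matrix: C[j, k] = <eta_j, eta_k> / (||eta_j|| * ||eta_k||)
G <- crossprod(H)
C <- G / sqrt(outer(diag(G), diag(G)))

# VIFs as the diagonal elements of C^{-1}
vif_cos <- diag(solve(C))

# the same quantities via R_j^2 from regressing eta_j on the other terms
# (no intercept; since every column has mean zero this equals the usual R^2)
vif_r2 <- sapply(seq_len(ncol(H)), function(j) {
  fit <- lm.fit(H[, -j, drop = FALSE], H[, j])
  r2 <- 1 - sum(fit$residuals^2) / sum(H[, j]^2)
  1 / (1 - r2)
})
names(vif_r2) <- colnames(H)

print(round(cbind(vif_cos, vif_r2), 3))
```

Both columns should agree, with the collinear pair (eta2, eta3) showing much larger VIFs than eta1.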

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi:10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi:10.4135/9781526421036885885

See Also

See summary.sm for more thorough summaries of smooth models.

See summary.gsm for more thorough summaries of generalized smooth models.

Examples

##########   EXAMPLE 1   ##########
### 4 continuous predictors
### no multicollinearity

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)


##########   EXAMPLE 2   ##########
### 4 continuous predictors
### multicollinearity

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(3*pi*x[,4])
}
data <- as.data.frame(replicate(3, runif(n)))
# x4v copies x3v, except that its first value is taken from x2v
data <- cbind(data, c(data[1,2], data[2:n,3]))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# check collinearity
cor(data)
cor(sin(3*pi*data[,3]), sin(3*pi*data[,4]))

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)


[Package npreg version 1.1.0 Index]