svyvif {svydiags}

R Documentation

Variance inflation factors (VIF) for generalized linear models fitted with complex survey data

Description

Computes variance inflation factors (VIFs) for fixed-effects linear and generalized linear regression models fitted to data collected from one- and two-stage complex survey designs.

Usage

svyvif(mobj, X, w, stvar=NULL, clvar=NULL)

Arguments

`mobj`	model object produced by `svyglm`. The following families of models are allowed: binomial, gaussian, poisson, quasibinomial, and quasipoisson. Other families allowed by `svyglm` will produce an error in `svyvif`.
`X`	`n × p` matrix of real-valued covariates used in fitting the regression; `n` = number of observations, `p` = number of covariates in model, excluding the intercept. A column of 1's for an intercept should not be included. `X` should not contain columns for the strata and cluster identifiers (unless those variables are part of the model). No missing values are allowed.
`w`	`n`-vector of survey weights used in fitting the model. No missing values are allowed.
`stvar`	field in `mobj` that contains the stratum variable in the complex sample design; use `stvar = NULL` if there are no strata
`clvar`	field in `mobj` that contains the cluster variable in the complex sample design; use `clvar = NULL` if there are no clusters

Details

svyvif computes variance inflation factors (VIFs) appropriate for linear models and some generalized linear models (GLMs) fitted to complex survey data (see Liao 2010 and Liao & Valliant 2012). A VIF measures the inflation of the variance of a slope estimate caused by nonorthogonality of the predictors, over and above what the variance would be with orthogonality (Theil 1971; Belsley, Kuh, and Welsch 1980). A VIF may also be thought of as the factor by which the variance of an estimated coefficient for a predictor x is inflated in a model that includes all x's compared to a model that includes only the single x. An alternative comparison uses a reference model that includes an intercept and the single x. Both of these VIFs are in the output.
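In symbols, the ratio interpretation above can be written as follows. The notation is introduced here for illustration only and is not taken from the package: "full" denotes the model with all x's, and "ref" denotes the reference model, which contains either the single x alone or an intercept plus the single x.

```latex
\mathrm{VIF}_k
  = \frac{\operatorname{Var}_{\text{full}}\bigl(\hat{\beta}_k\bigr)}
         {\operatorname{Var}_{\text{ref}}\bigl(\hat{\beta}_k\bigr)}
```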

The standard VIF equals 1/(1 - R^2_k), where R_k is the multiple correlation from regressing the k-th column of X on the remaining columns. The complex sample value of the VIF for a linear model consists of the standard VIF multiplied by two adjustments, denoted in the output as zeta and either varrho.m (intercept-adjusted version) or varrho (no-intercept version). The VIF for a GLM is similar (Liao 2010, chap. 5; Liao & Valliant 2024). There is no widely agreed-upon cutoff value for identifying high values of a VIF, although 10 is a common suggestion.
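The standard (non-survey) VIF described above can be sketched in base R. This is a minimal illustration on simulated data, not the svydiags implementation: the helper name std_vif and all variable names are hypothetical, and the survey adjustments zeta and varrho are not computed. The last lines cross-check the no-intercept version against the variance-ratio interpretation under weighted least squares.

```r
# Simulated data; every name below is illustrative, not part of svydiags.
set.seed(1)
n  <- 200
x1 <- rnorm(n, mean = 5)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)     # deliberately correlated with x1
x3 <- rnorm(n, mean = 2)
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)  # no column of 1's
w  <- runif(n, 1, 3)                    # stand-in for survey weights

# Standard VIFs from weighted regressions of each x on the other x's
# (with an intercept), using mean-corrected and uncorrected R-squares.
std_vif <- function(X, w) {
  out <- t(sapply(seq_len(ncol(X)), function(k) {
    xk    <- X[, k]
    fit   <- lm.wfit(x = cbind(1, X[, -k, drop = FALSE]), y = xk, w = w)
    rss   <- sum(w * fit$residuals^2)          # weighted residual sum of squares
    xbar  <- sum(w * xk) / sum(w)              # weighted mean of x_k
    rsq.m <- 1 - rss / sum(w * (xk - xbar)^2)  # corrected for the mean
    rsq   <- 1 - rss / sum(w * xk^2)           # not corrected for the mean
    c(reg.vif.m = 1 / (1 - rsq.m), Rsq.m = rsq.m,
      reg.vif   = 1 / (1 - rsq),   Rsq   = rsq)
  }))
  rownames(out) <- colnames(X)
  out
}

vif <- std_vif(X, w)
print(round(vif, 3))

# Cross-check: diagonal of (Z'WZ)^{-1} for the full model divided by the
# same quantity, 1 / sum(w * x_k^2), for a model with only x_k.
Z     <- cbind(1, X)
inv   <- solve(crossprod(Z * sqrt(w)))   # (Z' W Z)^{-1}
ratio <- diag(inv)[-1] * colSums(w * X^2)
all.equal(unname(vif[, "reg.vif"]), unname(ratio))
```

Because the uncorrected total sum of squares is never smaller than the mean-corrected one, the no-intercept VIF is at least as large as the intercept-adjusted VIF, and the gap grows when a covariate's mean is far from zero.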

Value

A list with two components:

Intercept adjusted: p × 6 data frame with columns:

svy.vif.m: complex sample VIF where the reference model includes an intercept and a single x
reg.vif.m: standard VIF, 1/(1 - R^2_{m(k)}), that omits the factors zeta and varrho.m; R^2_{m(k)} is an R-square, corrected for the mean, from a weighted least squares regression of the k-th x on the other x's in the regression
zeta: 1st multiplicative adjustment to reg.vif.m
varrho.m: 2nd multiplicative adjustment to reg.vif.m
zeta.x.varrho.m: product of the two adjustments to reg.vif.m
Rsq.m: R-square, corrected for the mean, in the regression of the k-th x on the other x's, including an intercept

No intercept: p × 6 data frame with columns:

svy.vif: complex sample VIF where the reference model includes a single x and excludes an intercept; this VIF is analogous to the one included in standard packages that provide VIFs for linear regressions
reg.vif: standard VIF, 1/(1 - R^2_k), that omits the factors zeta and varrho; R^2_k is an R-square, not corrected for the mean, from a weighted least squares regression of the k-th x on the other x's in the regression
zeta: 1st multiplicative adjustment to reg.vif
varrho: 2nd multiplicative adjustment to reg.vif
zeta.x.varrho: product of the two adjustments to reg.vif
Rsq: R-square, not corrected for the mean, in the regression of the k-th x on the other x's, including an intercept

Author(s)

Richard Valliant

References

Belsley, D.A., Kuh, E. and Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley-Interscience.

Liao, D. (2010). Collinearity Diagnostics for Complex Survey Data. PhD thesis, University of Maryland. http://hdl.handle.net/1903/10881.

Liao, D. and Valliant, R. (2012). Variance inflation factors in the analysis of complex survey data. Survey Methodology, 38, 53-62.

Liao, D. and Valliant, R. (2024). Variance Inflation Factors in Generalized Linear Models with Extensions to Analysis of Survey Data. Submitted.

Lumley, T. (2010). Complex Surveys. New York: John Wiley & Sons.

Lumley, T. (2023). survey: analysis of complex survey samples. R package version 4.4.

Theil, H. (1971). Principles of Econometrics. New York: John Wiley & Sons, Inc.

Examples

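library(survey)
library(svydiags)   # provides svyvif and the nhanes2007 data
data(nhanes2007)
X1 <- nhanes2007[order(nhanes2007$SDMVSTRA, nhanes2007$SDMVPSU),]
    # eliminate cases with missing values; logical indexing also works
    # when every case is complete (negative indexing would drop all rows)
X2 <- X1[complete.cases(X1),]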
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
                         strata = ~SDMVSTRA,
                         weights = ~WTDRD1, nest=TRUE, data=X2)
    # linear model
m1 <- svyglm(BMXWT ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
            + DR1TTFAT + DR1TMFAT, design=nhanes.dsgn)
summary(m1)
    # construct X matrix using model.matrix from stats package
X3 <- model.matrix(~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL + DR1TTFAT + DR1TMFAT,
        data = data.frame(X2))
    # remove col of 1's for intercept with X3[,-1]
svyvif(mobj=m1, X=X3[,-1], w = X2$WTDRD1, stvar=NULL, clvar=NULL)

    # Logistic model
X2$obese <- X2$BMXBMI >= 30
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
                         strata = ~SDMVSTRA,
                         weights = ~WTDRD1, nest=TRUE, data=X2)
m2 <- svyglm(obese ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
             + DR1TTFAT + DR1TMFAT, design=nhanes.dsgn, family="quasibinomial")
summary(m2)
svyvif(mobj=m2, X=X3[,-1], w = X2$WTDRD1, stvar = "SDMVSTRA", clvar = "SDMVPSU")

[Package svydiags version 0.6 Index]