svyvif {svydiags}    R Documentation
Variance inflation factors (VIF) for linear and generalized linear models fitted with complex survey data
Description
Compute VIFs for fixed-effects linear and generalized linear regression models fitted with data collected from one- and two-stage complex survey designs.
Usage
svyvif(mobj, X, w, stvar=NULL, clvar=NULL)
Arguments
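mobj: model object produced by svyglm in the survey package
X: matrix of covariates used in fitting the model, excluding the column of 1's for the intercept (see Examples)
w: vector of survey weights used in fitting the model
stvar: name of the field in the survey data that identifies strata; default NULL
clvar: name of the field in the survey data that identifies clusters (PSUs); default NULL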
Details
svyvif computes variance inflation factors (VIFs) appropriate for linear models and some generalized linear models (GLMs) fitted from complex survey data (see Liao 2010 and Liao & Valliant 2012). A VIF measures how much the variance of a slope estimate is inflated by nonorthogonality of the predictors, relative to what the variance would be if the predictors were orthogonal (Theil 1971; Belsley, Kuh, and Welsch 1980). Equivalently, a VIF is the factor by which the variance of the estimated coefficient of a predictor x is inflated in a model that includes all of the x's, compared with a model that includes only that single x. Alternatively, the reference model can include an intercept and the single x. The output contains both of these VIFs.
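As a sketch of the variance-ratio interpretation above, the R code below uses simulated data and an ordinary weighted least squares model with constant error variance, so the error variance cancels from each ratio. It computes, for each predictor, the two standard (model-based) VIFs, one per reference model, from the diagonal of $(X^\top W X)^{-1}$. The data, the stand-in weights w, and the helper variance_ratio_vifs() are hypothetical, and the complex-design adjustments that svyvif applies are not included.

```r
# Sketch (simulated data): a VIF as the ratio of a coefficient's variance in
# the full weighted least squares model to its variance in a reference model.
# Assumes constant error variance, so sigma^2 cancels from each ratio.
# x1, x2, x3, w, and variance_ratio_vifs() are hypothetical names.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)    # deliberately correlated with x1
x3 <- rnorm(n)
w  <- runif(n, 1, 5)                   # stand-in survey weights

X <- cbind(Intercept = 1, x1 = x1, x2 = x2, x3 = x3)
XtWX_inv <- solve(crossprod(X, w * X))  # (X'WX)^{-1}; w * X scales row i by w[i]

variance_ratio_vifs <- function(k) {
  xk     <- X[, k]
  xbar_w <- sum(w * xk) / sum(w)        # weighted mean of x_k
  full   <- XtWX_inv[k, k]              # variance factor in the full model
  c(no.intercept = full * sum(w * xk^2),             # reference: single x only
    intercept    = full * sum(w * (xk - xbar_w)^2))  # reference: intercept + x
}

vifs <- sapply(2:4, variance_ratio_vifs)   # columns 2-4 hold x1, x2, x3
colnames(vifs) <- colnames(X)[2:4]
print(round(vifs, 3))
```

Because the full model contains an intercept, the "intercept" row equals 1/(1 - R^2) from the mean-corrected weighted regression of each x on the others.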
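The standard VIF for the $k^{th}$ column of $X$ is

$$\mathrm{VIF}_k = \frac{1}{1 - R_k^2},$$

where $R_k$ is the multiple correlation from regressing the $k^{th}$ column of $X$ on the remaining columns. For a linear model, the complex sample VIF is the standard VIF multiplied by two adjustments, denoted in the output as zeta and either varrho.m or varrho:

$$\mathrm{VIF}^{svy}_k = \frac{1}{1 - R_k^2}\,\zeta_k\,\varrho_k .$$

This is the form reported as svy.vif (no-intercept reference model); the intercept-adjusted svy.vif.m replaces $R_k^2$ with the mean-corrected $R^2_{m(k)}$ and varrho with varrho.m. The VIF for a GLM is similar (Liao 2010, chap. 5; Liao & Valliant 2024). There is no widely agreed-upon cutoff value for identifying high values of a VIF, although 10 is a common suggestion.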
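The standard VIF part of the formula can be sketched in a few lines of R. The helper weighted_vif() below is hypothetical, not the svydiags source: it assumes the mean-corrected weighted R-square (the reg.vif.m variant) and omits the complex-design adjustments zeta and varrho. The simulated variables are illustrative, and the cutoff of 10 is only the rule of thumb mentioned above.

```r
# Hypothetical helper (not the svydiags source): standard, model-based
# weighted VIF for each column of X, 1/(1 - R_k^2), where R_k^2 is the
# mean-corrected R-square from a weighted least squares regression (with an
# intercept) of column k on the other columns. The complex-design
# adjustments zeta and varrho are NOT applied.
weighted_vif <- function(X, w) {
  X <- as.matrix(X)
  stopifnot(ncol(X) >= 2, length(w) == nrow(X))
  vif <- vapply(seq_len(ncol(X)), function(k) {
    fit <- lm(X[, k] ~ X[, -k, drop = FALSE], weights = w)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
  names(vif) <- colnames(X)
  vif
}

# Simulated example; variable names are illustrative only.
set.seed(2)
n    <- 300
age  <- rnorm(n, 45, 12)
kcal <- rnorm(n, 2000, 400)
fat  <- 0.035 * kcal + rnorm(n, sd = 3)   # closely tied to kcal
w    <- runif(n, 0.5, 3)
X    <- cbind(age = age, kcal = kcal, fat = fat)

v <- weighted_vif(X, w)
print(round(v, 2))
names(v)[v > 10]   # flag using the common, but not universal, cutoff of 10
```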
Value
A list with two components:
Intercept adjusted
A $p \times 6$ data frame, one row per column of X, with columns:
  svy.vif.m: complex sample VIF where the reference model includes an intercept and a single x
  reg.vif.m: standard VIF, $1/(1 - R^2_{m(k)})$, that omits the factors zeta and varrho.m; $R^2_{m(k)}$ is an R-square, corrected for the mean, from a weighted least squares regression of the $k^{th}$ x on the other x's in the regression
  zeta: 1st multiplicative adjustment to reg.vif.m
  varrho.m: 2nd multiplicative adjustment to reg.vif.m
  zeta.x.varrho.m: product of the two adjustments to reg.vif.m
  Rsq.m: R-square, corrected for the mean, in the regression of the $k^{th}$ x on the other x's, including an intercept
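The column definitions above imply a few identities that can be checked on a returned data frame. The helper check_vif_parts() below is a hypothetical sketch, not part of svydiags; the tolerance is an assumption and may need loosening if the returned values are rounded. Assuming the returned list's components are named as in this section, applying it to real output would look like check_vif_parts(out[["Intercept adjusted"]]), where out holds the result of an svyvif() call.

```r
# Hypothetical helper: checks the relationships among the columns of one
# svyvif() component that the Value section describes. Use suffix = ".m" for
# the "Intercept adjusted" data frame and suffix = "" for "No intercept".
# tol is an assumption; loosen it if the returned values are rounded.
check_vif_parts <- function(df, suffix = ".m", tol = 1e-6) {
  col  <- function(stem) df[[paste0(stem, suffix)]]
  same <- function(a, b) isTRUE(all.equal(a, b, tolerance = tol))
  c(adj.is.product   = same(col("zeta.x.varrho"), df$zeta * col("varrho")),
    svy.is.reg.x.adj = same(col("svy.vif"), col("reg.vif") * col("zeta.x.varrho")),
    reg.from.rsq     = same(col("reg.vif"), 1 / (1 - col("Rsq"))))
}

# Illustrative data frame built to satisfy the definitions (not real output).
df <- data.frame(Rsq.m = c(0.5, 0.9), zeta = c(1.1, 0.95), varrho.m = c(1.2, 1.05))
df$reg.vif.m       <- 1 / (1 - df$Rsq.m)
df$zeta.x.varrho.m <- df$zeta * df$varrho.m
df$svy.vif.m       <- df$reg.vif.m * df$zeta.x.varrho.m
print(check_vif_parts(df, suffix = ".m"))
```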
No intercept
A $p \times 6$ data frame, one row per column of X, with columns:
  svy.vif: complex sample VIF where the reference model includes a single x and excludes an intercept; this VIF is analogous to the one included in standard packages that provide VIFs for linear regressions
  reg.vif: standard VIF, $1/(1 - R^2_k)$, that omits the factors zeta and varrho; $R^2_k$ is an R-square, not corrected for the mean, from a weighted least squares regression of the $k^{th}$ x on the other x's in the regression
  zeta: 1st multiplicative adjustment to reg.vif
  varrho: 2nd multiplicative adjustment to reg.vif
  zeta.x.varrho: product of the two adjustments to reg.vif
  Rsq: R-square, not corrected for the mean, in the regression of the $k^{th}$ x on the other x's, including an intercept
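The uncorrected R-square behind reg.vif can be sketched as follows. The helper uncentered_reg_vif() is hypothetical, not the svydiags source; it assumes, following the descriptions above, that Rsq is 1 - RSS_k / sum(w * x_k^2), where RSS_k is the weighted residual sum of squares from regressing x_k on the other x's plus an intercept, and that reg.vif = 1/(1 - Rsq). The simulated data are illustrative only.

```r
# Hypothetical helper (not the svydiags source). Assumes, as the Value
# section describes, that Rsq is the weighted R-square NOT corrected for the
# mean from regressing x_k on the other x's plus an intercept, and that
# reg.vif = 1/(1 - Rsq).
uncentered_reg_vif <- function(X, w) {
  X <- as.matrix(X)
  stopifnot(ncol(X) >= 2, length(w) == nrow(X))
  out <- t(vapply(seq_len(ncol(X)), function(k) {
    xk  <- X[, k]
    fit <- lm(xk ~ X[, -k, drop = FALSE], weights = w)  # lm() adds an intercept
    rss <- sum(w * residuals(fit)^2)                     # weighted residual SS
    rsq <- 1 - rss / sum(w * xk^2)                       # uncentered R-square
    c(Rsq = rsq, reg.vif = 1 / (1 - rsq))
  }, c(Rsq = 0, reg.vif = 0)))
  rownames(out) <- colnames(X)
  out
}

# Example with simulated data; variable names are illustrative.
set.seed(3)
n <- 250
a <- rnorm(n, mean = 10, sd = 2)
b <- 0.5 * a + rnorm(n)
w <- runif(n, 1, 4)
X <- cbind(a = a, b = b)
res <- uncentered_reg_vif(X, w)
print(round(res, 3))
```

Under these assumptions both R-squares share the same RSS_k, and sum(w * x_k^2) is never smaller than the mean-corrected sum of squares, so reg.vif is never smaller than reg.vif.m; the two agree only when the weighted mean of x_k is zero.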
Author(s)
Richard Valliant
References
Belsley, D.A., Kuh, E. and Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley-Interscience.
Liao, D. (2010). Collinearity Diagnostics for Complex Survey Data. PhD thesis, University of Maryland. http://hdl.handle.net/1903/10881.
Liao, D. and Valliant, R. (2012). Variance inflation factors in the analysis of complex survey data. Survey Methodology, 38, 53-62.
Liao, D. and Valliant, R. (2024). Variance Inflation Factors in Generalized Linear Models with Extensions to Analysis of Survey Data. Submitted.
Theil, H. (1971). Principles of Econometrics. New York: John Wiley & Sons, Inc.
Lumley, T. (2010). Complex Surveys. New York: John Wiley & Sons.
Lumley, T. (2023). survey: analysis of complex survey samples. R package version 4.4.
Examples
library(survey)
library(svydiags)  # provides svyvif() and the nhanes2007 data
data(nhanes2007)
X1 <- nhanes2007[order(nhanes2007$SDMVSTRA, nhanes2007$SDMVPSU),]
# eliminate cases with missing values
# (logical indexing also works when there are no incomplete cases;
#  X1[-integer(0), ] would return zero rows)
X2 <- X1[complete.cases(X1), ]
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
strata = ~SDMVSTRA,
weights = ~WTDRD1, nest=TRUE, data=X2)
# linear model
m1 <- svyglm(BMXWT ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
+ DR1TTFAT + DR1TMFAT, design=nhanes.dsgn)
summary(m1)
# construct X matrix using model.matrix from stats package
X3 <- model.matrix(~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL + DR1TTFAT + DR1TMFAT,
data = data.frame(X2))
# remove col of 1's for intercept with X3[,-1]
svyvif(mobj=m1, X=X3[,-1], w = X2$WTDRD1, stvar=NULL, clvar=NULL)
# Logistic model
X2$obese <- X2$BMXBMI >= 30
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
strata = ~SDMVSTRA,
weights = ~WTDRD1, nest=TRUE, data=X2)
m2 <- svyglm(obese ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
             + DR1TTFAT + DR1TMFAT, design=nhanes.dsgn, family=quasibinomial())
summary(m2)
svyvif(mobj=m2, X=X3[,-1], w = X2$WTDRD1, stvar = "SDMVSTRA", clvar = "SDMVPSU")