colldiag {VisCollin}    R Documentation
Collinearity Diagnostics
Description
Calculates condition indexes and variance decomposition proportions to test for collinearity among the independent variables of a regression model and, if collinearity is present, to identify its sources.
Usage
colldiag(mod, scale = TRUE, center = FALSE, add.intercept = FALSE)
## S3 method for class 'colldiag'
print(x, dec.places = 3, fuzz = NULL, fuzzchar = ".", ...)
Arguments
mod | A model object, such as computed by lm.
scale | If FALSE, the data are left unscaled. Default is TRUE.
center | If TRUE, data are centered. Default is FALSE.
add.intercept | If TRUE, an intercept column is added. Default is FALSE.
x | A "colldiag" object, as returned by colldiag.
dec.places | Number of decimal places to use when printing.
fuzz | Variance decomposition proportions less than fuzz are printed as fuzzchar.
fuzzchar | Character for small variance decomposition proportion values.
... | Arguments to be passed on to or from other methods (unused).
Details
colldiag is an implementation of the regression collinearity diagnostic procedures found in Belsley, Kuh, and Welsch (1980). These procedures examine the “conditioning” of the matrix of independent variables.
It computes the condition indexes of the model matrix. If the largest condition index (the condition number) is large (Belsley et al. suggest 30 or higher), there may be collinearity problems. All large condition indexes may be worth investigating.
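To make the definition concrete, the sketch below computes condition indexes directly from a singular value decomposition: the index for dimension k is the largest singular value divided by the k-th singular value, so the largest index is the condition number. It is a minimal illustration, not the package's own code, and it assumes the default settings (no intercept column, columns scaled but not centered); it scales each column to unit length, which gives the same ratios as any scaling that multiplies every column's unit-length version by a common factor. The built-in mtcars data and the choice of columns are arbitrary stand-ins.

```r
# Minimal sketch of Belsley-Kuh-Welsch condition indexes (not VisCollin's code).
# Assumes: no intercept column, columns scaled to unit length, no centering.
X <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])

# Scale each column to unit Euclidean length
X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")

# Singular values of the scaled matrix (svd returns them in decreasing order)
d <- svd(X_scaled)$d

# Condition index for each dimension: largest singular value / this one
condindx <- max(d) / d
condition_number <- max(condindx)

print(round(condindx, 2))
if (condition_number >= 30) {
  message("Condition number >= 30: possible collinearity problem")
}
```

Because the singular values are sorted in decreasing order, the first condition index is always 1 and the last one is the condition number.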
colldiag also provides further information that may help to identify the source of these problems: the variance decomposition proportions associated with each condition index.
If a large condition index is associated with two or more variables with large variance decomposition proportions,
these variables may be causing collinearity problems. Belsley et al. suggest that a large proportion is
50 percent or more.
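The variance decomposition proportions can be sketched from the same decomposition. Writing the scaled model matrix as X = U D V', the variance of the j-th coefficient is proportional to the sum over dimensions k of v_jk^2 / d_k^2, and the proportion for dimension k is that term divided by the sum. The sketch below, again a minimal illustration on the arbitrary mtcars columns rather than the package's implementation, arranges the proportions with dimensions as rows and variables as columns and then applies the rule of thumb above (condition index of 30 or more, two or more proportions of 0.5 or more).

```r
# Minimal sketch of variance decomposition proportions (Belsley et al. 1980);
# not VisCollin's own implementation. Assumes an uncentered, unit-length-scaled
# model matrix without an intercept column.
X <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])
X_scaled <- sweep(X, 2, sqrt(colSums(X^2)), "/")

s <- svd(X_scaled)
d <- s$d   # singular values, decreasing
V <- s$v   # right singular vectors: rows = variables, columns = dimensions

# phi[k, j] = V[j, k]^2 / d[k]^2, the share of var(b_j) tied to dimension k
phi <- t(V^2) / d^2

# Normalize each variable (column) so its proportions sum to 1
pi_mat <- prop.table(phi, margin = 2)
dimnames(pi_mat) <- list(paste0("dim", seq_along(d)), colnames(X))

condindx <- max(d) / d

# Flag dimensions with a large condition index where two or more variables
# have variance decomposition proportions of 0.5 or more
for (k in seq_along(d)) {
  big <- colnames(X)[pi_mat[k, ] >= 0.5]
  if (condindx[k] >= 30 && length(big) >= 2) {
    cat(sprintf("Dimension %d (index %.1f): %s\n",
                k, condindx[k], paste(big, collapse = ", ")))
  }
}

# Condition indexes beside the proportions, similar in layout to print.colldiag
print(round(cbind(condindx = condindx, pi_mat), 3))
```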
Note that such collinearity diagnostics are often provided by other software
for the model matrix including the constant term for the intercept (e.g., SAS PROC REG with the option COLLIN).
However, these are generally useless and misleading unless the intercept has some
real interpretation and the origin of the regressors is contained within the
prediction space, as explained by Fox (1997, p. 351). The default values
for scale, center and add.intercept exclude the constant term and correspond to the SAS option COLLINOINT.
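The reason the constant term can mislead is easy to see in a small example. In the hypothetical sketch below, a single regressor of calendar years (made up for illustration, not taken from the package data) lies far from the origin, so after scaling it is nearly proportional to the constant column and the condition number is large even though there is only one real predictor. Centering the regressor makes it orthogonal to the constant column, and the apparent collinearity disappears. The helper cond_number is defined here only for this illustration.

```r
# Hypothetical illustration: a regressor far from the origin (calendar years)
# looks nearly collinear with the constant column once both are scaled.
year <- 1970:1990

# Condition number of a matrix after scaling its columns to unit length
cond_number <- function(X) {
  X <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  d <- svd(X)$d
  max(d) / min(d)
}

# Constant column plus raw years: the two columns are almost parallel
k_raw <- cond_number(cbind(intercept = 1, year = year))

# Constant column plus centered years: the two columns are orthogonal
k_centered <- cond_number(cbind(intercept = 1, year = year - mean(year)))

print(round(c(raw = k_raw, centered = k_centered), 1))
```

The large value for the raw years reflects only where the origin sits relative to the data, which is the situation Fox describes.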
Value
A "colldiag" object, containing:
condindx | A one-column matrix of condition indexes.
pi | A square matrix of variance decomposition proportions. The rows refer to the principal component dimensions, the columns to the predictor variables.
print.colldiag prints the condition indexes as the first column of a table with the variance decomposition
proportions beside them. print.colldiag has a fuzz option to suppress printing of small numbers.
If fuzz is used, small values are replaced by a period “.”. The fuzzchar argument
can be used to specify an alternative character.
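The idea behind fuzz can be sketched in a few lines. The helper fuzz_format below is hypothetical and is not part of VisCollin; it only mimics the described behavior by formatting the proportions to dec.places decimals and replacing every value below fuzz with fuzzchar. The small matrix p is made-up example data.

```r
# Hypothetical helper (not part of VisCollin) sketching the fuzz idea:
# format proportions to a fixed number of decimals, then replace those
# below `fuzz` with `fuzzchar`.
fuzz_format <- function(p, dec.places = 3, fuzz = NULL, fuzzchar = ".") {
  out <- formatC(p, format = "f", digits = dec.places)
  if (!is.null(fuzz)) {
    out[p < fuzz] <- fuzzchar   # compare the raw numbers, not the strings
  }
  dim(out) <- dim(p)
  dimnames(out) <- dimnames(p)
  out
}

# Made-up proportions: rows are dimensions, columns are variables
p <- matrix(c(0.002, 0.61, 0.15, 0.97), nrow = 2,
            dimnames = list(c("dim1", "dim2"), c("x1", "x2")))
print(fuzz_format(p, fuzz = 0.5), quote = FALSE)
```

With fuzz = 0.5, only the two proportions of 0.5 or more remain visible, which makes the variables involved in each dimension easier to spot.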
Note
Missing data are silently omitted in these calculations.
Author(s)
John Hendrickx
Source
These functions were taken from the (now defunct) perturb package by John Hendrickx.
He credits the Stata program coldiag by Joseph Harkness (joe.harkness@jhu.edu), Johns Hopkins University.
References
Belsley, D.A., Kuh, E. and Welsch, R. (1980). Regression Diagnostics. New York: John Wiley & Sons.
Belsley, D.A. (1991). Conditioning Diagnostics: Collinearity and Weak Data in Regression. New York: John Wiley & Sons.
Fox, J. (1997). Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks, CA: Sage Publications.
Friendly, M. and Kwan, E. (2009). Where’s Waldo: Visualizing Collinearity Diagnostics. The American Statistician, 63, 56–65.
See Also
lm, scale, svd, car::vif, rms::vif
Examples
library(VisCollin)
data(cars, package = "VisCollin")   # VisCollin's cars, not datasets::cars
cars.mod <- lm(mpg ~ cylinder + engine + horse + weight + accel + year,
               data = cars)
car::vif(cars.mod)
# SAS PROC REG / COLLIN option, including the intercept
colldiag(cars.mod, add.intercept = TRUE)
# Default settings: scaled, not centered, no intercept, like SAS PROC REG / COLLINOINT
colldiag(cars.mod)
# Centered and scaled
(cd <- colldiag(cars.mod, center = TRUE))
# fuzz small values
print(cd, fuzz = 0.5)
# Biomass data
data(biomass, package = "VisCollin")
biomass.mod <- lm(biomass ~ H2S + sal + Eh7 + pH + buf + P + K +
                    Ca + Mg + Na + Mn + Zn + Cu + NH4,
                  data = biomass)
car::vif(biomass.mod)
cd <- colldiag(biomass.mod, center = TRUE)
# simplified display
print(cd, fuzz = 0.3)