colindiag {metan} | R Documentation |
Collinearity Diagnostics
Description
Perform a (multi)collinearity diagnostic of a correlation matrix of predictor variables using several indicators, as shown by Olivoto et al. (2017).
Usage
colindiag(.data, ..., by = NULL, n = NULL)
Arguments
.data |
The data to be analyzed. It must be a symmetric correlation
matrix, or a data frame, possibly with grouped data passed from
|
... |
Variables to use in the correlation. If |
by |
One variable (factor) to compute the function by. It is a shortcut
to |
n |
If a correlation matrix is provided, then |
Value
If .data is grouped data passed from dplyr::group_by(), the results are returned in a list-column of data frames.
- cormat: A symmetric matrix of Pearson's correlation coefficients between the variables.
- corlist: A hypothesis test for each of the correlation coefficients.
- evalevet: The eigenvalues, with their associated eigenvectors, of the correlation matrix.
- VIF: The Variance Inflation Factors, which are the diagonal elements of the inverse of the correlation matrix.
- CN: The Condition Number of the correlation matrix, given by the ratio of the largest to the smallest eigenvalue.
- det: The determinant of the correlation matrix.
- ncorhigh: The number of correlations greater than 0.8 in absolute value.
- largest_corr: The largest correlation (in absolute value) observed.
- smallest_corr: The smallest correlation (in absolute value) observed.
- weight_var: The variables with the largest weights (eigenvector elements) in the eigenvector associated with the smallest eigenvalue, sorted in decreasing order.
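The eigenvalue-based indicators listed above follow directly from their definitions, so they can be reproduced with base R alone. The block below is a minimal sketch of those definitions and not the metan implementation. The object names (R, vif, cn, det_R, weights) are illustrative, and sorting the weights by absolute value is an assumption.

```r
# Sketch only: base-R versions of the definitions given under Value,
# not the code used inside colindiag().
R <- cor(iris[, 1:4])                 # Pearson correlation matrix of the predictors

# VIF: diagonal elements of the inverse of the correlation matrix
vif <- diag(solve(R))

# Eigen decomposition of the symmetric correlation matrix
eig <- eigen(R, symmetric = TRUE)

# CN: ratio of the largest to the smallest eigenvalue
cn <- max(eig$values) / min(eig$values)

# Determinant of the correlation matrix
det_R <- det(R)

# Weights of each variable in the eigenvector paired with the smallest
# eigenvalue, sorted in decreasing order of absolute value (assumed)
last_vec <- eig$vectors[, which.min(eig$values)]
names(last_vec) <- colnames(R)
weights <- sort(abs(last_vec), decreasing = TRUE)

print(round(vif, 2))
print(cn)
print(det_R)
print(round(weights, 3))
```

A determinant close to zero, a large condition number, and large VIF values all point to the same problem from different angles: at least one predictor is nearly a linear combination of the others.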
Author(s)
Tiago Olivoto tiagoolivoto@gmail.com
References
Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196
Examples
# Using the correlation matrix
library(metan)
cor_iris <- cor(iris[, 1:4])
n <- nrow(iris)  # number of observations behind the correlations
col_diag <- colindiag(cor_iris, n = n)

# Using a data frame, with one diagnostic for each level of GEN
col_diag_gen <- data_ge2 %>%
  group_by(GEN) %>%
  colindiag()

# Diagnostic by levels of a factor,
# restricted to variables with "N" in the variable name
col_diag_gen_n <- data_ge2 %>%
  group_by(GEN) %>%
  colindiag(contains("N"))
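
The pairwise summaries (ncorhigh, largest_corr, smallest_corr) for a correlation matrix such as the iris one can be cross-checked by hand. The block below is a minimal base-R sketch built from the definitions under Value. It does not call metan, and the names r, n_high, largest, and smallest are illustrative.

```r
# Sketch only: pairwise correlation summaries from their definitions.
cor_iris <- cor(iris[, 1:4])

# Take each pair once: upper triangle, diagonal excluded
r <- cor_iris[upper.tri(cor_iris)]

n_high   <- sum(abs(r) > 0.8)   # correlations above 0.8 in absolute value
largest  <- max(abs(r))         # largest absolute correlation
smallest <- min(abs(r))         # smallest absolute correlation

print(n_high)
print(largest)
print(smallest)
```

Working on the upper triangle avoids counting each pair twice and keeps the unit diagonal out of the maximum.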