which.Data.Corr {gamlss.foreach} | R Documentation |
Detecting High Pair-Wise Correlations in Data
Description
There are two functions here.
The function which.Data.Corr() takes a data.frame or a data matrix as its argument and reports the pairs of variables whose correlation is higher than r.
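The idea behind which.Data.Corr() can be illustrated with a short base R sketch. This is not the package's implementation: the helper name high_corr_pairs and the toy data are hypothetical, it returns a data.frame rather than a matrix, and it compares the signed correlation with r (an assumption; use abs(cc) if the magnitude is what matters).

```r
# Hypothetical helper, NOT part of gamlss.foreach: a base R sketch of
# "report the pairs of variables whose correlation is higher than r".
high_corr_pairs <- function(data, r = 0.9, digits = 3) {
  cc <- cor(as.matrix(data))  # pairwise Pearson correlations
  # upper triangle only, so each pair is reported once (no diagonal)
  idx <- which(upper.tri(cc) & cc > r, arr.ind = TRUE)
  data.frame(var1 = rownames(cc)[idx[, "row"]],
             var2 = colnames(cc)[idx[, "col"]],
             corr = round(cc[idx], digits),
             stringsAsFactors = FALSE)
}

# Toy data: b is almost a copy of a, c is unrelated noise
set.seed(1)
a <- rnorm(50)
toy <- data.frame(a = a, b = a + rnorm(50, sd = 0.01), c = rnorm(50))
pairs <- high_corr_pairs(toy, r = 0.9)
print(pairs)
```

Only the pair (a, b) should be listed here, since c is generated independently of the other two columns.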
The function which.yX.Corr() takes as arguments a continuous response variable, y, and a set of continuous explanatory variables, x (which may include first order interactions), and creates a data.frame containing all variables with a pair-wise correlation above r. If the set of continuous explanatory variables contains first order interactions then, by default (hierarchical = TRUE), the main effects of those interactions are also included, so that hierarchy is preserved.
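The selection and the hierarchy rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of which.yX.Corr(): the helper name select_by_y_corr is hypothetical, the correlation is assumed to be taken between y and each column of x, it is compared with r as a signed value, and interaction columns are assumed to be named as model.matrix() names them ("a:b").

```r
# Hypothetical helper, NOT part of gamlss.foreach: keep the columns of x
# whose correlation with y exceeds r, optionally adding the main effects
# of any selected first order interaction.
select_by_y_corr <- function(y, x, r = 0.5, hierarchical = TRUE) {
  cc <- drop(cor(y, x))               # correlation of y with each column
  keep <- colnames(x)[which(cc > r)]
  if (hierarchical) {
    # interaction columns are assumed to be named "a:b"
    inter <- grep(":", keep, fixed = TRUE, value = TRUE)
    mains <- unique(unlist(strsplit(inter, ":", fixed = TRUE)))
    keep <- union(keep, intersect(mains, colnames(x)))
  }
  x[, colnames(x) %in% keep, drop = FALSE]  # original column order is kept
}

# Toy data: y depends on the interaction a:b only
set.seed(2)
toy <- data.frame(a = rnorm(500), b = rnorm(500))
y <- with(toy, a * b + rnorm(500, sd = 0.1))
X <- model.matrix(~ (a + b)^2, data = toy)[, -1]

sel_h  <- select_by_y_corr(y, X, r = 0.5, hierarchical = TRUE)
sel_nh <- select_by_y_corr(y, X, r = 0.5, hierarchical = FALSE)
colnames(sel_h)
colnames(sel_nh)
```

With hierarchical = TRUE the main effects a and b are carried along with a:b even though neither is strongly correlated with y on its own; with hierarchical = FALSE only a:b is kept.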
Usage
which.Data.Corr(data, r = 0.9, digits=3)
which.yX.Corr(y, x, r = 0.5, plot = TRUE,
hierarchical = TRUE, print = TRUE, digits=3)
Arguments
data |
a data.frame or a data matrix |
r |
a correlation value (acting as a lower limit) |
y |
the response variable (continuous) |
x |
the (continuous) explanatory variables |
plot |
whether to plot the results or not |
print |
whether to print the dim of the new matrix or not |
hierarchical |
This ensures that, if first order interactions are included in the list, their main effects are also included |
digits |
the number of digits to print. |
Value
The function which.Data.Corr()
creates a matrix with three columns. The first two columns contain the names of the variables having pair-wise correlation higher than r and the third column shows their correlation.
The function which.yX.Corr()
creates a design matrix which contains the variables which have a pair-wise correlation above r.
Author(s)
Mikis Stasinopoulos d.stasinopoulos@londonmet.ac.uk, Bob Rigby and Fernanda De Bastiani.
References
Bjorn-Helge Mevik, Ron Wehrens and Kristian Hovde Liland (2019). pls: Partial Least Squares and Principal Component Regression. R package version 2.7-2. https://CRAN.R-project.org/package=pls
Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape, (with discussion), Appl. Statist., 54, part 3, pp 507-554.
Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC, doi:10.1201/9780429298547. An older version can be found in https://www.gamlss.com/.
Stasinopoulos D. M. and Rigby R. A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, doi:10.18637/jss.v023.i07.
Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC. doi:10.1201/b21973
Stasinopoulos, M. D., Rigby, R. A., and De Bastiani F., (2018) GAMLSS: a distributional regression approach, Statistical Modelling, Vol. 18, pp. 248-273, SAGE Publications Sage India: New Delhi, India. doi:10.1177/1471082X18759144
Stasinopoulos, M. D., Rigby, R. A., Georgikopoulos N., and De Bastiani F., (2021) Principal component regression in GAMLSS applied to Greek-German government bond yield spreads, Statistical Modelling doi:10.1177/1471082X211022980.
(see also https://www.gamlss.com/).
See Also
Examples
data(oil, package="gamlss.data")
dim(oil)
# which variables are highly correlated?
CC <- which.Data.Corr(oil, r=0.999)
head(CC)
# 6 of them
# get the explanatory variables
form1 <- as.formula(paste("OILPRICE ~ ",
paste(names(oil)[-1],collapse='+')))
# no interactions
X <- model.matrix(form1, data=oil)[,-1]
dim(X)
sX <- which.yX.Corr(oil$OILPRICE,x=X, r=0.4)
dim(sX)
# first order interactions
form2 <- as.formula(paste("OILPRICE ~ ",
paste0("(", paste(names(oil)[-1], collapse='+'), ")^2")))
form2
XX <- model.matrix(form2, data=oil)[,-1]
dim(XX)
which.yX.Corr(oil$OILPRICE,x=XX, r=0.4)