which.Data.Corr {gamlss.foreach}R Documentation

Detecting Hight Pair-Wise Correlations in Data

Description

There are two function here.

The function which.Data.Corr() is taking as an argument a data.frame or a data matrix and it reports the pairs of variables which have higher correlation than r.

The function which.yX.Corr() it takes as arguments a continuous response variable, y, and a set of continuous explanatory variables, x, (which may include first order interactions), and it creates a data.frame containing all variables with a pair-wise correlation above r. If the set of the continuous explanatory variables contains first order interactions, then by default, (hierarchical = TRUE), the main effects of the first order interactions will be also included so hierarchy will be preserved.

Usage

which.Data.Corr(data, r = 0.9, digits=3)
which.yX.Corr(y, x, r = 0.5, plot = TRUE, 
              hierarchical = TRUE, print = TRUE, digits=3)

Arguments

data

A data.frame or a matrix

r

a correlation values (acting as a lower limit)

y

the response variable (continuous)

x

the (continuous) explanatory variables

plot

whether to plot the results or not

print

whether to print the dim of the new matrix or not

hierarchical

This is designed for make sure that if first order interactions are included in the list the main effects will be also included

digits

the number of digits to print.

Value

The function which.Data.Corr() creates a matrix with three columns. The first two columns contain the names of the variables having pair-wise correlation higher than r and the third column show their correlation.

The function which.yX.Corr() creats a design matrix which containts variables which have

Author(s)

Mikis Stasinopoulos d.stasinopoulos@londonmet.ac.uk, Bob Rigby and Fernada De Bastiani.

References

Bjorn-Helge Mevik, Ron Wehrens and Kristian Hovde Liland (2019). pls: Partial Least Squares and Principal Component Regression. R package version 2.7-2. https://CRAN.R-project.org/package=pls

Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape, (with discussion), Appl. Statist., 54, part 3, pp 507-554.

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC, doi:10.1201/9780429298547. An older version can be found in https://www.gamlss.com/.

Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, doi:10.18637/jss.v023.i07.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC. doi:10.1201/b21973

Stasinopoulos, M. D., Rigby, R. A., and De Bastiani F., (2018) GAMLSS: a distributional regression approach, Statistical Modelling, Vol. 18, pp, 248-273, SAGE Publications Sage India: New Delhi, India. doi:10.1177/1471082X18759144

Stasinopoulos, M. D., Rigby, R. A., Georgikopoulos N., and De Bastiani F., (2021) Principal component regression in GAMLSS applied to Greek-German government bond yield spreads, Statistical Modelling doi:10.1177/1471082X211022980.

(see also https://www.gamlss.com/). .

See Also

pc

Examples

data(oil, package="gamlss.data")
dim(oil)
# which variables are highly correlated?
CC<- which.Data.Corr(oil, r=0.999)
head(CC)
# 6 of them
# get the explanatory variables
form1 <- as.formula(paste("OILPRICE ~ ",
          paste(names(oil)[-1],collapse='+'))) 
# no interactions
X <- model.matrix(form1, data=oil)[,-1]
dim(X)
sX <- which.yX.Corr(oil$OILPRICE,x=X, r=0.4)
dim(sX)
# first order interactions
form2 <- as.formula(paste("OILPRICE ~ ",
        paste0(paste0("(",paste(names(oil)[-1], 
        collapse='+')), ")^2"))) 
form2
XX <- model.matrix(form2, data=oil)[,-1]
dim(XX)
which.yX.Corr(oil$OILPRICE,x=XX, r=0.4)

[Package gamlss.foreach version 1.1-6 Index]