pc {gamlss.foreach}    R Documentation

Functions to Fit Principal Component Regression in GAMLSS

Description

The functions pcr() and pc() can be used to fit principal component regression (PCR) within a GAMLSS model. They can be used as an extra additive term (together with other additive terms, for example pb()), but they are mainly intended to be used on their own, as a way of reducing the dimensionality of the (scaled) x-variables. The functions can be used even when the number of explanatory variables, p, is greater than the number of observations, n.
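The claim that PCR remains usable when p > n can be checked with base R alone. The sketch below uses simulated data (all variable names are illustrative, and it does not call pc() or pcr()). It shows that the SVD of a scaled 20 x 50 matrix yields at most min(n, p) components, that least squares on all 50 columns is rank-deficient, and that a regression on the first three PC scores is well defined.

```r
set.seed(1)
n <- 20; p <- 50                       # more explanatory variables than observations
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n)
Xs <- scale(X)                         # centre and scale the x-variables
s <- La.svd(Xs)                        # default nu = nv = min(n, p)
length(s$d)                            # at most min(n, p) singular values
# Least squares on all x's is not identifiable: lm() reports NA coefficients
full <- lm(y ~ X)
sum(is.na(coef(full)))
# Regressing on the first 3 PC scores is well defined
scores <- Xs %*% t(s$vt)[, 1:3]        # n x 3 matrix of PC scores
pcfit <- lm(y ~ scores)
coef(pcfit)
```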

The two functions differ in the way PCR is implemented within the GAMLSS algorithm; see Stasinopoulos et al. (2021). In the function pc(), the singular value decomposition of the scaled x's is done once at the beginning, and different re-weighted linear models are then fitted to the PC scores (algorithm 1 in Stasinopoulos et al., 2021). In the function pcr(), a new weighted PCR is performed at each iteration using the function fitPCR() (algorithm 2 in Stasinopoulos et al., 2021).
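A much simplified base-R sketch of the idea behind algorithm 1 (the pc() approach) follows: the SVD is computed once, and only cheap weighted linear fits on the leading PC scores are repeated. As assumptions for illustration, the weights w stand in for the GAMLSS iterative weights, y stands in for the iterative response, and AIC(fit, k = log(n)) stands in for the GAIC. This is not the code used inside pc().

```r
set.seed(2)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, rep(0, p - 2))) + rnorm(n)
w <- runif(n, 0.5, 1.5)                 # stand-in for the iterative weights
# Done once: SVD of the scaled design matrix and the PC scores
s <- La.svd(scale(X))
scores <- s$u %*% diag(s$d)             # n x min(n, p) matrix of PC scores
# Repeated within the GAMLSS iterations: weighted fits on the first m scores
k <- log(n)                             # GAIC penalty
gaic <- sapply(seq_len(p), function(m) {
  Z <- scores[, 1:m, drop = FALSE]
  fit <- lm(y ~ Z, weights = w)
  AIC(fit, k = k)                       # -2 logLik + k * (number of parameters)
})
best <- which.min(gaic)                 # number of components selected
best
```

Because the SVD does not depend on the weights, only the small lm() fits change from one iteration to the next. That is the saving pc() is described as exploiting relative to pcr().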

The functions gamlss.pcr() and gamlss.pc() are supporting functions. They are not intended to be called directly by users. The function gamlss.pc() uses the linear model function lm() to fit the first principal components, while the function gamlss.pcr() uses fitPCR().

The function getSVD() creates a singular value decomposition of a design matrix X using the R function La.svd().
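A sketch of what such a decomposition contains, written with base R's La.svd(). The helper svd_of_design() is hypothetical and is not getSVD(). Whether getSVD() itself centres or scales x is an assumption left open here, so the sketch exposes both choices as arguments. The check confirms that u %*% diag(d) %*% vt reproduces the (scaled) matrix.

```r
# Hypothetical helper, for illustration only (not getSVD())
svd_of_design <- function(x, center = TRUE, scale = TRUE) {
  xs <- scale(x, center = center, scale = scale)
  n <- nrow(xs); p <- ncol(xs)
  La.svd(xs, nu = min(n, p), nv = min(n, p))   # returns list(d, u, vt)
}
X <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])
s <- svd_of_design(X)
# The decomposition reconstructs the scaled design matrix
recon <- s$u %*% diag(s$d) %*% s$vt
all.equal(recon, scale(X), check.attributes = FALSE)
```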

Usage

pc(x.vars = NULL, x = NULL, x.svd = NULL, df = NULL, 
   center = TRUE, scale = TRUE, tol = NULL, 
   max.number = min(p, n), k = log(n), 
   method = c("t-values", "GAIC", "k-fold"))

pcr(x.vars = NULL, x = NULL, df = NULL, 
    M = min(p, n), k = log(n), 
    r = 0.2, method = c("GAIC", "t-values", "SPCR"))

gamlss.pc(x, y, w, xeval = NULL, ...)

gamlss.pcr(x, y, w, xeval = NULL, ...)

getSVD(x = NULL, nu = min(n, p), nv = min(n, p))

Arguments

x.vars

A character vector giving the names of the x-variables. The variables should exist in the data argument declared in the gamlss() function.

x

For the functions pc() and getSVD(), x is a design matrix of dimensions n x p containing all the explanatory variable terms.

For the function gamlss.pc(), x is a vector of zeros that carries, in its attributes, all the information needed for the principal component fits.

x.svd

A list created by the function getSVD(). Supplying it speeds up fitting (especially for large data sets), since all the principal component calculations are done in advance. Also, if all the parameters of the distribution are modelled by principal components, the calculations need to be done only once.

df

If not NULL, the number of principal components to be fitted. If NULL, the number of principal components is calculated automatically using a GAIC criterion.

center

whether to center the explanatory variables (default TRUE)

scale

whether to scale the explanatory variables (default TRUE)

r

the cut point for the correlation coefficient used by the "SPCR" method

tol

CHECK THIS?????

max.number, M

The maximum number of principal components to be used in the fit.

method

method used for choosing the number of components

k

the penalty for the GAIC; the default, log(n), corresponds to the Schwarz Bayesian criterion (SBC)

y

the iterative response variable

w

the iterative weights

xeval

used in prediction

...

for extra arguments

nu

the number of left singular vectors to be computed. This must be between 0 and n = nrow(x).

nv

the number of right singular vectors to be computed. This must be between 0 and p = ncol(x).
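The argument k enters the usual GAMLSS definition GAIC = global deviance + k x (degrees of freedom). For a plain lm() fit, where the degrees of freedom are simply the number of estimated parameters (coefficients plus sigma), this coincides with R's AIC(fit, k = ...), and with the default k = log(n) it equals BIC(). The small check below uses mtcars purely as illustrative data.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(fit)
df <- attr(logLik(fit), "df")            # 3 coefficients + sigma
gdev <- -2 * as.numeric(logLik(fit))     # global deviance
gaic_manual <- gdev + log(n) * df        # GAIC with penalty k = log(n)
all.equal(gaic_manual, AIC(fit, k = log(n)))
all.equal(gaic_manual, BIC(fit))
```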

Details

There are three different ways of declaring the list of x-variables (two for the function pcr()):

x.vars: this should be a character vector giving the names of the explanatory variables. The names must be names of variables in the data argument of the function gamlss(); see the examples below.

x: This should be a design matrix (preferably unscaled, since a pre-scaled matrix could create problems when trying to predict); see the examples.

x.svd: This should be a list created by the function getSVD(), which takes a design matrix as its argument; see the examples.
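One plausible reason for preferring an unscaled x is that new data must be transformed with the centre and scale of the training data, not with their own statistics. The base-R sketch below (mtcars split arbitrarily into "training" and "new" rows, for illustration only) shows that re-scaling new rows independently gives different values from reusing the training centre and scale. That it is the exact cause of the prediction problems mentioned above is an assumption.

```r
train <- as.matrix(mtcars[1:25,  c("disp", "hp", "wt")])
new   <- as.matrix(mtcars[26:32, c("disp", "hp", "wt")])
Xtr <- scale(train)
ctr <- attr(Xtr, "scaled:center")        # training means
scl <- attr(Xtr, "scaled:scale")         # training standard deviations
# Consistent: transform new rows with the training centre and scale
Xnew_ok <- scale(new, center = ctr, scale = scl)
# Inconsistent: new rows re-scaled with their own statistics
Xnew_bad <- scale(new)
max(abs(Xnew_ok - Xnew_bad))             # non-zero: the two disagree
```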

Value

The function pc() returns an object of class pc with elements "coef", "beta", "pc", "edf" and "AIC". The object pc has methods plot(), coef() and print().

The function pcr() returns an object of class PCR; see the help for the function fitPCR().

Note

Do not forget to use registerDoParallel(cores = NUMBER) or cl <- makeCluster(NUMBER) and registerDoParallel(cl) before calling the function pc() without specifying the degrees of freedom. Use closeAllConnections() after the fits to close the connections. The NUMBER depends on the machine used.
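A minimal sketch of the backend registration described above, using the registerDoParallel(cores = NUMBER) route with NUMBER = 2 and stopImplicitCluster() as in the Examples section. A trivial foreach() loop stands in for the gamlss() fits so that the backend can be checked without fitting a model.

```r
library(doParallel)   # attaches foreach and parallel as well
library(foreach)
registerDoParallel(cores = 2)            # NUMBER = 2 here; choose per machine
getDoParWorkers()                        # number of registered workers
# A trivial parallel loop standing in for the pc()/pcr() fits
res <- foreach(i = 1:4, .combine = c) %dopar% i^2
stopImplicitCluster()                    # release the implicit cluster
res
```

If an explicit cluster is created with cl <- makeCluster(NUMBER) instead, it is passed to registerDoParallel(cl), and its connections are closed afterwards as described in the Note.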

Author(s)

Mikis Stasinopoulos d.stasinopoulos@londonmet.ac.uk, Bob Rigby

References

Bjorn-Helge Mevik, Ron Wehrens and Kristian Hovde Liland (2019). pls: Partial Least Squares and Principal Component Regression. R package version 2.7-2. https://CRAN.R-project.org/package=pls

Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape, (with discussion), Appl. Statist., 54, part 3, pp 507-554.

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC, doi:10.1201/9780429298547. An older version can be found in https://www.gamlss.com/.

Stasinopoulos, D. M. and Rigby, R. A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, doi:10.18637/jss.v023.i07.

Stasinopoulos, D. M., Rigby, R. A., Heller, G., Voudouris, V., and De Bastiani, F. (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC. doi:10.1201/b21973

Stasinopoulos, M. D., Rigby, R. A., and De Bastiani, F. (2018) GAMLSS: a distributional regression approach, Statistical Modelling, Vol. 18, pp. 248-273, SAGE Publications Sage India: New Delhi, India. doi:10.1177/1471082X18759144

Stasinopoulos, M. D., Rigby, R. A., Georgikopoulos, N., and De Bastiani, F. (2021) Principal component regression in GAMLSS applied to Greek-German government bond yield spreads, Statistical Modelling, doi:10.1177/1471082X211022980.

(see also https://www.gamlss.com/).

See Also

centiles.boot, fitRolling

Examples

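library(gamlss)          # also attaches gamlss.data, which provides usair
library(gamlss.foreach)
library(doParallel)      # registerDoParallel() and stopImplicitCluster()
# register the parallel backend before the fits (see Note)
registerDoParallel(cores = 2)
# the pc() and pcr() functions
# fitting the same model using different arguments
# using x.vars
p1 <- gamlss(y~pc(x.vars=c("x1","x2","x3","x4","x5","x6")), data=usair)
t1 <- gamlss(y~pcr(x.vars=c("x1","x2","x3","x4","x5","x6")), data=usair)
# using x
X <- model.matrix(~x1+x2+x3+x4+x5+x6, data=usair)[,-1]
p2 <- gamlss(y~pc(x=X), data=usair)
t2 <- gamlss(y~pcr(x=X), data=usair)
# using x.svd
svdX <- getSVD(X)
p3 <- gamlss(y~pc(x.svd=svdX), data=usair)
# fixing the number of components to 3
p4 <- gamlss(y~pc(x.svd=svdX, df=3), data=usair)
stopImplicitCluster()
# plots of the fitted PCR object in t2
plot(getSmo(t2))
plot(getSmo(t2), "gaic")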
plot(getSmo(t2), "gaic")

[Package gamlss.foreach version 1.1-6 Index]