Tuning the principal components with GLMs {Compositional} R Documentation

Tuning the principal components with GLMs

Description

Tuning the number of principal components in principal component regression with generalised linear models (GLMs).

Usage

pcr.tune(y, x, nfolds = 10, maxk = 50, folds = NULL, ncores = 1,
seed = FALSE, graph = TRUE)

glmpcr.tune(y, x, nfolds = 10, maxk = 10, folds = NULL, ncores = 1,
seed = FALSE, graph = TRUE)

multinompcr.tune(y, x, nfolds = 10, maxk = 10, folds = NULL, ncores = 1,
seed = FALSE, graph = TRUE)

Arguments

y

For "pcr.tune", a real-valued vector. For "glmpcr.tune", a vector that takes either two values (0 and 1, for example) for binomial regression, or positive discrete values (counts) for Poisson regression. For "multinompcr.tune", a vector or a factor with more than two distinct values, since a multinomial regression is fitted.

x

A matrix with the predictor variables, which must be continuous.

nfolds

The number of folds in the cross-validation.

maxk

The maximum number of principal components to check.

folds

A list with the folds, if you already have one. If left NULL, the folds are created internally.

ncores

The number of cores to use. If more than 1, the computations run in parallel. This is advisable only when you have many observations and/or many variables; otherwise it will slow down the process.

seed

If seed is TRUE, the results will be the same every time the function is run.

graph

If graph is TRUE, a plot of the performance against the number of principal components will appear.
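
The page does not spell out the layout expected by the folds argument. The following is a minimal sketch in base R, assuming that folds is a list in which each element holds the row indices of one test fold. The helper make_folds() is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of Compositional): build a list of folds,
# ASSUMING each list element holds the row indices of one test fold.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # fix the RNG so the folds are reproducible
  ids <- sample(n)                    # random permutation of 1, ..., n
  # Deal the shuffled indices into nfolds groups of near-equal size
  unname(split(ids, rep_len(seq_len(nfolds), n)))
}

folds <- make_folds(214, nfolds = 10, seed = 1)
lengths(folds)  # number of test observations in each fold
```

Passing the same list to several tuning calls keeps the comparison between models on identical splits, provided the assumed layout matches what the functions expect.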

Details

Cross-validation is performed to select the optimal number of principal components in the GLMs or the multinomial regression. These functions are used by alfapcr.tune.
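
The procedure can be sketched in base R for the linear regression case. This is an illustration under stated assumptions, not the package's implementation: the name cv_pcr(), the standardisation of the predictors and the use of prcomp() and lm.fit() are all choices made for this example.

```r
# Minimal sketch of K-fold cross-validation for principal component regression
# with a continuous response. cv_pcr() is a hypothetical name; the package's
# own code may differ (e.g. in how the PCA is computed or scaled).
cv_pcr <- function(y, x, nfolds = 10, maxk = 5) {
  n <- nrow(x)
  maxk <- min(maxk, ncol(x))                      # cannot exceed the number of variables
  fold_id <- sample(rep_len(seq_len(nfolds), n))  # random fold label for each row
  msp <- matrix(NA_real_, nrow = nfolds, ncol = maxk)
  for (f in seq_len(nfolds)) {
    test <- fold_id == f
    # PCA is fitted on the training rows only, then applied to the test rows
    pca <- prcomp(x[!test, , drop = FALSE], center = TRUE, scale. = TRUE)
    z_train <- pca$x
    z_test <- predict(pca, x[test, , drop = FALSE])
    for (k in seq_len(maxk)) {
      # Regress y on the first k component scores (plus an intercept)
      fit <- lm.fit(cbind(1, z_train[, 1:k, drop = FALSE]), y[!test])
      pred <- cbind(1, z_test[, 1:k, drop = FALSE]) %*% fit$coefficients
      msp[f, k] <- mean((y[test] - pred)^2)       # test MSE for fold f, k components
    }
  }
  mpd <- colMeans(msp)                            # average over the folds
  list(msp = msp, mpd = mpd, k = which.min(mpd), performance = min(mpd))
}

set.seed(1)
x <- matrix(rnorm(100 * 6), ncol = 6)
y <- x[, 1] - x[, 2] + rnorm(100)
res <- cv_pcr(y, x, nfolds = 5, maxk = 6)
res$k    # selected number of components
res$mpd  # mean test MSE for 1, ..., 6 components
```

For a GLM the inner lm.fit() call would be replaced by a glm() fit on the component scores and the MSE by the deviance; for the multinomial regression the criterion would be the accuracy, which is maximised rather than minimised.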

Value

If graph is TRUE, a plot of the performance versus the number of principal components will appear. In addition, a list is returned, including:

msp

A matrix with the mean deviance of prediction or mean accuracy for every fold.

mpd

A vector with the mean deviance of prediction or the mean accuracy; each value corresponds to a number of principal components.

k

The number of principal components which minimises the deviance or maximises the accuracy.

performance

The optimal performance: the MSE for the linear regression, the minimum deviance for the GLMs and the maximum accuracy for the multinomial regression.

runtime

The time required by the cross-validation procedure.
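
The performance criteria named above can be written out as follows. This is a sketch with hypothetical helper names; the page does not state how the package scales the deviance (for example, sum versus mean per observation), so the standard definitions are assumed here.

```r
# Hypothetical helpers illustrating the performance criteria (standard
# definitions; the exact scaling used by the package is an assumption).

# Mean squared error of prediction, for the linear regression
mse <- function(y, pred) mean((y - pred)^2)

# Poisson deviance; the term y * log(y / mu) is taken as 0 when y == 0
poisson_deviance <- function(y, mu) {
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}

# Binomial (Bernoulli) deviance for 0/1 responses and fitted probabilities p
binomial_deviance <- function(y, p) {
  -2 * sum(y * log(p) + (1 - y) * log(1 - p))
}

# Proportion of correctly predicted classes, for the multinomial regression
accuracy <- function(y, pred) mean(y == pred)

y <- c(0, 2, 5)
mu <- c(0.5, 2.5, 4)
poisson_deviance(y, mu)
binomial_deviance(c(0, 1, 1), c(0.2, 0.7, 0.9))
accuracy(c("a", "b", "c"), c("a", "b", "a"))
```

Lower values are better for the MSE and the deviances, while a higher accuracy is better, which is why k is chosen by minimisation in the first two cases and by maximisation in the last.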

Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

References

Aguilera A.M., Escabias M. and Valderrama M.J. (2006). Using principal components for estimating logistic regression with high-dimensional multicollinear data. Computational Statistics & Data Analysis 50(8): 1905-1924.

Jolliffe I.T. (2002). Principal Component Analysis.

See Also

pcr.tune, glm.pcr, alfa.pcr, alfapcr.tune

Examples

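library(Compositional)
library(MASS)  # provides the fgl (forensic glass) data set
# Predictors: columns 2 to 9 of fgl (Na, Mg, Al, Si, K, Ca, Ba, Fe)
x <- as.matrix(fgl[, 2:9])
# Simulated Poisson response, one value per row of x
y <- rpois(nrow(x), 10)
# 10-fold cross-validation over the number of principal components
glmpcr.tune(y, x, nfolds = 10, maxk = 20, folds = NULL, ncores = 1)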

[Package Compositional version 5.2 Index]