Tuning the principal components with GLMs {Compositional} R Documentation

## Tuning the principal components with GLMs

### Description

Tuning the number of principal components in the generalised linear models.

### Usage

```
pcr.tune(y, x, nfolds = 10, maxk = 50, folds = NULL, ncores = 1,
seed = FALSE, graph = TRUE)

glmpcr.tune(y, x, nfolds = 10, maxk = 10, folds = NULL, ncores = 1,
seed = FALSE, graph = TRUE)

multinompcr.tune(y, x, nfolds = 10, maxk = 10, folds = NULL, ncores = 1,
seed = FALSE, graph = TRUE)
```

### Arguments

`y` For `pcr.tune`, a real-valued vector. For `glmpcr.tune`, a vector with either two distinct values (0 and 1, for example) for binomial regression, or count data (non-negative integers) for Poisson regression. For `multinompcr.tune`, a vector or a factor with more than two distinct values, for multinomial regression.

`x` A matrix with the predictor variables; they must be continuous.

`nfolds` The number of folds in the cross-validation.

`maxk` The maximum number of principal components to check.

`folds` If you already have a list with the folds, supply it here. If left NULL, the folds are created internally.

`ncores` The number of cores to use. If more than 1, the computations run in parallel. This is advisable only with many observations and/or many variables; otherwise it will slow down the process.

`seed` If TRUE, the results will always be the same.

`graph` If TRUE, a plot of the performance against the number of principal components will appear.
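As an illustration of the `folds` argument, the sketch below builds a list of folds in base R. It assumes that `folds` is a list with one vector of observation indices per fold; that format is an assumption and should be checked against the package before use. The names `n`, `perm` and the fold sizes are illustrative only.

```r
# Illustrative sketch: build a list of folds in base R.
# Assumption: one integer vector of row indices per fold.
set.seed(123)
n <- 50        # hypothetical sample size
nfolds <- 10   # number of folds

# Shuffle the row indices, then deal them out to the folds in turn
perm <- sample(n)
folds <- split(perm, rep(seq_len(nfolds), length.out = n))

lengths(folds)  # size of each fold
```

Supplying the same list to several tuning runs keeps the comparisons on identical data splits.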

### Details

Cross validation is performed to select the optimal number of principal components in the GLMs or the multinomial regression. This is used by `alfapcr.tune`.
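The idea can be sketched in base R for the linear regression case. This is a minimal illustration under stated assumptions, not the code used by `pcr.tune`: the data are simulated, the predictors are standardised before the PCA, and the names `fold_id`, `msp`, `mpd` and `k_opt` are chosen only to mirror the terminology of this page. For a GLM, `lm` would be replaced by `glm` and the squared error by the deviance.

```r
# Minimal base-R sketch of cross-validated principal component regression.
# Illustrative only; simulated data, not the package implementation.
set.seed(1)
n <- 100
p <- 6
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", seq_len(p))
y <- as.vector(x %*% c(2, -1, 0.5, 0, 0, 0) + rnorm(n))

nfolds <- 5
maxk <- 4
# Randomly assign every observation to one of the folds
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# Rows: folds, columns: number of principal components
msp <- matrix(NA_real_, nrow = nfolds, ncol = maxk)

for (f in seq_len(nfolds)) {
  test <- which(fold_id == f)
  xtr <- x[-test, , drop = FALSE]
  xte <- x[test, , drop = FALSE]

  # PCA is fitted on the training folds only; the test fold is projected
  pca <- prcomp(xtr, center = TRUE, scale. = TRUE)
  ztr <- pca$x
  zte <- predict(pca, newdata = xte)

  for (k in seq_len(maxk)) {
    dtr <- data.frame(y = y[-test], ztr[, 1:k, drop = FALSE])
    dte <- data.frame(zte[, 1:k, drop = FALSE])
    fit <- lm(y ~ ., data = dtr)
    pred <- predict(fit, newdata = dte)
    msp[f, k] <- mean((y[test] - pred)^2)  # mean squared prediction error
  }
}

mpd <- colMeans(msp)      # average performance per number of components
k_opt <- which.min(mpd)   # number of components with the smallest error
```

Fitting the PCA inside each fold, rather than once on the full data, keeps the held-out observations out of the component estimation.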

### Value

If graph is TRUE a plot of the performance versus the number of principal components will appear. A list including:

`msp` A matrix with the mean deviance of prediction or mean accuracy for every fold.

`mpd` A vector with the mean deviance of prediction or mean accuracy; each value corresponds to a number of principal components.

`k` The number of principal components which minimises the deviance or maximises the accuracy.

`performance` The optimal performance: MSE for the linear regression, minimum deviance for the GLMs and maximum accuracy for the multinomial regression.

`runtime` The time required by the cross-validation procedure.
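The relation between these components can be illustrated with a small sketch. It assumes that the rows of `msp` index the folds and the columns index the number of principal components; `pick_k` is a hypothetical helper, not a function of the package, and the numbers are made up for illustration.

```r
# Hypothetical helper (not part of the package): derive mpd, k and
# performance from a folds-by-components performance matrix.
pick_k <- function(msp, maximise = FALSE) {
  mpd <- colMeans(msp)
  # Deviance or MSE is minimised; accuracy is maximised
  k <- if (maximise) which.max(mpd) else which.min(mpd)
  list(mpd = mpd, k = unname(k), performance = unname(mpd[k]))
}

# Made-up deviances for 3 folds and 4 candidate numbers of components
msp <- matrix(c(5.1, 4.2, 3.9, 4.0,
                5.3, 4.0, 3.7, 4.1,
                4.9, 4.4, 3.8, 3.9), nrow = 3, byrow = TRUE)

res <- pick_k(msp)
res$k            # 3
res$performance  # 3.8
```

For the multinomial regression, where the performance is an accuracy, the same helper would be called with `maximise = TRUE`.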

### Author(s)

Michail Tsagris

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

### References

Aguilera A.M., Escabias M. and Valderrama M.J. (2006). Using principal components for estimating logistic regression with high-dimensional multicollinear data. Computational Statistics & Data Analysis 50(8): 1905-1924.

Jolliffe I.T. (2002). Principal Component Analysis, 2nd edition. New York: Springer.

### See Also

`pcr.tune, glm.pcr, alfa.pcr, alfapcr.tune`

### Examples

```
library(MASS)