Tuning the number of PCs in the PCR with compositional data using the alpha-transformation {Compositional} R Documentation

Tuning the number of PCs in the PCR with compositional data using the α-transformation

Description

A cross-validation procedure for choosing the number of principal components, together with the value of α, in principal component regression (PCR) where the predictor variables are compositional data and the α-transformation is applied to them.

Usage

alfapcr.tune(y, x, model = "gaussian", nfolds = 10, maxk = 50, a = seq(-1, 1, by = 0.1),
folds = NULL, ncores = 1, graph = TRUE, col.nu = 15, seed = FALSE)

Arguments

y

A vector with either continuous, binary or count data.

x

A matrix with the predictor variables, the compositional data. Zero values are allowed.

model

The type of regression model to fit. The possible values are "gaussian", "binomial" and "poisson".

nfolds

The number of folds for the K-fold cross validation, set to 10 by default.

maxk

The maximum number of principal components to check.

a

A vector with a grid of values of the power transformation; each value has to be between -1 and 1. If zero values are present in the data, each value has to be greater than 0. If α=0 the isometric log-ratio transformation is applied.

folds

A list with the folds, if you already have one. If left NULL (the default), the folds are created internally.

ncores

The number of cores to use. More than one core (if available) is suggested for heavy computations or when you do not want to wait long. Use more than one core only when you have many observations and/or many variables; otherwise it will slow down the process.

graph

If graph is TRUE (default value) a filled contour plot will appear.

col.nu

A number parameter for the filled contour plot, taken into account only if graph is TRUE.

seed

If seed is TRUE the results will always be the same.
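
The folds argument expects a list, but this page does not spell out its layout. The sketch below assumes one integer vector of test-set row indices per fold; make_folds is a hypothetical helper written in base R, not a function of the package, and its numeric seed argument is an illustration rather than the logical seed flag described above.

```r
# Hypothetical helper (not part of Compositional): split the row indices
# 1..n into nfolds disjoint test sets of nearly equal size.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # a fixed seed gives reproducible folds
  idx <- sample(n)                     # random permutation of 1..n
  # deal the permuted indices out to the folds in round-robin order
  split(idx, rep_len(seq_len(nfolds), n))
}

# 214 is the number of rows of the fgl data used in the Examples
folds <- make_folds(n = 214, nfolds = 10, seed = 1)
lengths(folds)   # fold sizes differ by at most one
```

Under that assumption the list could be passed as folds = folds, so that several models are compared on identical splits.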

Details

The α-transformation is first applied to the compositional data and then, depending on the value of model, the function "pcr.tune" or "glmpcr.tune" is called.
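
The two steps can be illustrated with a minimal base R sketch. It follows the definition of the α-transformation in Tsagris et al. (2011), a power-transformed composition multiplied by the Helmert sub-matrix, and then regresses a continuous response on the first k principal component scores. The names helmert_sub, alpha_transform and pcr_fit are hypothetical, the data are simulated, and details such as whether the package standardises the transformed data are assumptions; this is not the package's internal code.

```r
# Helmert sub-matrix: (D - 1) x D, orthonormal rows, each row sums to zero
helmert_sub <- function(D) {
  H <- matrix(0, D - 1, D)
  for (i in seq_len(D - 1)) {
    H[i, 1:i] <- 1 / sqrt(i * (i + 1))
    H[i, i + 1] <- -i / sqrt(i * (i + 1))
  }
  H
}

# alpha-transformation of a compositional matrix x (rows sum to 1)
alpha_transform <- function(x, a) {
  D <- ncol(x)
  if (abs(a) < 1e-9) {
    lx <- log(x)
    z <- lx - rowMeans(lx)        # centred log-ratio, gives the ilr below
  } else {
    u <- x^a
    u <- u / rowSums(u)           # power-transformed composition
    z <- (D * u - 1) / a
  }
  z %*% t(helmert_sub(D))         # n x (D - 1) matrix
}

# PCR with k components for a continuous response (assumed: centring only)
pcr_fit <- function(y, z, k) {
  pc <- prcomp(z, center = TRUE, scale. = FALSE)
  scores <- pc$x[, seq_len(k), drop = FALSE]
  lm(y ~ scores)
}

set.seed(1)
x <- matrix(rgamma(50 * 4, shape = 2), ncol = 4)   # simulated data
x <- x / rowSums(x)
y <- rnorm(50)
z <- alpha_transform(x, a = 0.5)
fit <- pcr_fit(y, z, k = 2)
```

The cross-validation then amounts to repeating such a fit on the training part of each fold, for every value of a and every number of components up to maxk, and recording the prediction error on the held-out part.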

Value

If graph is TRUE a filled contour will appear. A list including:

mspe

The MSPE where rows correspond to the α values and the columns to the number of principal components.

best.par

The best pair of α and number of principal components.

performance

The minimum mean squared error of prediction.

runtime

The time required by the cross-validation procedure.
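
As an illustration of how mspe relates to best.par and performance, the sketch below locates the minimum of a small MSPE matrix by hand. The matrix values and the grid a are made up for the illustration and are not output of alfapcr.tune.

```r
# Hypothetical MSPE grid: rows are alpha values, columns are numbers of PCs
a <- c(0.2, 0.5, 1)
mspe <- matrix(c(1.9, 1.4, 1.6,
                 1.7, 1.1, 1.3,
                 1.8, 1.2, 1.5),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("alpha=", a), paste0("PC", 1:3)))

# row and column position of the smallest entry (first one in case of ties)
pos <- which(mspe == min(mspe), arr.ind = TRUE)[1, ]

best <- c(alpha = a[pos[["row"]]], k = pos[["col"]])   # plays the role of best.par
perf <- min(mspe)                                      # plays the role of performance
```

With these made-up numbers the minimum sits in the second row and second column, that is at α = 0.5 with 2 principal components.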

Author(s)

Michail Tsagris.

R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.

References

Tsagris M. (2015). Regression analysis with compositional data containing zero values. Chilean Journal of Statistics, 6(2): 47-57. https://arxiv.org/pdf/1508.01913v1.pdf

Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf

Jolliffe I.T. (2002). Principal Component Analysis.

See Also

alfa, profile, alfa.pcr, pcr.tune, glmpcr.tune, glm

Examples

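library(MASS)            # provides the fgl (forensic glass) data
library(Compositional)   # provides alfapcr.tune
y <- as.vector(fgl[, 1])       # response: refractive index (continuous)
x <- as.matrix(fgl[, 2:9])     # predictors: oxide percentages, some are zero
x <- x / rowSums(x)            # close each row so that it sums to 1
# x contains zeros, so only positive values of a are supplied
mod <- alfapcr.tune(y, x, nfolds = 10, maxk = 50, a = seq(0.1, 1, by = 0.1))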

[Package Compositional version 5.2 Index]