batchpca {onlinePCA}    R Documentation

Batch PCA

Description

This function performs the PCA of a data matrix or covariance matrix, returning the specified number of principal components (eigenvectors) and eigenvalues.

Usage

batchpca(x, q, center, type = c("data","covariance"), byrow = FALSE)

Arguments

x

data or covariance matrix

q

number of requested PCs

center

optional centering vector for x

type

type of the matrix x

byrow

Are observation vectors stored in rows (TRUE) or in columns (FALSE)?

Details

The PCA is computed efficiently with the functions svds (if type is "data") or eigs_sym (if type is "covariance") of package RSpectra. An Implicitly Restarted Arnoldi Method (IRAM) is used in the former case and an Implicitly Restarted Lanczos Method (IRLM) in the latter.
The arguments center and byrow take effect only if type is "data". In this case x is scaled by a factor 1/sqrt(n) (not 1/sqrt(n-1)) before its singular values and vectors are computed, where n is the number of observation vectors stored in x.
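
The effect of the 1/sqrt(n) scaling can be checked with base R alone. The sketch below does not call batchpca and is not the package's implementation; it only illustrates, on hypothetical simulated data, that the squared singular values of the centered data scaled by 1/sqrt(n) equal the eigenvalues of the covariance matrix computed with divisor n, and therefore differ from those of cov(x) (divisor n-1) by a factor (n-1)/n.

```r
set.seed(1)
n <- 200                     # number of observations (illustrative)
d <- 5                       # number of variables (illustrative)
x <- matrix(rnorm(n * d), n, d) %*% diag(sqrt(d:1))  # observations in rows

# Center the columns, then apply the 1/sqrt(n) scaling described above
xc  <- scale(x, center = TRUE, scale = FALSE)
sv2 <- svd(xc / sqrt(n))$d^2                 # squared singular values

# Eigenvalues of the covariance matrix with divisor n
ev_n <- eigen(crossprod(xc) / n, symmetric = TRUE)$values

# Eigenvalues of cov(x), which uses divisor n - 1
ev_cov <- eigen(cov(x), symmetric = TRUE)$values

print(all.equal(sv2, ev_n))                   # same quantities
print(all.equal(sv2, ev_cov * (n - 1) / n))   # rescaled cov(x) eigenvalues
```

For large n the factor (n-1)/n is close to 1, so the two conventions give nearly identical eigenvalues.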

Value

A list with components

values

the first q squared singular values of x if type="data"; the first q eigenvalues if type="covariance".

vectors

the first q PCs (eigenvectors) of x.
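
As a rough illustration of the returned list, the sketch below compares the values component with the leading eigenvalues from base R's eigen on a small simulated covariance matrix. The data, dimensions, and tolerance are hypothetical choices, and the batchpca call is guarded so the code still runs when onlinePCA is not installed. The layout of the vectors component is printed rather than assumed.

```r
set.seed(1)
# Small simulated data set with well-separated variances (illustrative only)
x <- matrix(rnorm(200 * 20), 200, 20) %*% diag(sqrt(20:1))
S <- cov(x)
q <- 3

# Reference decomposition from base R
ref <- eigen(S, symmetric = TRUE)

if (requireNamespace("onlinePCA", quietly = TRUE)) {
  pca <- onlinePCA::batchpca(S, q, type = "covariance")
  # Leading eigenvalues should agree up to numerical tolerance
  print(all.equal(pca$values, ref$values[1:q], tolerance = 1e-6))
  # Inspect the shape of the returned eigenvector matrix
  print(dim(pca$vectors))
}
```

Eigenvectors are determined only up to sign, so any comparison of the vectors component with another routine's output should allow for sign flips.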

References

https://www.arpack.org

Examples


## Not run: 
## Simulate data
n <- 1e4
d <- 500
q <- 10
x <- matrix(runif(n*d), n, d)
x <- x %*% diag(sqrt(12*(1:d)))
# The eigenvalues of cov(x) are approximately 1, 2, ..., d
# and the corresponding eigenvectors are approximately  
# the canonical basis of R^d

## PCA computation (from fastest to slowest)
system.time(pca1 <- batchpca(scale(x,scale=FALSE), q, byrow=TRUE))
system.time(pca2 <- batchpca(cov(x), q, type="covariance"))
system.time(pca3 <- eigen(cov(x),TRUE))
system.time(pca4 <- svd(scale(x/sqrt(n-1),scale=FALSE), 0, q))
system.time(pca5 <- prcomp(x))

## End(Not run)

[Package onlinePCA version 1.3.2 Index]