R: Incremental PCA with Block Update

incRpca.block {onlinePCA}

R Documentation

Incremental PCA with Block Update

Description

Sequential Karhunen-Loeve (SKL) algorithm of Levy and Lindenbaum (2000). The PCA can be updated with respect to a data matrix (not just a data vector).

Usage

incRpca.block(x, B, lambda, U, n0 = 0, f, q = length(lambda), center, byrow = FALSE)

Arguments

`x`	data matrix
`B`	block size
`lambda`	initial eigenvalues (optional)
`U`	initial eigenvectors/PCs (optional)
`n0`	initial sample size (optional)
`f`	vector of forgetting factors
`q`	number of requested PCs
`center`	centering vector for `x` (optional)
`byrow`	Are the data vectors in `x` stored in rows (TRUE) or columns (FALSE)?

Details

This incremental PCA algorithm utilizes QR factorization and RSVD. It generalizes the algorithm incRpca from vector to matrix/block updates. However, incRpca should be preferred for vector updates as it is faster.
If lambda and U are specified, they are taken as the initial PCA. Otherwise, the PCA is initialized by the SVD of the first block of data in x (B data vectors). The number n0 is the sample size before observing x (default value = 0). The number B is the size of blocks to be used in the PCA updates. Ideally, B should be a divisor of the number of data vectors in x (otherwise the last smaller block is discarded). If U is provided, then B and q default to the number of columns (=PC) of U.
The argument f determines the relative weight of current PCA and new data in each block update. Its length should be equal to the number of blocks in x, say nblock. If n0 and B are provided, then f defaults to B/(n0+(0:(nblock-1)*B)), i.e., the case where all data points have equal weights. The values in f should be in (0,1), with higher values giving more weight to new data and less to the current PCA.

Value

A list with components

`values`	first `q` eigenvalues.
`vectors`	first `q` eigenvectors/PCs.

References

Levy, A. and Lindenbaum, M. (2000). Sequential Karhunen-Loeve basis extraction and its application to images. IEEE Transactions on Image Processing.

Examples

## Simulate Brownian Motion
n <- 100 # number of sample paths
d <- 50 # number of observation points
x <- matrix(rnorm(n*d,sd=1/sqrt(d)),n,d)
x <- t(apply(x,1,cumsum)) # dim(x) = c(100,50)
q <- 10 # number of PC to compute
B <- 20 # block size
n0 <- B # initial sample size (if relevant)

## PCA without initial values
res1 <- incRpca.block(t(x), B, q=q) # data vectors in columns
res2 <- incRpca.block(x, B, q=q, byrow=TRUE) # data vectors in rows
all.equal(res1,res2) # TRUE

## PCA with initial values 
svd0 <- svd(x[1:n0,], 0, n0) # use first block for initialization 
lambda <- svd0$d[1:n0]^2/n0 # initial eigenvalues
U <- svd0$v # initial PC
res3 <- incRpca.block(x[-(1:n0),], B, lambda, U, n0, q=q, byrow=TRUE) 
# run PCA with this initialization on rest of the data 
all.equal(res1,res3) # compare with previous PCA: TRUE

## Compare with function incRpca
res4 <- list(values=lambda, vectors=U)
for (i in (n0+1):n)
	res4 <- incRpca(res4$values, res4$vectors, x[i,], i-1, q=q)
B <- 1 # vector update
res5 <- incRpca.block(x[-(1:n0),], B, lambda, U, n0, q=q, byrow=TRUE)
ind <- which(sign(res5$vectors[1,]) != sign(res4$vectors[1,]))
res5$vectors[,ind] <- - res5$vectors[,ind] # align PCs (flip orientation as needed)
all.equal(res4,res5) # TRUE

[Package onlinePCA version 1.3.2 Index]