R: Recursive PCA using a rank 1 perturbation method

perturbationRpca {onlinePCA}

R Documentation

Recursive PCA using a rank 1 perturbation method

Description

This function recursively updates the PCA with respect to a single new data vector, using the (fast) perturbation method of Hegde et al. (2006).

Usage

perturbationRpca(lambda, U, x, n, f = 1/n, center, sort = TRUE)

Arguments

`lambda`	vector of eigenvalues.
`U`	matrix of eigenvectors (PC) stored in columns.
`x`	new data vector.
`n`	sample size before observing `x`.
`f`	forgetting factor: a number between 0 and 1.
`center`	optional centering vector for `x`.
`sort`	Should the eigenpairs be sorted?

Details

The forgetting factor f can be interpreted as the inverse of the number of observation vectors effectively used in the PCA: the "memory" of the PCA algorithm goes back 1/f observations in the past. For larger values of f, the PCA update gives more relative weight to the new data x and less to the current PCA (lambda,U). For nonstationary processes, f should be closer to 1.
Only one of the arguments n and f needs being specified. If it is n, then f is set to 1/n by default (usual PCA of sample covariance matrix where all data points have equal weight). If f is specified, its value overrides any eventual specification of n.
If sort is TRUE, the updated eigenpairs are sorted by decreasing eigenvalue. Otherwise, they are not sorted.

Value

A list with components

`values`	updated eigenvalues.
`vectors`	updated eigenvectors.

Note

This perturbation method is based on large sample approximations. It tends to be highly inaccurate for small/medium sized samples and should not be used in this case.

References

Hegde et al. (2006) Perturbation-Based Eigenvector Updates for On-Line Principal Components Analysis and Canonical Correlation Analysis. Journal of VLSI Signal Processing.

Examples

n <- 1e3
n0 <- 5e2
d <- 10
x <- matrix(runif(n*d), n, d)
 x <- x %*% diag(sqrt(12*(1:d)))
# The eigenvalues of cov(x) are approximately equal to 1, 2, ..., d
# and the corresponding eigenvectors are approximately equal to 
# the canonical basis of R^d

## Perturbation-based recursive PCA
# Initialization: use factor 1/n0 (princomp) rather 
# than factor 1/(n0-1) (prcomp) in calculations
pca <- princomp(x[1:n0,], center=FALSE)
xbar <- pca$center
pca <- list(values=pca$sdev^2, vectors=pca$loadings) 

for (i in (n0+1):n) {
	xbar <- updateMean(xbar, x[i,], i-1)
	pca <- perturbationRpca(pca$values, pca$vectors, x[i,], 
		i-1, center=xbar) }

[Package onlinePCA version 1.3.2 Index]