perturbationRpca {onlinePCA} | R Documentation |
Recursive PCA using a rank 1 perturbation method
Description
This function recursively updates the PCA with respect to a single new data vector, using the (fast) perturbation method of Hegde et al. (2006).
Usage
perturbationRpca(lambda, U, x, n, f = 1/n, center, sort = TRUE)
Arguments
lambda |
vector of eigenvalues. |
U |
matrix of eigenvectors (PC) stored in columns. |
x |
new data vector. |
n |
sample size before observing x. |
f |
forgetting factor: a number between 0 and 1. |
center |
optional centering vector for x. |
sort |
Should the eigenpairs be sorted? |
Details
The forgetting factor f can be interpreted as the inverse of the number of observation vectors effectively used in the PCA: the "memory" of the PCA algorithm goes back 1/f observations in the past. For larger values of f, the PCA update gives more relative weight to the new data x and less to the current PCA (lambda, U). For nonstationary processes, f should be closer to 1.
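The "memory" interpretation can be illustrated with a small base R sketch. It assumes the covariance update has the usual exponentially weighted form C_new = (1 - f) * C_old + f * x %*% t(x); this form is an assumption made for illustration and is not taken from the package source. Under that assumption and a constant f, an observation that arrived k updates ago carries weight f * (1 - f)^k, and the average age of the information is close to 1/f.

```r
# Assumed update form (illustration only, not the package internals):
#   C_new = (1 - f) * C_old + f * x %*% t(x)
f <- 0.05

# Lag k = number of updates since an observation was absorbed
k <- 0:2000

# Weight of that observation in the current covariance estimate
w <- f * (1 - f)^k

# The weights sum to (almost exactly) 1 ...
total_weight <- sum(w)

# ... and the weighted mean lag is (1 - f) / f, i.e. roughly 1/f observations
mean_lag <- sum(k * w) / total_weight
```

With f = 0.05 the mean lag is about 19 observations, in line with the rule of thumb that the memory reaches back roughly 1/f = 20 observations.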
Only one of the arguments n and f needs to be specified. If n is given, then f is set to 1/n by default (usual PCA of the sample covariance matrix, where all data points have equal weight). If f is specified, its value overrides any specification of n.
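A minimal sketch of the two calling styles, assuming the onlinePCA package is installed. The simulated data and the object names (X, eig, xnew, by_n, by_f) are hypothetical and serve only as illustration. Since f defaults to 1/n, the two calls are expected to give the same update.

```r
library(onlinePCA)

set.seed(1)
n <- 1000
d <- 5

# Simulated data with column variances d, d-1, ..., 1 (illustration only)
X <- matrix(rnorm(n * d), n, d) %*% diag(sqrt(d:1))

# Initial (uncentered) PCA with factor 1/n
eig <- eigen(crossprod(X) / n, symmetric = TRUE)

# One new observation
xnew <- rnorm(d) * sqrt(d:1)

# Specify the sample size: f is then taken as 1/n
by_n <- perturbationRpca(eig$values, eig$vectors, xnew, n = n)

# Specify the forgetting factor directly
by_f <- perturbationRpca(eig$values, eig$vectors, xnew, f = 1/n)
```

Passing a fixed f that does not shrink with the sample size (for example f = 0.01) is the natural choice when older observations should be progressively forgotten.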
If sort is TRUE, the updated eigenpairs are sorted by decreasing eigenvalue. Otherwise, they are not sorted.
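Sorting eigenpairs means reordering the eigenvalues and the columns of the eigenvector matrix together. The base R sketch below shows the idea on hypothetical values; it is not necessarily how the package implements the sort argument.

```r
# Hypothetical unsorted eigenpairs: column j of `vectors` belongs to values[j]
values  <- c(0.5, 3.0, 1.2)
vectors <- diag(3)

# Permutation that puts the eigenvalues in decreasing order
ord <- order(values, decreasing = TRUE)

# Apply the same permutation to the values and to the eigenvector columns
sorted <- list(values  = values[ord],
               vectors = vectors[, ord, drop = FALSE])
```

After sorting, the first column of sorted$vectors is the eigenvector associated with the largest eigenvalue (here the second canonical basis vector).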
Value
A list with components
values |
updated eigenvalues. |
vectors |
updated eigenvectors. |
Note
This perturbation method is based on large-sample approximations. It tends to be highly inaccurate for small or medium-sized samples and should not be used in such cases.
References
Hegde et al. (2006) Perturbation-Based Eigenvector Updates for On-Line Principal Components Analysis and Canonical Correlation Analysis. Journal of VLSI Signal Processing.
See Also
Examples
library(onlinePCA) # provides updateMean() and perturbationRpca()
n <- 1e3
n0 <- 5e2
d <- 10
x <- matrix(runif(n*d), n, d)
x <- x %*% diag(sqrt(12*(1:d)))
# The eigenvalues of cov(x) are approximately equal to 1, 2, ..., d
# and the corresponding eigenvectors are approximately equal to
# the canonical basis of R^d
## Perturbation-based recursive PCA
# Initialization: use factor 1/n0 (princomp) rather
# than factor 1/(n0-1) (prcomp) in calculations
pca <- princomp(x[1:n0,], center=FALSE)
xbar <- pca$center
pca <- list(values=pca$sdev^2, vectors=pca$loadings)
for (i in (n0+1):n) {
  xbar <- updateMean(xbar, x[i,], i-1)
  pca <- perturbationRpca(pca$values, pca$vectors, x[i,],
    i-1, center=xbar)
}
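The comment in the example states that the eigenvalues of cov(x) are approximately 1, 2, ..., d and that the eigenvectors are close to the canonical basis. This follows because a uniform variable on [0, 1] has variance 1/12, so scaling column k by sqrt(12 * k) gives it variance k. The base R sketch below checks this numerically; it uses a larger n than the example (an arbitrary choice made here to tighten the approximation) and a hypothetical seed.

```r
set.seed(42)
n <- 1e4   # larger than in the example, to reduce sampling noise
d <- 10

# Column k has variance (1/12) * 12 * k = k
x <- matrix(runif(n * d), n, d) %*% diag(sqrt(12 * (1:d)))

# Batch eigendecomposition of the sample covariance matrix
ev <- eigen(cov(x), symmetric = TRUE)

# eigen() returns eigenvalues in decreasing order: compare with d, d-1, ..., 1
round(ev$values, 2)

# The leading eigenvector should be dominated by its last coordinate
lead <- ev$vectors[, 1]
which.max(abs(lead))
```

Note that eigen() orders the eigenvalues decreasingly, so the leading eigenvector corresponds to the last column of x, the one with the largest variance.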