R: Decompose a matrix into a low-rank component and a sparse...

rpca {rpca}

R Documentation

Decompose a matrix into a low-rank component and a sparse component by solving Principal Components Pursuit

Description

This function decomposes a rectangular matrix M into a low-rank component, and a sparse component, by solving a convex program called Principal Component Pursuit.

Usage

rpca(M, 
     lambda = 1/sqrt(max(dim(M))), mu = prod(dim(M))/(4 * sum(abs(M))), 
     term.delta = 10^(-7), max.iter = 5000, trace = FALSE,
     thresh.nuclear.fun = thresh.nuclear, thresh.l1.fun = thresh.l1, 
     F2norm.fun = F2norm)

Arguments

`M`	a rectangular matrix that is to be decomposed into a low-rank component and a sparse component `M = L + S` .
`lambda`	parameter of the convex problem `\\|L\\|_{*} + \lambda \\|S\\|_{1}` which is minimized in the Principal Components Pursuit algorithm. The default value is the one suggested in Candès, E. J., section 1.4, and together with reasonable assumptions about `L` and `S` guarantees that a correct decomposition is obtained.
`mu`	parameter from the augumented Lagrange multiplier formulation of the PCP, Candès, E. J., section 5. Default value is the one suggested in references.
`term.delta`	The algorithm terminates when `\\|M-L-S\\|_{F} \leq \delta \\|M\\|_{F}` where `\\|\ \\|_{F}` is Frobenius norm of a matrix.
`max.iter`	Maximal number of iterations of the augumented Lagrange multiplier algorithm. A warning is issued if the algorithm does not converge by then.
`trace`	Print out information with every iteration.
`thresh.nuclear.fun`, `thresh.l1.fun`, `F2norm.fun`	Arguments for internal use only.

Details

These functions decompose a rectangular matrix M into a low-rank component, and a sparse component, by solving a convex program called Principal Component Pursuit:

\textrm{minimize}\quad \|L\|_{*} + \lambda \|S\|_{1}

\textrm{subject to}\quad L+S = M

where \|L\|_{*} is the nuclear norm of L (sum of singular values).

Value

The function returns two matrices S and L, which have the property that L+S \simeq M, where the quality of the approximation depends on the argument term.delta, and the convergence of the algorithm.

`S`	The sparse component of the matrix decomposition.
`L`	The low-rank component of the matrix decomposition.
`L.svd`	The singular value decomposition of `L`, as returned by the function `La.svd` .
`convergence$converged`	`TRUE` if the algorithm converged with respect to `term.delta`.
`convergence$iterations`	Number of performed iterations.
`convergence$final.delta`	The final iteration `delta` which is compared with `term.delta`.
`convergence$all.delta`	All `delta` from all iterations.

Author(s)

Maciek Sykulski [aut, cre]

References

Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis?. Journal of the ACM (JACM), 58(3), 11.

Yuan, X., & Yang, J. (2009). Sparse and low-rank matrix decomposition via alternating direction methods. preprint, 12.

Examples

data(iris)
M <- as.matrix(iris[,1:4])
Mcent <- sweep(M,2,colMeans(M))

res <- rpca(Mcent)

## Check convergence and number of iterations
with(res$convergence,list(converged,iterations))
## Final delta F2 norm divided by F2norm(Mcent)
with(res$convergence,final.delta)

## Check properites of the decomposition
with(res,c(
all(abs( L+S - Mcent ) < 10^-5),
all( L == L.svd$u%*%(L.svd$d*L.svd$vt) )
))
# [1] TRUE TRUE

## The low rank component has rank 2
length(res$L.svd$d)
## However, the sparse component is not sparse 
## - thus this data set is not the best example here.
mean(res$S==0)

## Plot the first (the only) two principal components
## of the low-rank component L
rpc<-res$L.svd$u%*%diag(res$L.svd$d)
plot(jitter(rpc[,1:2],amount=.001),col=iris[,5])

## Compare with classical principal components
pc <- prcomp(M,center=TRUE)
plot(pc$x[,1:2],col=iris[,5])
points(rpc[,1:2],col=iris[,5],pch="+")

## "Sparse" elements distribution
plot(density(abs(res$S),from=0))
curve(dexp(x,rate=1/mean(abs(res$S))),add=TRUE,lty=2)

## Plot measurements against measurements corrected by sparse components
par(mfcol=c(2,2))
for(i in 1:4) {
plot(M[,i],M[,i]-res$S[,i],col=iris[,5],xlab=colnames(M)[i])
}

[Package rpca version 0.2.3 Index]