softImpute {softImpute} | R Documentation |
Impute missing values for a matrix via nuclear-norm regularization.
Description
Fit a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm works like EM: it fills in the missing values with the current guess, then solves the optimization problem on the completed matrix using a soft-thresholded SVD. Special sparse-matrix classes are available for very large matrices.
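The fill-in and soft-threshold loop described above can be sketched in a few lines of base R. This is a dense illustration only, not the package's implementation: the function name soft_impute_sketch, its stopping rule, and the toy data are assumptions made for the example, and none of the package's sparse-matrix machinery is used.

```r
# Illustrative sketch only; soft_impute_sketch is a hypothetical name,
# not a function from the softImpute package.
soft_impute_sketch <- function(x, lambda, maxit = 100, thresh = 1e-5) {
  miss <- is.na(x)
  xhat <- matrix(0, nrow(x), ncol(x))       # current guess, started at zero
  for (it in seq_len(maxit)) {
    xfill <- x
    xfill[miss] <- xhat[miss]               # fill missing entries with the current guess
    s <- svd(xfill)                         # SVD of the completed matrix
    d <- pmax(s$d - lambda, 0)              # soft-threshold the singular values
    xnew <- s$u %*% (d * t(s$v))            # u diag(d) v'
    # relative change between successive estimates (guarded against division by zero)
    delta <- sum((xnew - xhat)^2) / max(sum(xhat^2), 1e-9)
    xhat <- xnew
    if (delta < thresh) break
  }
  xhat
}

# Toy data: a rank-3 matrix with 40 entries removed
set.seed(1)
x <- matrix(rnorm(20 * 3), 20, 3) %*% matrix(rnorm(3 * 10), 3, 10)
xna <- x
xna[sample(length(x), 40)] <- NA
xhat <- soft_impute_sketch(xna, lambda = 0.5)
```

Each pass costs a full SVD of a dense matrix, which is why a sketch like this is only practical for small problems.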
Usage
softImpute(x, rank.max = 2, lambda = 0, type = c("als", "svd"), thresh = 1e-05,
maxit = 100, trace.it = FALSE, warm.start = NULL, final.svd = TRUE)
Arguments
x |
An m by n matrix with NAs. For large matrices can be of class
|
rank.max |
This restricts the rank of the solution. If sufficiently large, and with
|
lambda |
nuclear-norm regularization parameter. If |
type |
two algorithms are implemented, |
thresh |
convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. |
maxit |
maximum number of iterations. |
trace.it |
with |
warm.start |
an svd object can be supplied as a warm start. This is particularly
useful when constructing a path of solutions with decreasing values of
|
final.svd |
only applicable to |
Details
SoftImpute solves the following problem for a matrix X with missing entries:
\min_M ||X-M||_o^2 + \lambda||M||_*.
Here ||\cdot||_o is the Frobenius norm restricted to the non-missing entries of X, and ||M||_* is the nuclear norm of M (sum of singular values).
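To make the two terms concrete, the objective as written above can be evaluated for any candidate M in base R. The helper name soft_objective and the small matrices are assumptions for illustration; the function is not part of the package.

```r
# Hypothetical helper (not a softImpute function): evaluate
# ||X - M||_o^2 + lambda * ||M||_* for a candidate matrix m.
soft_objective <- function(x, m, lambda) {
  obs <- !is.na(x)                        # observed (non-missing) entries of x
  fit <- sum((x[obs] - m[obs])^2)         # squared Frobenius norm over observed entries
  nuc <- sum(svd(m, nu = 0, nv = 0)$d)    # nuclear norm: sum of singular values
  fit + lambda * nuc
}

x <- matrix(c(1, NA, 3, 4), 2, 2)         # one missing entry at position (2, 1)
m <- diag(2)                              # candidate M: the 2 x 2 identity
obj <- soft_objective(x, m, lambda = 2)   # (0 + 9 + 9) + 2 * 2 = 22
```

The missing entry contributes nothing to the first term, so only the three observed entries enter the squared error.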
Full details of the "svd" algorithm are described in the reference below. The "als" algorithm will be described in a forthcoming article. Both methods employ special sparse-matrix tricks for large matrices with many missing values. This package creates a new sparse-matrix class "SparseplusLowRank" for matrices of the form
x+ab',
where x is sparse and a and b are tall skinny matrices, hence ab' is low rank. Methods for efficient left and right matrix multiplication are provided for this class. For large matrices, the function Incomplete() can be used to build the appropriate sparse input matrix from market-format data.
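The reason the x+ab' form allows cheap multiplication can be seen in a small base-R sketch: the product is split into a term involving x and a term involving a and b, so the dense m by n matrix ab' is never formed. The function names right_mult and left_mult are hypothetical, and plain dense matrices stand in for the package's own class and methods, which are not used here.

```r
# Sketch with base-R matrices only; in practice x would be held in a sparse format.
set.seed(2)
m <- 6; n <- 5; r <- 2
x <- matrix(0, m, n)
x[cbind(c(1, 3, 6), c(2, 5, 1))] <- c(1.5, -2, 0.7)   # mostly zero "sparse" part
a <- matrix(rnorm(m * r), m, r)                        # tall skinny factor, m x r
b <- matrix(rnorm(n * r), n, r)                        # tall skinny factor, n x r
v <- rnorm(n)
u <- rnorm(m)

# Right multiplication: (x + a b') v = x v + a (b' v)
right_mult <- function(x, a, b, v) x %*% v + a %*% crossprod(b, v)

# Left multiplication: u' (x + a b') = u' x + (u' a) b'
left_mult <- function(x, a, b, u) crossprod(u, x) + crossprod(u, a) %*% t(b)

rv <- right_mult(x, a, b, v)   # m x 1
lu <- left_mult(x, a, b, u)    # 1 x n
```

With r much smaller than m and n, the low-rank term costs on the order of (m + n) r operations per product instead of m n.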
Value
An svd object is returned, with components "u", "d", and "v".
If the solution has zeros in "d", the solution is truncated to rank one more than the number of nonzero values (so that a single zero is visible). If the input matrix had been centered and scaled by biScale, the scaling details are assigned as attributes inherited from the input matrix.
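As a reminder of how the "u", "d", and "v" components combine, the low-rank matrix they represent is u diag(d) v'. The list below is built by hand for illustration and only mimics the component names; it is not output from softImpute.

```r
# Hand-built stand-in with the same component names as an svd object (illustrative values)
fit <- list(u = matrix(c(1, 0, 0, 0, 1, 0), 3, 2),  # 3 x 2, orthonormal columns
            d = c(3, 1),                            # singular values
            v = diag(2))                            # 2 x 2

# Multiplying t(v) by d scales row k by d[k], which equals diag(d) %*% t(v)
xhat <- fit$u %*% (fit$d * t(fit$v))

# Number of nonzero singular values in the solution
nonzero_rank <- sum(fit$d > 0)
```

Writing the product as d * t(v) avoids diag(d), which behaves differently when d has length one.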
Author(s)
Trevor Hastie, Rahul Mazumder
Maintainer: Trevor Hastie hastie@stanford.edu
References
Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010)
Spectral Regularization Algorithms for Learning Large Incomplete Matrices,
Journal of Machine Learning Research 11, 2287-2322
https://web.stanford.edu/~hastie/Papers/mazumder10a.pdf
See Also
biScale, svd.als, Incomplete, lambda0, impute, complete
Examples
set.seed(101)
n=200
p=100
J=50
np=n*p
missfrac=0.3
x=matrix(rnorm(n*J),n,J)%*%matrix(rnorm(J*p),J,p)+matrix(rnorm(np),n,p)/5
ix=seq(np)
imiss=sample(ix,np*missfrac,replace=FALSE)
xna=x
xna[imiss]=NA
### uses the regular matrix method for matrices with NAs
fit1=softImpute(xna,rank.max=50,lambda=30)
### uses the sparse-matrix method for matrices of class "Incomplete"
xnaC=as(xna,"Incomplete")
fit2=softImpute(xnaC,rank.max=50,lambda=30)
### uses the "svd" algorithm
fit3=softImpute(xnaC,rank.max=50,lambda=30,type="svd")
### fill in the missing entries of xna using fit1
ximp=complete(xna,fit1)
### first scale xna
xnas=biScale(xna)
fit4=softImpute(xnas,rank.max=50,lambda=10)
ximp=complete(xna,fit4)
### impute the entries indexed by i and j
impute(fit4,i=c(1,3,7),j=c(2,5,10))
impute(fit4,i=c(1,3,7),j=c(2,5,10),unscale=FALSE) # ignore scaling and centering