project {RcppML}    R Documentation

Project a linear factor model

Description

Solves the equation A = wh for either h or w, given the other and A

Usage

project(A, w = NULL, h = NULL, nonneg = TRUE, L1 = 0, mask_zeros = FALSE)

Arguments

A

matrix of features-by-samples in dense or sparse format (preferred classes are "matrix" or "Matrix::dgCMatrix", respectively). Prefer sparse storage when more than half of all values are zero; a conversion sketch follows this argument list.

w

dense matrix of factors x features giving the linear model to be projected (if h = NULL)

h

dense matrix of factors x samples giving the linear model to be projected (if w = NULL)

nonneg

enforce non-negativity

L1

L1/LASSO penalty to be applied. No scaling is performed. See details.

mask_zeros

handle zeros as missing values, available only when A is sparse
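For example, a mostly-zero dense matrix can be converted to compressed sparse column format with the Matrix package before projection (A_dense here is a hypothetical stand-in for your data):

library(Matrix)
A_dense <- matrix(rbinom(1000 * 100, 1, 0.1) * runif(1000 * 100), 1000, 100)
A_sparse <- Matrix(A_dense, sparse = TRUE)  # "dgCMatrix" with ~90% zeros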

Details

For the classical alternating least squares matrix factorization update problem A = wh, the update (or projection) of h is given by the equation:

w^Twh = wA_j

which is in the form ax = b, where a = w^Tw, x = h, and b = wA_j for all columns j in A.
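As a minimal base-R illustration (not the package's implementation), the unconstrained form of this system can be solved directly; project layers non-negativity, regularization, and parallelization on top of this solve. Here w is stored as features x factors, as in the Examples below:

set.seed(1)
w <- matrix(runif(1000 * 10), 1000, 10)      # features x factors
A <- w %*% matrix(runif(10 * 100), 10, 100)  # features x samples
a <- crossprod(w)                            # a = w^Tw (factors x factors)
b <- crossprod(w, A)                         # right-hand sides for all columns j in A
h_unconstrained <- solve(a, b)               # x = h in the system ax = b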

Given A, project can solve for either w or h given the other:

Parallelization. Least squares projections in factorizations of rank-3 and greater are parallelized using the number of threads set by setRcppMLthreads. By default, all available threads are used; see getRcppMLthreads. The overhead of parallelization is too great for rank-1 and rank-2 factorizations.
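For example, the thread setting can be queried and changed before projecting:

library(RcppML)
getRcppMLthreads()   # query the current thread setting
setRcppMLthreads(2)  # restrict least squares projections to two threads
getRcppMLthreads()   # now 2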

L1 Regularization. Any L1 penalty is subtracted from b and should generally be scaled to max(b), where b = wA_j for all columns j in A. No scaling is applied in this function. An easy way to properly scale an L1 penalty is to normalize all columns in w to sum to 1; such scaling guarantees that L1 = 1 gives a completely sparse solution.
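A minimal sketch of this scaling, assuming w is stored as features x factors as in the Examples below:

library(RcppML)
w <- matrix(runif(100 * 5), 100, 5)        # features x factors
A <- w %*% matrix(runif(5 * 20), 5, 20)    # features x samples
w_scaled <- sweep(w, 2, colSums(w), "/")   # each column of "w" sums to 1
h_sparse <- project(A, w_scaled, L1 = 0.1) # penalty now on the scale of max(b)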

Specializations. There are specializations for symmetric input matrices, and for rank-1 and rank-2 projections. See documentation for nmf for theoretical details and guidance.

Publication reference. For theoretical and practical considerations, please see our manuscript: DeBruine ZJ, Melcher K, Triche TJ (2021), "High-performance non-negative matrix factorization for large single cell data," on bioRxiv.

Value

matrix h or w

Author(s)

Zach DeBruine

References

DeBruine, Z.J., Melcher, K., and Triche, T.J. (2021). "High-performance non-negative matrix factorization for large single-cell data." bioRxiv.

See Also

nnls, nmf

Examples

## Not run: 
library(Matrix)
w <- matrix(runif(1000 * 10), 1000, 10)
h_true <- matrix(runif(10 * 100), 10, 100)
# A is the product of "w" and "h_true" with 10% signal dropout
A <- (w %*% h_true) * (rsparsematrix(1000, 100, 0.9) > 0)
h <- project(A, w)
cor(as.vector(h_true), as.vector(h))

# alternating projections refine solution (like NMF)
mse_bad <- mse(A, w, rep(1, ncol(w)), h) # mse before alternating updates
h <- project(A, w = w)
w <- t(project(A, h = h))
h <- project(A, w)
w <- t(project(A, h = h))
h <- project(A, w)
w <- t(project(A, h = h))
mse_better <- mse(A, w, rep(1, ncol(w)), h) # mse after alternating updates
mse_better < mse_bad

# two ways to solve for "w" that give the same solution
w <- project(A, h = h)
w2 <- project(t(A), w = t(h))
all.equal(w, w2)

## End(Not run)
