fit_poisson_nmf {fastTopics} | R Documentation |
Fit Non-negative Matrix Factorization to Count Data
Description
Approximate the input matrix X
by the
non-negative matrix factorization tcrossprod(L,F)
, in which
the quality of the approximation is measured by a
“divergence” criterion; equivalently, optimize the
likelihood under a Poisson model of the count data, X
, in
which the Poisson rates are given by tcrossprod(L,F)
.
Function fit_poisson_nmf
runs a specified number of
coordinate-wise updates to fit the L and F matrices.
Usage
fit_poisson_nmf(
X,
k,
fit0,
numiter = 100,
update.factors = seq(1, ncol(X)),
update.loadings = seq(1, nrow(X)),
method = c("scd", "em", "mu", "ccd"),
init.method = c("topicscore", "random"),
control = list(),
verbose = c("progressbar", "detailed", "none")
)
fit_poisson_nmf_control_default()
init_poisson_nmf(
X,
F,
L,
k,
init.method = c("topicscore", "random"),
beta = 0.5,
betamax = 0.99,
control = list(),
verbose = c("detailed", "none")
)
init_poisson_nmf_from_clustering(X, clusters, ...)
Arguments
X |
The n x m matrix of counts; all entries of X should be
non-negative. It can be a sparse matrix (class |
k |
An integer 2 or greater giving the matrix rank. This
argument should only be specified if the initial fit ( |
fit0 |
The initial model fit. It should be an object of class
“poisson_nmf_fit”, such as an output from
|
numiter |
The maximum number of updates of the factors and loadings to perform. |
update.factors |
A numeric vector specifying which factors
(rows of |
update.loadings |
A numeric vector specifying which loadings
(rows of |
method |
The method to use for updating the factors and
loadings. Four methods are implemented: multiplicative updates,
|
init.method |
The method used to initialize the factors and
loadings. When |
control |
A list of parameters controlling the behaviour of the optimization algorithm (and the Topic SCORE algorithm if it is used to initialize the model parameters). See ‘Details’. |
verbose |
When |
F |
An optional argument giving is the initial estimate of the
factors (also known as “basis vectors”). It should be an m x
k matrix, where m is the number of columns in the counts matrix
|
L |
An optional argument giving the initial estimate of the
loadings (also known as “activations”). It should be an n x k
matrix, where n is the number of rows in the counts matrix
|
beta |
Initial setting of the extrapolation parameter. This is
|
betamax |
Initial setting for the upper bound on the
extrapolation parameter. This is |
clusters |
A factor specifying a grouping, or clustering, of
the rows of |
... |
Additional arguments passed to |
Details
In Poisson non-negative matrix factorization (Lee & Seung,
2001), counts x_{ij}
in the n \times m
matrix, X
,
are modeled by the Poisson distribution:
x_{ij} \sim
\mathrm{Poisson}(\lambda_{ij}).
Each Poisson rate,
\lambda_{ij}
, is a linear combination of parameters
f_{jk} \geq 0, l_{ik} \geq 0
to be fitted to the data:
\lambda_{ij} = \sum_{k=1}^K l_{ik} f_{jk},
in which K
is a user-specified tuning parameter specifying the rank of the
matrix factorization. Function fit_poisson_nmf
computes
maximum-likelihood estimates (MLEs) of the parameters. For
additional mathematical background, and an explanation of how
Poisson NMF is connected to topic modeling, see the vignette:
vignette(topic = "relationship",package = "fastTopics")
.
Using this function requires some care; only minimal argument checking is performed, and error messages may not be helpful.
The EM and multiplicative updates are simple and fast, but can be
slow to converge to a stationary point. When control$numiter
= 1
, the EM and multiplicative updates are mathematically
equivalent to the multiplicative updates, and therefore share the
same convergence properties. However, the implementation of the EM
updates is quite different; in particular, the EM updates are more
suitable for sparse counts matrices. The implementation of the
multiplicative updates is adapted from the MATLAB code by Daichi
Kitamura http://d-kitamura.net.
Since the multiplicative updates are implemented using standard matrix operations, the speed is heavily dependent on the BLAS/LAPACK numerical libraries used. In particular, using optimized implementations such as OpenBLAS or Intel MKL can result in much improved performance of the multiplcative updates.
The cyclic co-ordinate descent (CCD) and sequential co-ordinate descent (SCD) updates adopt the same optimization strategy, but differ in the implementation details. In practice, we have found that the CCD and SCD updates arrive at the same solution when initialized “sufficiently close” to a stationary point. The CCD implementation is adapted from the C++ code developed by Cho-Jui Hsieh and Inderjit Dhillon, which is available for download at https://www.cs.utexas.edu/~cjhsieh/nmf/. The SCD implementation is based on version 0.4-3 of the ‘NNLM’ package.
An additional re-scaling step is performed after each update to promote numerical stability.
We use three measures of progress for the model fitting: (1)
improvement in the log-likelihood (or deviance), (2) change in the
model parameters, and (3) the residuals of the Karush-Kuhn-Tucker
(KKT) first-order conditions. As the iterates approach a stationary
point of the loss function, the change in the model parameters
should be small, and the residuals of the KKT system should vanish.
Use plot_progress
to plot the improvement in the
solution over time.
See fit_topic_model
for additional guidance on model
fitting, particularly for large or complex data sets.
The control
argument is a list in which any of the
following named components will override the default optimization
algorithm settings (as they are defined by
fit_poisson_nmf_control_default
):
numiter
Number of “inner loop” iterations to run when performing and update of the factors or loadings. This must be set to 1 for
method = "mu"
andmethod = "ccd"
.nc
Number of RcppParallel threads to use for the updates. When
nc
isNA
, the number of threads is determined by callingdefaultNumThreads
. This setting is ignored for the multiplicative upates (method = "mu"
).nc.blas
Number of threads used in the numerical linear algebra library (e.g., OpenBLAS), if available. For best performance, we recommend setting this to 1 (i.e., no multithreading).
min.delta.loglik
Stop performing updates if the difference in the Poisson NMF log-likelihood between two successive updates is less than
min.delta.loglik
. This should not be kept at zero whencontrol$extrapolate = TRUE
because the extrapolated updates are expected to occasionally keep the likelihood unchanged. Ignored ifmin.delta.loglik < 0
.min.res
Stop performing updates if the maximum KKT residual is less than
min.res
. Ignored ifmin.res < 0
.minval
A small, positive constant used to safeguard the multiplicative updates. The safeguarded updates are implemented as
F <- pmax(F1,minval)
andL <- pmax(L1,minval)
, whereF1
andL1
are the factors and loadings matrices obtained by applying an update. This is motivated by Theorem 1 of Gillis & Glineur (2012). Settingminval = 0
is allowed, but some methods are not guaranteed to converge to a stationary point without this safeguard, and a warning will be given in this case.extrapolate
When
extrapolate = TRUE
, the extrapolation scheme of Ang & Gillis (2019) is used.extrapolate.reset
To promote better numerical stability of the extrapolated updates, they are “reset” every so often. This parameter determines the number of iterations to wait before resetting.
beta.increase
When the extrapolated update improves the solution, scale the extrapolation parameter by this amount.
beta.reduce
When the extrapolaaed update does not improve the solution, scale the extrapolation parameter by this amount.
betamax.increase
When the extrapolated update improves the solution, scale the extrapolation parameter by this amount.
eps
A small, non-negative number that is added to the terms inside the logarithms to sidestep computing logarithms of zero. This prevents numerical problems at the cost of introducing a small inaccuracy in the solution. Increasing this number may lead to faster convergence but possibly a less accurate solution.
zero.threshold
A small, non-negative number used to determine which entries of the solution are exactly zero. Any entries that are less than or equal to
zero.threshold
are considered to be exactly zero.
An additional setting, control$init.numiter
, controls the
number of sequential co-ordinate descent (SCD) updates that are
performed to initialize the loadings matrix when init.method
= "topicscore"
.
Value
init_poisson_nmf
and fit_poisson_nmf
both
return an object capturing the optimization algorithm state (for
init_poisson_nmf
, this is the initial state). It is a list
with the following elements:
F |
A matrix containing the current best estimates of the factors. |
L |
A matrix containing the current best estimates of the loadings. |
Fn |
A matrix containing the non-extrapolated factor estimates.
If extrapolation is not used, |
Ln |
A matrix containing the non-extrapolated estimates of the
loadings. If extrapolation is not used, |
Fy |
A matrix containing the extrapolated factor estimates. If
the extrapolation scheme is not used, |
Ly |
A matrix containing the extrapolated estimates of the
loadings. If extrapolation is not used, |
loss |
Value of the objective (“loss”) function computed at the current best estimates of the factors and loadings. |
loss.fnly |
Value of the objective (“loss”) function
computed at the extrapolated solution for the loadings ( |
iter |
The number of the most recently completed iteration. |
beta |
The extrapolation parameter, |
betamax |
Upper bound on the extrapolation parameter. This is
|
beta0 |
The setting of the extrapolation parameter at the last iteration that improved the solution. |
progress |
A data frame containing detailed information about
the algorithm's progress. The data frame should have at most
|
References
Ang, A. and Gillis, N. (2019). Accelerating nonnegative matrix factorization algorithms using extrapolation. Neural Computation 31, 417–439.
Cichocki, A., Cruces, S. and Amari, S. (2011). Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy 13, 134–170.
Gillis, N. and Glineur, F. (2012). Accelerated multiplicative
updates and hierarchical ALS algorithms for nonnegative matrix
factorization. Neural Computation 24
, 1085–1105.
Hsieh, C.-J. and Dhillon, I. (2011). Fast coordinate descent methods with variable selection for non-negative matrix factorization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, p. 1064-1072
Lee, D. D. and Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, 556–562.
Lin, X. and Boutros, P. C. (2018). Optimization and expansion of non-negative matrix factorization. BMC Bioinformatics 21, 7.
Ke, Z. & Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv https://arxiv.org/abs/1704.07016
See Also
fit_topic_model
, plot_progress
Examples
# Simulate a (sparse) 80 x 100 counts matrix.
library(Matrix)
set.seed(1)
X <- simulate_count_data(80,100,k = 3,sparse = TRUE)$X
# Remove columns (words) that do not appear in any row (document).
X <- X[,colSums(X > 0) > 0]
# Run 10 EM updates to find a good initialization.
fit0 <- fit_poisson_nmf(X,k = 3,numiter = 10,method = "em")
# Fit the Poisson NMF model by running 50 EM updates.
fit_em <- fit_poisson_nmf(X,fit0 = fit0,numiter = 50,method = "em")
# Fit the Poisson NMF model by running 50 extrapolated SCD updates.
fit_scd <- fit_poisson_nmf(X,fit0 = fit0,numiter = 50,method = "scd",
control = list(extrapolate = TRUE))
# Compare the two fits.
fits <- list(em = fit_em,scd = fit_scd)
compare_fits(fits)
plot_progress(fits,y = "loglik")
plot_progress(fits,y = "res")
# Recover the topic model. After this step, the L matrix contains the
# mixture proportions ("loadings"), and the F matrix contains the
# word frequencies ("factors").
fit_multinom <- poisson2multinom(fit_scd)