poismf {poismf} | R Documentation |
Factorization of Sparse Counts Matrices through Poisson Likelihood
Description
Creates a low-rank non-negative factorization of a sparse counts matrix by maximizing Poisson likelihood minus L1/L2 regularization, using gradient-based optimization procedures.
The model idea is to approximate: X ~ Poisson(A B^T)
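As a concrete illustration of this model (a base-R sketch with hypothetical toy values, not code from the package), each count is assumed Poisson-distributed with rate given by the corresponding entry of A B^T, and fitting maximizes the resulting log-likelihood minus regularization:

```r
## Toy illustration of the generative assumption X ~ Poisson(A %*% t(B)).
set.seed(1)
A <- matrix(runif(4 * 2), nrow = 4)   # 4 "users", 2 latent factors
B <- matrix(runif(3 * 2), nrow = 3)   # 3 "items", 2 latent factors
lambda <- A %*% t(B)                  # expected counts, a 4 x 3 matrix
X <- matrix(rpois(length(lambda), lambda), nrow = nrow(lambda))
## Poisson log-likelihood of X under these factors -- the quantity the
## optimizer maximizes (minus L1/L2 regularization on A and B):
loglik <- sum(dpois(X, lambda, log = TRUE))
```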
Ideal for usage in recommender systems, in which the 'X' matrix would consist of interactions (e.g. clicks, views, plays), with users representing the rows and items representing the columns.
Usage
poismf(
X,
k = 50,
method = "tncg",
l2_reg = "auto",
l1_reg = 0,
niter = "auto",
maxupd = "auto",
limit_step = TRUE,
initial_step = 1e-07,
early_stop = TRUE,
reuse_prev = FALSE,
weight_mult = 1,
handle_interrupt = TRUE,
nthreads = parallel::detectCores()
)
Arguments
X |
The counts matrix to factorize. Can be: a 'data.frame' with three columns (row index or ID, column index or ID, count), or a sparse matrix in COO/triplet format (e.g. 'dgTMatrix' from the 'Matrix' package).
Passing sparse matrices is faster as it will not need to re-enumerate the rows and columns. Dense (regular) matrices will be converted to sparse format, which is inefficient. |
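For instance, the same COO/triplet data can be expressed either as a three-column 'data.frame' or as a triplet sparse matrix (a sketch using the 'Matrix' package; the variable names are illustrative):

```r
library(Matrix)
## toy interactions in COO/triplet form as a data.frame
df <- data.frame(
    row_ix = c(1L, 2L, 2L, 3L),
    col_ix = c(1L, 1L, 3L, 2L),
    count  = c(2,  1,  4,  1)
)
## equivalent triplet sparse matrix ('repr="T"' selects COO storage),
## which avoids re-enumerating the row and column indices
Xsp <- sparseMatrix(i = df$row_ix, j = df$col_ix, x = df$count, repr = "T")
```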
k |
Number of latent factors to use (dimensionality of the low-rank factorization). If 'k' is very small (e.g. 'k=3'), it's recommended to use 'method="pg"'; otherwise it's recommended to use 'method="tncg"'; and if using 'method="cg"', it's recommended to use a large 'k' (at least 100). |
method |
Optimization method to use as inner solver. Options are: '"tncg"' (truncated Newton-CG), '"cg"' (conjugate gradient), and '"pg"' (proximal gradient). |
l2_reg |
Strength of L2 regularization. It is recommended to use small values along with 'method="tncg"', very large values along with 'method="pg"', and medium to large values with 'method="cg"'. If passing '"auto"', will set it to a default that depends on the chosen method. |
l1_reg |
Strength of L1 regularization. Not recommended. |
niter |
Number of outer iterations to perform. One iteration denotes an update over both matrices. If passing '"auto"', will set it to 10 for TNCG and PG, or to 30 for CG. Using more iterations usually leads to better results for CG, at the expense of longer fitting times. TNCG is more likely to converge to a local optimum within fewer outer iterations, with further iterations barely changing the values of any single factor. |
maxupd |
Maximum number of inner iterations for each user/item vector. Note: for 'method="tncg"', this means the maximum number of function evaluations rather than the number of updates, so it should be higher. You might also want to try decreasing this while increasing 'niter'. For 'method="pg"', this will be taken as the actual number of updates, as it does not perform a line search like the other methods. If passing '"auto"', will set it to '15*k' for 'method="tncg"', 5 for 'method="cg"', and 10 for 'method="pg"'. If using 'method="cg"', one might also want to try other combinations such as 'maxupd=1' and 'niter=100'. |
limit_step |
When passing 'method="cg"', whether to limit the step sizes in each update so as to drive at most one variable to zero each time, as prescribed in [3]. If running the procedure for many iterations, it's recommended to set this to 'TRUE'. You might also set 'method="cg"' plus 'maxupd=1' and 'limit_step=FALSE' to reduce the algorithm to simple projected gradient descent with a line search. |
initial_step |
Initial step size to use for proximal gradient updates. Larger step sizes converge faster, but are more likely to result in failed optimization. Ignored when passing 'method="tncg"' or 'method="cg"', as those will perform a line search instead. |
early_stop |
In the TNCG method, whether to stop before reaching the maximum number of iterations if the updates do not change the factors significantly or at all. |
reuse_prev |
In the TNCG method, whether to reuse the factors obtained in the previous iteration as the starting point for each inner update. This has the effect of reaching convergence much more quickly, but will oftentimes lead to slightly worse solutions. If passing 'FALSE' and 'maxupd' is small, the obtained factors might not be sparse at all. If passing 'TRUE', they will typically be less sparse than when passing 'FALSE' with a large 'maxupd', or than with 'method="cg"'. Setting it to 'TRUE' has the side effect of potentially making the factors obtained when fitting the model different from the factors obtained after calling the 'predict_factors' function with the same data the model was fit to. For methods other than TNCG, this is always assumed to be 'TRUE'. |
weight_mult |
Extra multiplier for the weight of the positive entries over the missing entries in the matrix to factorize. Be aware that Poisson likelihood already implicitly puts more weight on the non-missing entries. Passing larger values will make the factors have larger values (which might be desirable), and can help with instability and failed-optimization cases. If passing this, it's recommended to try very large values (e.g. 10^2), and it might require adjusting the other hyperparameters. |
handle_interrupt |
When receiving an interrupt signal, whether the model should stop early and leave a usable object with the parameters obtained up to the point when it was interrupted (when passing 'TRUE'), or raise an interrupt exception without producing a fitted model object (when passing 'FALSE'). |
nthreads |
Number of parallel threads to use. |
Details
In order to speed up the optimization procedures, it's recommended to use an optimized library for BLAS operations such as MKL or OpenBLAS (ideally the "openmp" variant). Instructions for getting OpenBLAS in R for Windows are available online.
When using the proximal gradient method, this model is prone to numerical instability, and the fitted parameters can turn out to be all NaNs or all zeros. The TNCG method is not prone to such failed optimizations.
Although the main idea behind this software is to produce sparse model/factor matrices, they are always handled in dense format inside this software; as such, it might be faster to use these matrices through some other external library that is able to exploit their sparsity.
For reproducible results, random number generation seeds can be controlled through 'set.seed'.
Model quality or recommendation quality can be evaluated using the recometrics package.
Value
An object of class 'poismf' with the following fields of interest:
Fields
A
The user/document/row-factor matrix (will be transposed due to R's column-major storage of matrices).
B
The item/word/column-factor matrix (will be transposed due to R's column-major storage of matrices).
levels_A
A vector indicating which user/row ID corresponds to each row position in the 'A' matrix. This will only be generated when passing 'X' as a 'data.frame'; otherwise the IDs will not be remapped.
levels_B
A vector indicating which item/column ID corresponds to each row position in the 'B' matrix. This will only be generated when passing 'X' as a 'data.frame'; otherwise the IDs will not be remapped.
References
Cortes, David. "Fast Non-Bayesian Poisson Factorization for Implicit-Feedback Recommendations." arXiv preprint arXiv:1811.01908 (2018).
Nash, Stephen G. "Newton-type minimization via the Lanczos method." SIAM Journal on Numerical Analysis 21.4 (1984): 770-788.
Li, Can. "A conjugate gradient type method for the nonnegative constraints optimization problems." Journal of Applied Mathematics 2013 (2013).
See Also
predict.poismf topN factors get.factor.matrices get.model.mappings
Examples
library(poismf)
### create a random sparse data frame in COO format
nrow <- 10^2 ## <- users
ncol <- 10^3 ## <- items
nnz <- 10^4 ## <- events (agg)
set.seed(1)
X <- data.frame(
row_ix = sample(nrow, size=nnz, replace=TRUE),
col_ix = sample(ncol, size=nnz, replace=TRUE),
count = rpois(nnz, 1) + 1
)
X <- X[!duplicated(X[, c("row_ix", "col_ix")]), ]
### can also pass X as sparse matrix - see below
### X <- Matrix::sparseMatrix(
### i=X$row_ix, j=X$col_ix, x=X$count,
### repr="T")
### the indices can also be characters or other types:
### X$row_ix <- paste0("user", X$row_ix)
### X$col_ix <- paste0("item", X$col_ix)
### factorize the randomly-generated sparse matrix
model <- poismf(X, k=5, method="tncg", nthreads=1)
### (for sparse factors, use higher 'k' and larger data)
### predict functionality (chosen entries in X)
### predict entry [1, 10] (row 1, column 10)
predict(model, 1, 10, nthreads=1)
### predict entries [1,4], [1,5], [1,6]
predict(model, c(1, 1, 1), c(4, 5, 6), nthreads=1)
### ranking functionality (for recommender systems)
topN(model, user=2, n=5, exclude=X$col_ix[X$row_ix==2], nthreads=1)
topN.new(model, X=X[X$row_ix==2, c("col_ix","count")],
n=5, exclude=X$col_ix[X$row_ix==2], nthreads=1)
### obtaining latent factors
a_vec <- factors.single(model,
X[X$row_ix==2, c("col_ix","count")])
A_full <- factors(model, X, nthreads=1)
A_orig <- get.factor.matrices(model)$A
### (note that newly-obtained factors will differ slightly)
sqrt(mean((A_full["2",] - A_orig["2",])^2))