slim {slimrec}

R Documentation

slim

Description

Compute predicted ratings and, optionally, the coefficient matrix for a sparse ratings matrix using SLIM.

Usage

slim(mat, alpha = 0.5, lambda, nlambda, nonNegCoeff = TRUE, directory,
  coeffMat = FALSE, returnMat = FALSE, computeRMSE = FALSE, nproc = 1L,
  progress = TRUE, check = TRUE, cleanup = FALSE)

Arguments

`mat`	(sparse matrix of class 'dgCMatrix') Rating matrix with items along columns and users along rows.
`alpha`	(0 <= alpha <= 1) Parameter that sets the relative weighting between the L1 and L2 penalties. See glmnet for more details. This is set by default at `0.5`.
`lambda`	(positive real number) Parameter that controls the shrinkage of coefficients. See glmnet for more details. It is advisable not to provide a lambda value, as the function chooses one for each column by itself.
`nlambda`	(positive integer) Maximum length of the lambda sequence. See glmnet for more details. If `nlambda` argument is missing, it will be set to 100L. This is overridden if `lambda` is specified.
`nonNegCoeff`	(flag) Whether the regression coefficients should be non-negative. In some instances setting it to FALSE decreases the RMSE, but this can also lead to overfitting. Setting `nonNegCoeff` to FALSE helps in interpreting coefficients in the case of implicit feedback. This is set to TRUE by default.
`directory`	(string) A writable directory in which a sub-directory is created at run time and to which `bigmatrix` objects are written. The predicted ratings are stored in the `ratingMat` file and their description in the `ratingMat.desc` file. If `coeffMat` is TRUE, the coefficient matrix is stored in the `coeffMat` file and its description in the `coeffMat.desc` file. When `directory` is missing, it is set via `tempdir()`.
`coeffMat`	(flag) Whether the coefficient matrix is to be computed. It can later be used to predict recommendations for users not present in `mat` (although the `slimrec` package does not provide a `predict` function). Setting it to TRUE increases the computation time. This is set to FALSE by default.
`returnMat`	(flag) Whether the predicted ratings matrix and the coefficient matrix (only if `coeffMat` is TRUE) should be read into memory as matrices, with the on-disk `bigmatrix` objects deleted. When the output matrices are large, setting `returnMat` to TRUE is not advisable. This is set to FALSE by default.
`computeRMSE`	(flag) Whether RMSE values should be computed over the non-zero values of `mat`, both overall and columnwise. This is set to FALSE by default.
`nproc`	(positive integer) Number of parallel processes to be used to compute coefficients for items. If the machine has `k` (>1) cores, the function does not employ more than `k - 1` cores. This is set to 1L by default.
`progress`	(flag) If TRUE, shows a progress bar and the expected time. This is set to TRUE by default.
`check`	(flag) If TRUE, performs basic checks, such as whether the matrix is sparse, the matrix contains no NAs, alpha lies between 0 and 1, and the directory (if specified) is writable. This is set to TRUE by default.
`cleanup`	(flag) Whether to delete the sub-directory. Note that `returnMat` cannot be set to FALSE when `cleanup` is TRUE. This is set to FALSE by default.
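As a rough illustration of the kind of validation that `check = TRUE` and the `nproc` rule describe, here is a hypothetical pair of helpers. `check_slim_args` and `cap_nproc` are not part of slimrec, and the exact checks slimrec performs may differ; only the Matrix package (shipped with R) and the base `methods` and `parallel` packages are assumed.

```r
# Hypothetical helper: mirrors the kinds of checks described for `check = TRUE`.
# This is NOT the slimrec implementation.
check_slim_args <- function(mat, alpha = 0.5, directory = tempdir(),
                            returnMat = FALSE, cleanup = FALSE) {
  if (!methods::is(mat, "dgCMatrix")) {
    stop("`mat` must be a sparse matrix of class 'dgCMatrix'")
  }
  if (anyNA(mat@x)) {                      # only stored entries can be NA
    stop("`mat` must not contain NAs")
  }
  if (!is.numeric(alpha) || length(alpha) != 1L || alpha < 0 || alpha > 1) {
    stop("`alpha` must be a single number in [0, 1]")
  }
  if (!dir.exists(directory) || file.access(directory, mode = 2) != 0) {
    stop("`directory` must be an existing, writable directory")
  }
  if (cleanup && !returnMat) {             # see the `cleanup` argument
    stop("`returnMat` must be TRUE when `cleanup` is TRUE")
  }
  invisible(TRUE)
}

# Hypothetical helper: never use more than k - 1 of the machine's k cores.
cap_nproc <- function(nproc) {
  k <- parallel::detectCores()
  if (is.na(k) || k <= 1L) return(1L)      # detectCores() may return NA
  as.integer(min(nproc, k - 1L))
}

# Example usage
library(Matrix)
m <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2), x = c(5, 3, 1), dims = c(3, 2))
check_slim_args(m)   # passes silently
```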

Details

Sparse linear method (DOI: 10.1109/ICDM.2011.134): The method predicts a user's rating for a given item as a linear combination of the ratings that the user has given to all other items. The coefficients for an item are determined by elastic-net regression (both L1 and L2 regularization) over the ratings matrix.

The optimization problem solves:

\min_{c_{j,.}} \frac{1}{2} \|a_{j,.} - Ac_{j,.}\|^2_{2} + \frac{\beta}{2} \|c_{j,.}\|^2_{2} + \gamma \|c_{j,.}\|_{1}

subject to c_{j,j} = 0 and the optional non-negativity constraint c_{j,.} \geq 0, where a_{j,.} is the j-th column of the input ratings matrix A and c_{j,.} is the j-th column of the coefficient matrix (to be determined).
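The per-column problem can be made concrete with a small solver. The following is a minimal base-R sketch using cyclic coordinate descent, a technique swapped in here for illustration: slimrec delegates the solve to glmnet, whose internals may differ. The names `slim_column` and `slim_dense` are hypothetical, the input is a dense base-R matrix, and `beta` is assumed positive so that an all-zero column never causes a division by zero. Each coordinate has the closed-form update c_k = S(rho_k, gamma) / (||A_k||^2 + beta), where S is soft-thresholding, or max(rho_k - gamma, 0) / (||A_k||^2 + beta) under the non-negativity constraint.

```r
# Minimal sketch (not the slimrec/glmnet implementation): cyclic coordinate
# descent for one column of the SLIM problem
#   min_c 1/2 ||a_j - A c||^2 + beta/2 ||c||^2 + gamma ||c||_1
#   subject to c_j = 0 and, optionally, c >= 0.
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

slim_column <- function(A, j, beta, gamma, nonneg = TRUE,
                        n_iter = 200L, tol = 1e-8) {
  n_items <- ncol(A)
  c_j <- numeric(n_items)              # starts at zero; entry j is never updated
  col_sq <- colSums(A^2)               # ||A_k||^2 for every column k
  r <- A[, j]                          # residual a_j - A %*% c_j while c_j = 0
  for (iter in seq_len(n_iter)) {
    max_change <- 0
    for (k in seq_len(n_items)[-j]) {  # skipping k = j enforces c_{j,j} = 0
      old <- c_j[k]
      rho <- sum(A[, k] * r) + col_sq[k] * old   # A_k' (partial residual)
      new <- if (nonneg) max(rho - gamma, 0) else soft_threshold(rho, gamma)
      new <- new / (col_sq[k] + beta)
      if (new != old) {
        r <- r - A[, k] * (new - old)  # keep the residual in sync with c_j
        c_j[k] <- new
        max_change <- max(max_change, abs(new - old))
      }
    }
    if (max_change < tol) break
  }
  c_j
}

# Fit every column and form the predicted ratings A %*% C.
slim_dense <- function(A, beta = 0.1, gamma = 0.1, nonneg = TRUE) {
  C <- vapply(seq_len(ncol(A)),
              function(j) slim_column(A, j, beta, gamma, nonneg),
              numeric(ncol(A)))        # column j of C holds c_{j,.}
  list(coef = C, ratings = A %*% C)
}
```

Because each column is solved independently, the `vapply` loop is the natural place to parallelize, which is what the `nproc` argument does for the real function.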

The method assumes unknown rating values to be zero. Hence, it is primarily designed for implicit feedback mechanisms, but is not restricted to them. The main use of the predicted ratings is to generate top-n lists of users and items.
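Since unrated entries are treated as zeros, a top-n list for a user is typically formed by ranking the predicted scores of the items the user has not rated. A minimal base-R sketch follows; `top_n_items` is hypothetical and is not the slimrec `top_cols` function used in the examples, whose ranking rules may differ.

```r
# Hypothetical helper (not slimrec's top_cols): top-n unrated items for one user.
# `pred` is a predicted ratings matrix, `observed` the original ratings matrix.
top_n_items <- function(pred, observed, user, n = 5L) {
  scores <- pred[user, ]
  scores[observed[user, ] != 0] <- -Inf  # do not re-recommend rated items
  ord <- order(scores, decreasing = TRUE)
  ord <- ord[is.finite(scores[ord])]     # drop the excluded items
  head(ord, n)
}

observed <- matrix(c(1, 0, 0, 1,
                     0, 1, 0, 0), nrow = 2, byrow = TRUE)
pred <- matrix(c(0.9, 0.2, 0.7, 0.8,
                 0.1, 0.6, 0.3, 0.4), nrow = 2, byrow = TRUE)
top_n_items(pred, observed, user = 1, n = 2)
```

For user 1, items 1 and 4 are already rated, so the call returns items 3 and 2 in that order.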

Implementation: The non-negative ratings data is input as a sparse matrix of class dgCMatrix without any NAs. Items should constitute the columns and users the rows. The elastic-net regression problem is solved using the glmnet package. The coefficients for each item (a column of the ratings matrix) are computed in parallel. To avoid memory overload, the outputs are written to a disk-based bigmatrix (using the bigmemory package). The predicted ratings matrix is the primary output. It is possible to obtain the matrix of coefficients, which will be helpful later to 'predict' the ratings for users not present in the ratings matrix. The RMSE may be computed itemwise and over all non-zero values of the ratings matrix. Since lambda is auto-adjusted, a change in alpha might not have a significant impact on the RMSE. When the best accuracy is needed, the 'tune' function can be used to arrive at the optimal alpha value by cross-validation. There are options to read the disk-based matrices into memory (as matrices) and remove the disk-based ones.
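The `alpha` and `lambda` arguments follow glmnet's parameterization, whereas the objective in the Details uses beta and gamma. Assuming each column is fit with glmnet's documented Gaussian elastic-net objective with no intercept and no standardization (whether slimrec sets these options is not stated here), the two parameterizations relate as shown below, with n the number of rows (users) of A. In glmnet, the constraint c_{j,j} = 0 could be expressed by excluding column j (for example via its `exclude` argument) and the non-negativity constraint via `lower.limits = 0`; this too is an assumption about how slimrec calls glmnet.

```latex
% glmnet (gaussian family), assuming intercept = FALSE and standardize = FALSE:
\min_{c_{j,.}} \frac{1}{2n} \|a_{j,.} - A c_{j,.}\|^2_{2}
  + \lambda \left[ \frac{1-\alpha}{2} \|c_{j,.}\|^2_{2} + \alpha \|c_{j,.}\|_{1} \right]

% Multiplying by n does not change the minimiser and yields the SLIM objective with
\beta = n \lambda (1 - \alpha), \qquad \gamma = n \lambda \alpha

% Example: n = 100, \lambda = 0.01, \alpha = 0.5 gives \beta = 0.5 and \gamma = 0.5
```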

Value

A list with these elements:

ratingMat: If returnMat is TRUE, the predicted ratings matrix. Else, NULL
coeffMat: If returnMat is TRUE and coeffMat is TRUE, the coefficient matrix. Else, NULL
lambdas: When lambda is not specified, a vector (of length equal to the number of columns of mat) of the chosen lambda values. When lambda is specified, the single lambda value.
columnwiseNonZeroRMSE: If computeRMSE is TRUE, vector of RMSE for each column. The errors are computed over only non-zero values of the column of mat. If computeRMSE is FALSE, value is set to NULL.
nonZeroRMSE: If computeRMSE is TRUE, RMSE value. The errors are computed over only non-zero values of the mat. If computeRMSE is FALSE, value is set to NULL.
subdir: Path to the sub-directory where the outputs are placed.
call: The function call.
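The two RMSE outputs can be reproduced on small dense matrices with the following sketch. `nonzero_rmse` is a hypothetical helper, not a slimrec function; it assumes both inputs are base-R matrices of the same shape (a small dgCMatrix could be converted with `as.matrix`), and it returns `NA` for a column with no non-zero entries, a case this page does not specify.

```r
# Hypothetical helper: RMSE restricted to the non-zero entries of `observed`
# (playing the role of `mat`), overall and per column.
nonzero_rmse <- function(observed, pred) {
  mask <- observed != 0                  # only rated entries count
  err2 <- (observed - pred)^2
  overall <- sqrt(mean(err2[mask]))
  by_col <- vapply(seq_len(ncol(observed)), function(j) {
    m <- mask[, j]
    if (any(m)) sqrt(mean(err2[m, j])) else NA_real_
  }, numeric(1))
  list(nonZeroRMSE = overall, columnwiseNonZeroRMSE = by_col)
}

observed <- matrix(c(4, 0, 2, 1), nrow = 2)
pred     <- matrix(c(3, 5, 2, 3), nrow = 2)
res <- nonzero_rmse(observed, pred)
```

Here the large error at the unrated entry (row 2, column 1) is ignored, so `res$nonZeroRMSE` equals sqrt(5/3) and the column-wise values are 1 and sqrt(2).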

Examples

require("slimrec")
data(ft_small)
temp <- slim(ft_small)
str(temp)

## Not run: 
td <- tempdir()              # directory where output matrices are stored
temp <- slim(mat           = ft_implicit # input sparse ratings matrix
             , alpha       = 0.5         # 0 for ridge, 1 for lasso
             #, lambda                   # suggested not to set lambda
             #, nlambda                  # using default nlambda = 100
             , nonNegCoeff = TRUE        # better accuracy, lower interpretability
             , directory   = td          # dir where output matrices are stored
             , coeffMat    = TRUE        # helpful in 'predict'ing later
             , returnMat   = TRUE        # return matrices in memory
             , computeRMSE = TRUE        # RMSE over rated items
             , nproc       = 2L          # number of concurrent processes
             , progress    = TRUE        # show a progressbar
             , check       = TRUE        # do basic checks on input params
             , cleanup     = FALSE       # keep output matrices on disk
             )
str(temp)
# output ratings matrix would be comparatively denser
predMat <- temp[["ratingMat"]] != 0
mean(predMat)                # fraction of non-zero predicted ratings
# recommend top 5 items for user 10
top_cols(temp[["ratingMat"]]
         , row = 10
         , k   = 5
         )
# to avoid recommending items 10, 215 and 3
top_cols(temp[["ratingMat"]]
         , row = 10
         , k   = 5
         , ignore = c(10, 215, 3)
         )

## End(Not run)

[Package slimrec version 0.1.0 Index]