R: Calulate top-N predictions for a new or existing user

topN {cmfrec}

R Documentation

Calulate top-N predictions for a new or existing user

Description

Determine top-ranked items for a user according to their predicted values, among the items to which the model was fit.

Can produce rankings for existing users (which where in the 'X' data to which the model was fit) through function 'topN', or for new users (which were not in the 'X' data to which the model was fit, but for which there is now new data) through function 'topN_new', assuming there is either 'X' data, 'U' data, or both (i.e. can do cold-start and warm-start rankings).

For the CMF model, depending on parameter 'include_all_X', might recommend items which had only side information if their predictions are high enough.

For the ContentBased model, might be used to rank new items (not present in the 'X' or 'I' data to which the model was fit) given their 'I' data, for new users given their 'U' data. For the other models, will only rank existing items (columns of the 'X' to which the model was fit) - see predict_new_items for an alternative for the other models.

Important: the model does not keep any copies of the original data, and as such, it might recommend items that were already seen/rated/consumed by the user. In order to avoid this, must manually pass the seen/rated/consumed entries to the argument 'exclude' (see details below).

This method produces an exact ranking by computing all item predictions for a given user. As the number of items grows, this can become a rather slow operation - for model serving purposes, it's usually a better idea to obtain an an approximate top-N ranking through software such as "hnsw" or "Milvus" from the calculated user factors and item factors.

Usage

topN(
  model,
  user = NULL,
  n = 10L,
  include = NULL,
  exclude = NULL,
  output_score = FALSE,
  nthreads = model$info$nthreads
)

topN_new(model, ...)

## S3 method for class 'CMF'
topN_new(
  model,
  X = NULL,
  X_col = NULL,
  X_val = NULL,
  U = NULL,
  U_col = NULL,
  U_val = NULL,
  U_bin = NULL,
  weight = NULL,
  n = 10L,
  include = NULL,
  exclude = NULL,
  output_score = FALSE,
  nthreads = model$info$nthreads,
  ...
)

## S3 method for class 'CMF_implicit'
topN_new(
  model,
  X = NULL,
  X_col = NULL,
  X_val = NULL,
  U = NULL,
  U_col = NULL,
  U_val = NULL,
  n = 10L,
  include = NULL,
  exclude = NULL,
  output_score = FALSE,
  nthreads = model$info$nthreads,
  ...
)

## S3 method for class 'ContentBased'
topN_new(
  model,
  U = NULL,
  U_col = NULL,
  U_val = NULL,
  I = NULL,
  n = 10L,
  include = NULL,
  exclude = NULL,
  output_score = FALSE,
  nthreads = model$info$nthreads,
  ...
)

## S3 method for class 'OMF_explicit'
topN_new(
  model,
  X = NULL,
  X_col = NULL,
  X_val = NULL,
  U = NULL,
  U_col = NULL,
  U_val = NULL,
  weight = NULL,
  exact = FALSE,
  n = 10L,
  include = NULL,
  exclude = NULL,
  output_score = FALSE,
  nthreads = model$info$nthreads,
  ...
)

## S3 method for class 'OMF_implicit'
topN_new(
  model,
  X = NULL,
  X_col = NULL,
  X_val = NULL,
  U = NULL,
  U_col = NULL,
  U_val = NULL,
  n = 10L,
  include = NULL,
  exclude = NULL,
  output_score = FALSE,
  nthreads = model$info$nthreads,
  ...
)

Arguments

`model`	A collective matrix factorization model from this package - see fit_models for details.
`user`	User (row of 'X') for which to rank items. If 'X' to which the model was fit was a 'data.frame', should pass an ID matching to the first column of 'X' (the user indices), otherwise should pass a row number for 'X', with numeration starting at 1. This is optional for the MostPopular model, but must be passed for all others. For making recommendations about new users (that were not present in the 'X' to which the model was fit), should use 'topN_new' and pass either 'X' or 'U' data. For example usage, see the main section fit_models.
`n`	Number of top-predicted items to output.
`include`	If passing this, will only make a ranking among the item IDs provided here. See the documentation for 'user' for how the IDs should be passed. This should be an integer or character vector, or alternatively, as a sparse vector from the 'Matrix' package (inheriting from class 'sparseVector'), from which the non-missing entries will be taken as those to include. Cannot be used together with 'exclude'.
`exclude`	If passing this, will exclude from the ranking all the item IDs provided here. See the documentation for 'user' for how the IDs should be passed. This should be an integer or character vector, or alternatively, as a sparse vector from the 'Matrix' package (inheriting from class 'sparseVector'), from which the non-missing entries will be taken as those to exclude. Cannot be used together with 'include'.
`output_score`	Whether to also output the predicted values, in addition to the indices of the top-predicted items.
`nthreads`	Number of parallel threads to use.
`...`	Not used.
`X`	'X' data for a new user for which to make recommendations, either as a numeric vector (class 'numeric'), or as a sparse vector from package 'Matrix' (class 'dsparseVector'). If the 'X' to which the model was fit was a 'data.frame', the column/item indices will have been reindexed internally, and the numeration can be found under 'model$info$item_mapping'. Alternatively, can instead pass the column indices and values and let the model reindex them (see 'X_col' and 'X_val'). Should pass at most one of 'X' or 'X_col'+'X_val'. Dense 'X' data is not supported for 'CMF_implicit' or 'OMF_implicit'.
`X_col`	'X' data for a new user for which to make recommendations, in sparse vector format, with 'X_col' denoting the items/columns which are not missing. If the 'X' to which the model was fit was a 'data.frame', here should pass IDs matching to the second column of that 'X', which will be reindexed internally. Otherwise, should have column indices with numeration starting at 1 (passed as an integer vector). Should pass at most one of 'X' or 'X_col'+'X_val'.
`X_val`	'X' data for a new user for which to make recommendations, in sparse vector format, with 'X_val' denoting the associated values to each entry in 'X_col' (should be a numeric vector of the same length as 'X_col'). Should pass at most one of 'X' or 'X_col'+'X_val'.
`U`	'U' data for a new user for which to make recommendations, either as a numeric vector (class 'numeric'), or as a sparse vector from package 'Matrix' (class 'dsparseVector'). Alternatively, if 'U' is sparse, can instead pass the indices of the non-missing columns and their values separately (see 'U_col'). Should pass at most one of 'U' or 'U_col'+'U_val'.
`U_col`	'U' data for a new user for which to make recommendations, in sparse vector format, with 'U_col' denoting the attributes/columns which are not missing. Should have numeration starting at 1 (should be an integer vector). Should pass at most one of 'U' or 'U_col'+'U_val'.
`U_val`	'U' data for a new user for which to make recommendations, in sparse vector format, with 'U_val' denoting the associated values to each entry in 'U_col' (should be a numeric vector of the same length as 'U_col'). Should pass at most one of 'U' or 'U_col'+'U_val'.
`U_bin`	Binary columns of 'U' for a new user for which to make recommendations, on which a sigmoid transformation will be applied. Should be passed as a numeric vector. Note that 'U' and 'U_bin' are not mutually exclusive.
`weight`	(Only for the explicit-feedback models) Associated weight to each non-missing observation in 'X'. Must have the same number of entries as 'X' - that is, if passing a dense vector of length 'n', 'weight' should be a numeric vector of length 'n' too, if passing a sparse vector, should have a lenght corresponding to the number of non-missing elements.
`I`	(Only for the 'ContentBased' model) New 'I' data to rank for the given user, with rows denoting new columns of the 'X' matrix. Can be passed in the following formats: A sparse COO/triplets matrix, either from package 'Matrix' (class 'dgTMatrix'), or from package 'SparseM' (class 'matrix.coo'). A sparse matrix in CSR format, either from package 'Matrix' (class 'dgRMatrix'), or from package 'SparseM' (class 'matrix.csr'). Passing the input as CSR is faster than COO as it will be converted internally. A sparse row vector from package 'Matrix' (class 'dsparseVector'). A dense matrix from base R (class 'matrix'), with missing entries set as NA. A dense vector from base R (class 'numeric'). A 'data.frame'. When passing 'I', the item indices in 'include', 'exclude', and in the resulting output refer to rows of 'I', and the ranking will be made only among the rows of 'I' (that is, they will not be compared against the old 'X' data).
`exact`	(In the 'OMF_explicit' model) Whether to calculate 'A' and 'Am' with the regularization applied to 'A' instead of to 'Am' (if using the L-BFGS method, this is how the model was fit). This is usually a slower procedure.

Details

Be aware that this function is multi-threaded. As such, if a large batch of top-N predictions is to be calculated in parallel for different users (through e.g. ‘mclapply' or similar), it’s recommended to decrease the number of threads in the model to 1 (e.g. 'model$info$nthreads <- 1L') and to set the number of BLAS threads to 1 (through e.g. 'RhpcBLASctl' or environment variables).

For better cold-start recommendations with CMF_implicit, one can also add item biases by using the 'CMF' model with parameters that would mimic 'CMF_implicit' plus the biases.

Value

If passing 'output_score=FALSE' (the default), will output the indices of the top-predicted elements. If passing 'output_score=TRUE', will pass a list with two elements:

'item': The indices of the top-predicted elements.
'score': The predicted value for each corresponding element in 'item'.