rrecsys {rrecsys}    R Documentation
Create a recommender system.
Description
Based on the given algorithm, a recommendation model will be trained.
Usage
rrecsys(data, alg, ...)
Arguments
data: Training set; an object of the class returned by defineData (see the Examples).
alg: A character string specifying the recommender algorithm to be applied to the data.
...: other attributes; see Details.
Details
Based on the value of alg, the attributes will have different names and values.
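For example, a funkSVD model takes the number of features k, while an IBKNN model takes a similarity function and a neighborhood size. A minimal sketch, assuming a data set myratings created with defineData as in the Examples (the attribute values are illustrative, not defaults):

r_svd <- rrecsys(myratings, alg = "funkSVD", k = 10)
r_knn <- rrecsys(myratings, alg = "IBKNN", simFunct = "adjCos", neigh = 10)

The possible configurations of alg and their meanings: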
- itemAverage. When alg = "itemAverage", the average rating of an item is used to make predictions and recommendations.
- userAverage. When alg = "userAverage", the average rating of a user is used to make predictions and recommendations.
- globalAverage. When alg = "globalAverage", the overall average of all ratings is used to make predictions and recommendations.
- Mostpopular. The most popular algorithm (alg = "mostpopular") is the simplest recommendation algorithm. Items are ordered by the number of times they were rated. Recommendations for a particular user are the most popular items from the data set that are not contained in the user's training set.
- IBKNN. With alg = "IBKNN", a k-nearest-neighbor item-based collaborative filtering algorithm is applied. Given two items a and b, we consider them as rating vectors \vec{a} and \vec{b}. If the argument simFunct is set to "cos", the method computes the cosine similarity as:

sim(\vec{a}, \vec{b}) = cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \ast |\vec{b}|}

If the argument simFunct is set to "adjCos", the method determines the "adjusted cosine" similarity among the items as:

sim(\vec{a}, \vec{b}) = \frac{\sum_{u \in U} (r_{u,a} - \overline{r_{u}}) \ast (r_{u,b} - \overline{r_{u}})}{\sqrt{\sum_{u \in U} (r_{u,a} - \overline{r_{u}})^2} \ast \sqrt{\sum_{u \in U} (r_{u,b} - \overline{r_{u}})^2}}

Based on the value of the neigh attribute, it extracts the neigh closest neighbors for each item (see the similarity sketch after this list).
- UBKNN. With alg = "UBKNN", a k-nearest-neighbor user-based collaborative filtering algorithm is applied. Given two users u and v, we consider them as rating vectors \vec{u} and \vec{v}. If the argument simFunct is set to "cos", the method computes the cosine similarity as:

sim(\vec{u}, \vec{v}) = cos(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{|\vec{u}| \ast |\vec{v}|}

If the argument simFunct is set to "Pearson", the method determines the Pearson correlation among the users as:

sim(\vec{u}, \vec{v}) = Pearson(\vec{u}, \vec{v}) = \frac{\sum\limits_{i \in I_u \cap I_v} (R_{ui} - \overline{R_{u}}) \ast (R_{vi} - \overline{R_{v}})}{\sqrt{\sum\limits_{i \in I_u \cap I_v} (R_{ui} - \overline{R_{u}})^2 \ast \sum\limits_{i \in I_u \cap I_v} (R_{vi} - \overline{R_{v}})^2}}

Based on the value of the neigh attribute, it extracts the neigh closest neighbors for each user (the similarity sketch after this list reproduces both measures).
- FunkSVD. It implements (alg = "funkSVD") a stochastic gradient descent optimization technique. The U (user) and V (item) factor matrices are initialized at small values and cropped to k features. Each feature is trained until convergence (the convergence value has to be specified by the user by configuring the steps argument). On each loop the algorithm predicts r'_{ui} as:

r'_{ui} = u_{u} \ast v^{T}_{i}

and calculates the error as:

e_{ui} = r_{ui} - r'_{ui}

The factors are then updated:

v_{ik} \gets v_{ik} + learningRate \ast (e_{ui} \ast u_{uk} - regCoef \ast v_{ik})

u_{uk} \gets u_{uk} + learningRate \ast (e_{ui} \ast v_{ik} - regCoef \ast u_{uk})

The attribute learningRate represents the learning rate, while regCoef corresponds to the weight of the regularization term. If the argument biases is TRUE, the biases will be computed to update the features and generate predictions. A plain R sketch of these updates follows this list.
- wALS. The weighted Alternating Least Squares method (alg = "wALS"). For a given non-negative weight matrix W the algorithm will perform updates on the item (V) and user (U) feature matrices as follows:

U_i = R_i \ast \widetilde{W_i} \ast V \ast (V^T \ast \widetilde{W_i} \ast V + lambda (\sum_j W_{ij}) I)^{-1}

V_j = R_j^T \ast \widetilde{W_j} \ast U \ast (U^T \ast \widetilde{W_j} \ast U + lambda (\sum_i W_{ij}) I)^{-1}

Initially the V matrix is filled with Gaussian random numbers with mean zero and small standard deviation. Then U and V are updated until convergence. The attribute scheme must specify the scheme (uni, uo, io, co) to use.
- BPR. In this implementation of BPR (alg = "BPR"), a stochastic gradient descent approach is applied that randomly chooses triples from D_R and trains the model \Theta. In this implementation the BPR optimization criterion is applied to matrix factorization. If R = U \times V^T, where U and V are the usual feature matrices cropped to k features, the parameter vector of the model is \Theta = \langle U, V \rangle. The Boolean randomInit parameter determines whether the feature matrices are initialized to random values or to a static 0.1 value. The algorithm uses three regularization terms: RegU for the user features U, RegI for positive updates and RegJ for negative updates of the item features V. lambda is the learning rate, autoConvergence is a toggle for the automatic convergence validation, convergence is the upper limit to the convergence, and updateJ, if TRUE, updates the negative item features.
- SlopeOne. The Weighted Slope One (alg = "slopeOne") performs the prediction of a missing rating \hat{r}_{ui} for user u on item i as the following average:

\hat{r}_{ui} = \frac{\sum_{\forall r_{uj}} (dev_{ij} + r_{uj}) c_{ij}}{\sum_{\forall r_{uj}} c_{ij}}

The average rating deviation dev_{ij} between co-rated items is defined by:

dev_{ij} = \sum_{\forall u \in users} \frac{r_{ui} - r_{uj}}{c_{ij}}

where c_{ij} is the number of co-ratings between items i and j, and r_{ui} is an existing rating of user u for item i. The Weighted Slope One takes into account both the information from users who rated the same item and the number of observed ratings.
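The similarity measures used by IBKNN and UBKNN can be reproduced in plain R. A minimal sketch, not the package's internal code; the toy matrix and the helper names cosSim and pearsonSim are assumptions for illustration:

## toy user x item rating matrix; 0 denotes a missing rating
R <- matrix(c(5, 3, 0, 1,
              4, 0, 0, 1,
              1, 1, 0, 5), nrow = 3, byrow = TRUE)

## cosine similarity between two rating vectors (simFunct = "cos")
cosSim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

## Pearson correlation between two users over co-rated items
## (simFunct = "Pearson" in UBKNN)
pearsonSim <- function(u, v) {
  co <- u != 0 & v != 0            # items rated by both users
  du <- u[co] - mean(u[u != 0])    # center by each user's mean rating
  dv <- v[co] - mean(v[v != 0])
  sum(du * dv) / sqrt(sum(du^2) * sum(dv^2))
}

cosSim(R[, 1], R[, 2])     # similarity of items 1 and 2
pearsonSim(R[1, ], R[2, ]) # similarity of users 1 and 2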
To view a full list of available algorithms and their default configurations, execute rrecsysRegistry.
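The funkSVD updates above can likewise be sketched in a few lines of plain R. This is an illustrative re-implementation, not the package's internal code; the values of k, learningRate, regCoef and steps are assumptions:

set.seed(1)
R <- matrix(sample(0:5, 50, replace = TRUE), nrow = 10)  # 0 = missing
k <- 2; learningRate <- 0.01; regCoef <- 0.02; steps <- 100

## initialize the factor matrices at small values
U <- matrix(runif(nrow(R) * k, 0, 0.1), nrow(R), k)
V <- matrix(runif(ncol(R) * k, 0, 0.1), ncol(R), k)

for (s in 1:steps) {
  for (u in 1:nrow(R)) for (i in 1:ncol(R)) {
    if (R[u, i] == 0) next                # skip missing ratings
    e  <- R[u, i] - sum(U[u, ] * V[i, ])  # e_ui = r_ui - r'_ui
    Uu <- U[u, ]                          # keep the old user factors
    U[u, ] <- U[u, ] + learningRate * (e * V[i, ] - regCoef * U[u, ])
    V[i, ] <- V[i, ] + learningRate * (e * Uu     - regCoef * V[i, ])
  }
}
sum(U[1, ] * V[2, ])  # predicted rating of user 1 for item 2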
Value
Depending on the alg value, the result will be an object of either type SVDclass or IBclass.
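Continuing the Examples below, the class of the returned object reflects the chosen algorithm (a small illustrative check, assuming the class names above are what class() reports):

class(r)   # "SVDclass" for alg = "funkSVD"
class(r2)  # "IBclass"  for alg = "IBKNN"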
References
D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, New York, NY, USA, 1st edition, 2010. ISBN 978-0-521-49336-9.
S. Funk. Netflix Update: Try This at Home, 2006. http://sifter.org/~simon/journal/20061211.html.
Y. Koren, R. Bell, and C. Volinsky. Matrix Factorization Techniques for Recommender Systems. Computer, 42(8):30–37, Aug. 2009. ISSN 0018-9162. doi: 10.1109/MC.2009.263. http://dx.doi.org/10.1109/MC.2009.263.
R. Pan, Y. Zhou, B. Cao, N. Liu, R. Lukose, M. Scholz, and Q. Yang. One-Class Collaborative Filtering. In Data Mining, 2008. ICDM ’08. Eighth IEEE International Conference on, pages 502–511, Dec 2008. doi: 10.1109/ICDM.2008.16.
S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 452–461, Arlington, Virginia, United States, 2009. AUAI Press. ISBN 978-0-9749039-5-8. URL http://dl.acm.org/citation.cfm?id=1795114.1795167.
Examples
## simulate a 20 x 10 ratings matrix (0 denotes a missing rating)
myratings <- matrix(sample(c(0:5), size = 200, replace = TRUE,
                           prob = c(.6, .08, .08, .08, .08, .08)),
                    nrow = 20, byrow = TRUE)
## coerce the matrix to the data structure used by rrecsys
myratings <- defineData(myratings)
## matrix factorization with k = 2 features
r <- rrecsys(myratings, alg = "funkSVD", k = 2)
## item-based k-nearest neighbors with cosine similarity
r2 <- rrecsys(myratings, alg = "IBKNN", simFunct = "cos", neigh = 5)
## list all available algorithms and their default configurations
rrecsysRegistry$get_entries()
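## further illustrative calls, based on the algorithm names in Details
## (a hedged sketch: the argument-free form of these calls is assumed)
r3 <- rrecsys(myratings, alg = "mostpopular")
r4 <- rrecsys(myratings, alg = "slopeOne")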