dissimilarity {recommenderlab}    R Documentation
Dissimilarity and Similarity Calculation Between Rating Data
Description
Calculate dissimilarities/similarities between users or between items based on their ratings.
Usage
## S4 method for signature 'binaryRatingMatrix'
dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users")
## S4 method for signature 'realRatingMatrix'
dissimilarity(x, y = NULL, method = NULL, args = NULL, which = "users")
similarity(x, y = NULL, method = NULL, args = NULL, ...)
## S4 method for signature 'ratingMatrix'
similarity(x, y = NULL, method = NULL, args = NULL, which = "users",
min_matching = 0, min_predictive = 0)
Arguments
x: a ratingMatrix.
y: NULL or a second ratingMatrix to calculate cross-(dis)similarities.
method: (dis)similarity measure to use. Available measures are typically "jaccard", "cosine" and "pearson" (see Details, dissimilarity in arules and dist in proxy).
args: a list of additional arguments for the methods.
which: a character string indicating if the (dis)similarity should be calculated between "users" (rows) or "items" (columns).
min_matching, min_predictive: thresholds on the minimum number of ratings used to calculate the similarity and the minimum number of ratings that can be used for prediction (see Details).
...: further arguments.
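In a rating matrix, users are rows and items are columns, so which = "items" amounts to computing the measure between columns instead of rows. The sketch below illustrates this idea with base R only; it does not call recommenderlab. It relies on the fact that stats::dist() with method = "binary" computes the Jaccard distance for 0/1 data, and transposing the matrix switches from users to items. The matrix m is made-up illustration data.

```r
# Toy binary rating matrix (made-up data): 3 users (rows) x 4 items (columns)
m <- matrix(c(1, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("u", 1:3), paste0("i", 1:4)))

# For 0/1 data, method = "binary" is the Jaccard distance: the share of
# positions where exactly one entry is non-zero among positions where at
# least one entry is non-zero.
d_users <- dist(m, method = "binary")     # analogous to which = "users"
d_items <- dist(t(m), method = "binary")  # analogous to which = "items"

print(round(as.matrix(d_users), 3))
print(round(as.matrix(d_items), 3))
```

Items i1 and i4 are never rated by the same user, so their Jaccard distance is 1.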
Details
Most dissimilarities and similarities are calculated using the proxy package.
Dissimilarities are typically converted into similarities using s = 1 / (1 + d)
or s = 1 - d (the latter is used for Jaccard, Cosine and Pearson correlation),
depending on the measure.
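A minimal base R illustration of the two conversions follows; it is not recommenderlab's internal code. The helper names dist_to_sim_inverse() and dist_to_sim_complement() are hypothetical. The 1 - d form only makes sense for measures whose dissimilarity lies in [0, 1], such as the Jaccard distance, while 1 / (1 + d) maps any non-negative distance (e.g., Euclidean) into (0, 1].

```r
# Hypothetical helpers illustrating the two conversions (not recommenderlab API)
dist_to_sim_inverse    <- function(d) 1 / (1 + d)  # any d >= 0   -> (0, 1]
dist_to_sim_complement <- function(d) 1 - d        # d in [0, 1]  -> [0, 1]

# Euclidean distance between two made-up rating vectors
a <- c(5, 3, 4)
b <- c(4, 3, 2)
d_euc <- sqrt(sum((a - b)^2))        # sqrt(1 + 0 + 4) = sqrt(5)
s_euc <- dist_to_sim_inverse(d_euc)

# Jaccard distance between two made-up binary vectors
x <- c(1, 1, 0, 1)
y <- c(1, 0, 0, 1)
d_jac <- 1 - sum(x & y) / sum(x | y)    # 1 - 2/3
s_jac <- dist_to_sim_complement(d_jac)  # Jaccard similarity 2/3

print(c(euclidean_sim = s_euc, jaccard_sim = s_jac))
```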
Similarities are usually defined in the range [0, 1]; however, Cosine similarity
and Pearson correlation are defined in the interval [-1, 1]. We rescale these
measures with s' = 1 / 2 (s + 1) to the interval [0, 1].
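The rescaling can be sketched in base R as follows. The helpers cosine_sim() and rescale_sim() are hypothetical and the vectors are made-up; the sketch only shows that s' = (s + 1) / 2 maps -1 to 0, 0 to 0.5 and 1 to 1, and makes no claim about how recommenderlab prepares ratings before computing these measures.

```r
# Cosine similarity between two numeric vectors (plain base R)
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Rescale a similarity from [-1, 1] to [0, 1]: s' = (s + 1) / 2
rescale_sim <- function(s) (s + 1) / 2

# Made-up vectors pointing in exactly opposite directions
a <- c( 1, -1,  0.5)
b <- c(-1,  1, -0.5)

s_cos  <- cosine_sim(a, b)              # -1: perfectly opposed
s_pear <- cor(c(5, 3, 4), c(1, 3, 2))   # -1: perfectly negatively correlated

print(c(cosine = rescale_sim(s_cos), pearson = rescale_sim(s_pear)))
```

Both rescaled values are 0, the lowest possible similarity on the [0, 1] scale.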
Similarities are calculated using only the ratings that are available for both
users/items. This can lead to calculating the measure from a very small number
of ratings (possibly only one). min_matching is the required number of shared
ratings needed to calculate a similarity. To predict ratings, there also need to
be additional ratings in argument y; min_predictive is the required number of
such additional ratings. If either the min_matching or the min_predictive
requirement is not met, NA is reported instead of the calculated similarity.
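The thresholding logic can be sketched in base R for two users. This is a hypothetical helper, corated_cosine(), not recommenderlab's implementation; it assumes that "additional ratings" means items rated in b but not in a, and it uses plain cosine on the co-rated items. NA marks a missing rating in the made-up vectors.

```r
# Hypothetical helper (not recommenderlab's implementation): cosine similarity
# on co-rated items only, returning NA when a threshold is not met.
corated_cosine <- function(a, b, min_matching = 0, min_predictive = 0) {
  shared <- !is.na(a) & !is.na(b)   # items rated by both users
  extra  <- is.na(a) & !is.na(b)    # assumption: items rated only in b
  if (sum(shared) == 0 || sum(shared) < min_matching ||
      sum(extra) < min_predictive) {
    return(NA_real_)
  }
  a <- a[shared]
  b <- b[shared]
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up ratings; NA marks a missing rating
u1 <- c(5, NA, 3, NA, 4)
u2 <- c(4,  2, NA, NA, 4)

corated_cosine(u1, u2)                      # uses items 1 and 5 only
corated_cosine(u1, u2, min_matching = 3)    # only 2 shared ratings -> NA
corated_cosine(u1, u2, min_predictive = 2)  # only 1 additional rating -> NA
```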
Value
Returns an object of class "dist", "simil" or an appropriate object (e.g., a
matrix with class "crossdist" or "crosssimil") to represent a
cross-(dis)similarity.
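A "dist" object stores only the lower triangle of the pairwise values. Assuming the returned object behaves like a standard "dist" object, it can be expanded into a full matrix with as.matrix(). The base R sketch below shows this with stats::dist() on made-up data rather than on a ratingMatrix.

```r
# Made-up ratings for three users (rows) on three items (columns)
r <- matrix(c(5, 3, 4,
              4, 3, 2,
              1, 5, 5), nrow = 3, byrow = TRUE,
            dimnames = list(c("u1", "u2", "u3"), NULL))

d <- dist(r)          # Euclidean distances, stored as the lower triangle only
class(d)              # "dist"
length(d)             # 3 pairwise values for 3 users

full <- as.matrix(d)  # symmetric 3 x 3 matrix with a zero diagonal
full["u1", "u2"]      # sqrt(5)
```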
See Also
ratingMatrix, dissimilarity in arules, and dist in proxy.
Examples
library(recommenderlab)
data(MSWeb)
## between 5 users
dissimilarity(MSWeb[1:5,], method = "jaccard")
similarity(MSWeb[1:5,], method = "jaccard")
## between first 3 items
dissimilarity(MSWeb[,1:3], method = "jaccard", which = "items")
similarity(MSWeb[,1:3], method = "jaccard", which = "items")
## cross-similarity between first 2 users and users 10-20
similarity(MSWeb[1:2,], MSWeb[10:20,], method="jaccard")