R: Jaccard Similarity Index

jaccard {mlr3measures}

R Documentation

Jaccard Similarity Index

Description

Measures the similarity of two or more sets.

Usage

jaccard(sets, na_value = NaN, ...)

Arguments

`sets`	(`list()`) List of character or integer vectors. `sets` must have at least 2 elements.
`na_value`	(`numeric(1)`) Value returned if the measure is not defined for the input (as described in Details). Default is `NaN`.
`...`	(`any`) Additional arguments. Currently ignored.

Details

For two sets A and B, the Jaccard Index is defined as

J(A, B) = \frac{|A \cap B|}{|A \cup B|}.
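As a worked illustration of the formula, the following base R sketch computes the index for two small sets. The helper name `jaccard_pair` and the example vectors are hypothetical and chosen only for illustration; this is not the package's implementation.

```r
# Hypothetical helper, not part of mlr3measures:
# Jaccard index of two sets given as vectors.
jaccard_pair = function(a, b) {
  # Treat the inputs as sets by dropping duplicates
  a = unique(a)
  b = unique(b)
  # |A intersect B| / |A union B|
  length(intersect(a, b)) / length(union(a, b))
}

A = c("a", "b", "c")
B = c("b", "c", "d")
jaccard_pair(A, B)  # |{b, c}| / |{a, b, c, d}| = 2 / 4 = 0.5
```

Identical sets give 1 and disjoint non-empty sets give 0, which matches the range of the measure.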

If more than two sets are provided, the mean of all pairwise scores is calculated.

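This measure is undefined if two or more sets are empty, because the union of two empty sets is empty and the denominator is zero. In that case, `na_value` is returned.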
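The averaging over pairs and the handling of the undefined case can be sketched in base R as follows. The function name `jaccard_mean` is hypothetical, and the empty-set check is an assumption derived from the rule stated above, not a copy of the package source.

```r
# Hypothetical sketch, not the mlr3measures source:
# mean of all pairwise Jaccard scores for a list of sets.
jaccard_mean = function(sets, na_value = NaN) {
  stopifnot(is.list(sets), length(sets) >= 2L)
  sets = lapply(sets, unique)
  # Assumption: follow the documented rule that the measure is
  # undefined if two or more sets are empty.
  if (sum(lengths(sets) == 0L) >= 2L) {
    return(na_value)
  }
  # All unordered pairs of set indices, one column per pair
  pairs = combn(length(sets), 2L)
  scores = apply(pairs, 2L, function(idx) {
    a = sets[[idx[1L]]]
    b = sets[[idx[2L]]]
    length(intersect(a, b)) / length(union(a, b))
  })
  mean(scores)
}

sets = list(c("a", "b"), c("b", "c"), c("a", "b", "c"))
jaccard_mean(sets)  # pairwise scores 1/3, 2/3, 2/3; mean = 5/9

# Two empty sets: the sketch returns the supplied na_value
jaccard_mean(list(character(0), character(0), "a"), na_value = 0)
```

Passing a finite `na_value` such as 0 can be convenient when scores are aggregated later and `NaN` would propagate through the result.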

Value

Performance value as `numeric(1)`.

Meta Information

Type: "similarity"
Range: [0, 1]
Minimize: FALSE

References

Jaccard P (1901). “Étude comparative de la distribution florale dans une portion des Alpes et du Jura.” Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547–579. doi:10.5169/SEALS-266450.

Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1–18. doi:10.1155/2017/7907163.

Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.

Examples

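library(mlr3measures)

set.seed(1)
# Two random subsets of {"a", "b", "c"}: one of size 1, one of size 2
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
# Jaccard index of the two sets
jaccard(sets)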

[Package mlr3measures version 0.6.0 Index]