metrics {EnvNJ}    R Documentation
Pairwise Vector Dissimilarities
Description
Computes the dissimilarity between n-dimensional vectors.
Usage
metrics(vset, method = 'euclidean', p = 2)
Arguments
vset: a matrix (n x m) where each column is an n-dimensional vector.
method: a character string indicating the distance/dissimilarity method to be used (see Details).
p: power of the Minkowski distance. This parameter is only relevant when method = 'minkowski'.
Details
Although many of the offered methods compute a proper distance, that is not always the case. For instance, for any non-null vector v, the 'cosine' method gives d(v, 2v) = 0, violating the coincidence axiom. For that reason we prefer the term dissimilarity over distance. The methods offered can be grouped into families.
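This failure is easy to check numerically. The base-R snippet below evaluates the 'cosine' formula given further down; it is a minimal sketch of the formula as written, not the package's internal code:

# Cosine dissimilarity as defined under "Inner product family"; for
# parallel vectors the cosine is 1, so the dissimilarity is -ln(1) = 0.
cos_dissim <- function(p, q) {
  -log(0.5 * (1 + sum(p * q) / (sqrt(sum(p^2)) * sqrt(sum(q^2)))))
}
v <- c(1, 2, 3)
cos_dissim(v, 2 * v)   # 0 (up to floating-point rounding), yet v != 2v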
L_p family:
('euclidean', 'manhattan', 'minkowski', 'chebyshev')
Euclidean = sqrt( sum | P_i - Q_i |^2)
Manhattan = sum | P_i - Q_i |
Minkowski = ( sum | P_i - Q_i |^p )^(1/p)
Chebyshev = max | P_i - Q_i |
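For illustration, these four formulas transcribe to base R as follows (a sketch over two example vectors, not the package internals):

P <- c(1, 2, 3); Q <- c(2, 2, 5)
sqrt(sum(abs(P - Q)^2))      # Euclidean: sqrt(1 + 0 + 4)
sum(abs(P - Q))              # Manhattan: 1 + 0 + 2 = 3
sum(abs(P - Q)^3)^(1/3)      # Minkowski with p = 3: 9^(1/3)
max(abs(P - Q))              # Chebyshev: 2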
L_1 family:
('sorensen', 'soergel', 'lorentzian', 'kulczynski', 'canberra')
Sorensen = sum | P_i - Q_i | / sum (P_i + Q_i)
Soergel = sum | P_i - Q_i | / sum max(P_i , Q_i)
Lorentzian = sum ln(1 + | P_i - Q_i |)
Kulczynski = sum | P_i - Q_i | / sum min(P_i , Q_i)
Canberra = sum( | P_i - Q_i | / (P_i + Q_i) )
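In base R, the same example vectors give (illustrative sketch, not the package internals):

P <- c(1, 2, 3); Q <- c(2, 2, 5)
sum(abs(P - Q)) / sum(P + Q)        # Sorensen: 3/15 = 0.2
sum(abs(P - Q)) / sum(pmax(P, Q))   # Soergel: 3/9
sum(log(1 + abs(P - Q)))            # Lorentzian: ln 2 + ln 3
sum(abs(P - Q)) / sum(pmin(P, Q))   # Kulczynski: 3/6 = 0.5
sum(abs(P - Q) / (P + Q))           # Canberra: 1/3 + 0 + 2/8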
Intersection family:
('non-intersection', 'wavehedges', 'czekanowski', 'motyka')
Non-intersection = 1 - sum min(P_i , Q_i)
Wave-Hedges = sum( | P_i - Q_i | / max(P_i , Q_i) )
Czekanowski = sum | P_i - Q_i | / sum (P_i + Q_i)
Motyka = sum max(P_i , Q_i) / sum (P_i + Q_i)
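For probability-like vectors, an illustrative base-R sketch:

P <- c(0.2, 0.3, 0.5); Q <- c(0.1, 0.4, 0.5)
1 - sum(pmin(P, Q))            # Non-intersection: 1 - 0.9 = 0.1
sum(abs(P - Q) / pmax(P, Q))   # Wave-Hedges: 0.5 + 0.25 + 0
sum(abs(P - Q)) / sum(P + Q)   # Czekanowski: 0.2/2 = 0.1
sum(pmax(P, Q)) / sum(P + Q)   # Motyka: 1.1/2 = 0.55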
Inner product family:
('cosine', 'jaccard')
Cosine = - ln( 0.5 (1 + sum(P_i Q_i) / ( sqrt(sum P_i^2) sqrt(sum Q_i^2) )) )
Jaccard = 1 - sum (P_i Q_i) / (sum P_i^2 + sum Q_i^2 - sum (P_i Q_i))
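An illustrative base-R transcription:

P <- c(1, 2, 3); Q <- c(2, 2, 5)
-log(0.5 * (1 + sum(P * Q) / (sqrt(sum(P^2)) * sqrt(sum(Q^2)))))  # Cosine
1 - sum(P * Q) / (sum(P^2) + sum(Q^2) - sum(P * Q))               # Jaccard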
Squared-chord family:
('bhattacharyya', 'squared_chord')
Bhattacharyya = - ln sum sqrt(P_i Q_i)
Squared-chord = sum( ( sqrt(P_i) - sqrt(Q_i) )^2 )
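An illustrative base-R sketch over probability-like vectors:

P <- c(0.2, 0.3, 0.5); Q <- c(0.1, 0.4, 0.5)
-log(sum(sqrt(P * Q)))        # Bhattacharyya
sum((sqrt(P) - sqrt(Q))^2)    # Squared-chord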
Squared Chi family:
('squared_chi')
Squared-Chi = sum ( (P_i - Q_i )^2 / (P_i + Q_i) )
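With the same example vectors, in base R:

sum((P - Q)^2 / (P + Q))      # Squared-Chi: 0.01/0.3 + 0.01/0.7 + 0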
Shannon's entropy family:
('kullback-leibler', 'jeffreys', 'jensen-shannon', 'jensen_difference')
Kullback-Leibler = sum P_i ln(P_i / Q_i)
Jeffreys = sum (P_i - Q_i) ln(P_i / Q_i)
Jensen-Shannon = 0.5 (sum P_i ln(2P_i / (P_i + Q_i)) + sum Q_i ln(2Q_i / (P_i + Q_i)))
Jensen difference = sum( 0.5 (P_i ln(P_i) + Q_i ln(Q_i)) - 0.5 (P_i + Q_i) ln(0.5 (P_i + Q_i)) )
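These formulas assume strictly positive coordinates (Kullback-Leibler is undefined when Q_i = 0 and P_i > 0). An illustrative base-R sketch:

P <- c(0.2, 0.3, 0.5); Q <- c(0.1, 0.4, 0.5)
M <- 0.5 * (P + Q)
sum(P * log(P / Q))                                 # Kullback-Leibler
sum((P - Q) * log(P / Q))                           # Jeffreys
0.5 * (sum(P * log(P / M)) + sum(Q * log(Q / M)))   # Jensen-Shannon
sum(0.5 * (P * log(P) + Q * log(Q)) - M * log(M))   # Jensen difference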
Mismatch family:
('hamming', 'mismatch', 'mismatchZero', 'binary')
Hamming = (# coordinates where P_i != Q_i) / n
Mismatch = # coordinates where P_i != Q_i
MismatchZero = same as Mismatch, but computed after removing the coordinates where both vectors are zero.
Binary = (# coordinates where one vector has 0 and the other has a non-zero value) / n.
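An illustrative base-R sketch:

P <- c(0, 1, 2, 0); Q <- c(0, 1, 3, 4); n <- length(P)
sum(P != Q) / n                  # Hamming: 2/4
sum(P != Q)                      # Mismatch: 2
both0 <- P == 0 & Q == 0         # coordinate 1 is zero in both vectors
sum(P[!both0] != Q[!both0])      # MismatchZero: 2
sum(xor(P == 0, Q == 0)) / n     # Binary: 1/4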
Combinations family:
('taneja', 'kumar-johnson', 'avg')
Taneja = sum( ((P_i + Q_i) / 2) ln( (P_i + Q_i) / (2 sqrt(P_i Q_i)) ) )
Kumar-Johnson = sum( (P_i^2 - Q_i^2)^2 / (2 (P_i Q_i)^1.5) )
Avg = 0.5 (sum | P_i - Q_i| + max | P_i - Q_i |)
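An illustrative base-R sketch:

P <- c(0.2, 0.3, 0.5); Q <- c(0.1, 0.4, 0.5)
sum((P + Q) / 2 * log((P + Q) / (2 * sqrt(P * Q))))   # Taneja
sum((P^2 - Q^2)^2 / (2 * (P * Q)^1.5))                # Kumar-Johnson
0.5 * (sum(abs(P - Q)) + max(abs(P - Q)))             # Avg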
Value
A matrix with the computed dissimilarity values.
References
Sung-Hyuk Cha (2007). Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences 1(4).
Luczak et al. (2019). A survey and evaluations of histogram-based statistics in alignment-free sequence comparison. Briefings in Bioinformatics 20: 1222-1237.
https://r-snippets.readthedocs.io/en/latest/real_analysis/metrics.html
See Also
vcos(), vdis()
Examples
metrics(matrix(1:9, ncol = 3), method = 'cosine')
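## Further hypothetical calls, assuming the interface shown under Usage:
m <- matrix(1:9, ncol = 3)
metrics(m, method = 'minkowski', p = 3)   # Minkowski with p = 3
metrics(m, method = 'kulczynski')         # an L_1-family dissimilarity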