metrics {EnvNJ}    R Documentation

Pairwise Vector Dissimilarities


Description

Computes the dissimilarity between n-dimensional vectors.


Usage

metrics(vset, method = 'euclidean', p = 2)



Arguments

vset    matrix (n x m) where each column is an n-dimensional vector.


method  a character string indicating the distance/dissimilarity method to be used (see details).


p       power of the Minkowski distance. This parameter is only relevant when method = 'minkowski'.


Details

Although many of the offered methods compute a proper distance, that is not always the case. For instance, for a non-null vector v, the 'cosine' method gives d(v, 2v) = 0 even though v and 2v differ, which violates the coincidence axiom (d(x, y) = 0 only if x = y). For that reason we prefer the term dissimilarity to distance. The methods offered can be grouped into the following families.
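The cosine case mentioned above can be checked by hand in base R. The following is a minimal sketch, not the package's implementation: cosine_dis is a hypothetical helper that reads the 'cosine' formula on this page with the inner product sum(P_i Q_i) in the numerator.

```r
# Hypothetical helper (not part of EnvNJ): cosine dissimilarity,
# -ln(0.5 * (1 + cos(angle between p and q))).
cosine_dis <- function(p, q) {
  cs <- sum(p * q) / (sqrt(sum(p^2)) * sqrt(sum(q^2)))
  -log(0.5 * (1 + cs))
}

v <- c(1, 2, 3)
d_same_direction <- cosine_dis(v, 2 * v)   # zero up to rounding, although v != 2v
d_orthogonal     <- cosine_dis(c(1, 0), c(0, 1))   # -log(0.5) = log(2)
```

Any two vectors pointing in the same direction get dissimilarity zero, which is why the measure is not a distance in the strict sense.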

L_p family:

('euclidean', 'manhattan', 'minkowski', 'chebyshev')

Euclidean = sqrt( sum | P_i - Q_i |^2)

Manhattan = sum | P_i - Q_i |

Minkowski = ( sum | P_i - Q_i |^p )^(1/p)

Chebyshev = max | P_i - Q_i |
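As a rough illustration, the four L_p formulas above can be evaluated directly in base R. This is only a sketch of the formulas, not the package's code; the names p_vec, q_vec, d and minkowski are hypothetical.

```r
p_vec <- c(1, 2, 3)
q_vec <- c(4, 6, 3)
d <- abs(p_vec - q_vec)          # coordinate-wise differences: 3, 4, 0

euclidean <- sqrt(sum(d^2))      # 5
manhattan <- sum(d)              # 7
chebyshev <- max(d)              # 4

# Minkowski with power p; p = 1 gives Manhattan, p = 2 gives Euclidean,
# and large p approaches Chebyshev.
minkowski <- function(d, p) sum(d^p)^(1 / p)
```

This also shows why the p argument only matters for 'minkowski': the other three members of the family are fixed special or limiting cases.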

L_1 family:

('sorensen', 'soergel', 'lorentzian', 'kulczynski', 'canberra')

Sorensen = sum | P_i - Q_i | / sum (P_i + Q_i)

Soergel = sum | P_i - Q_i | / sum max(P_i , Q_i)

Lorentzian = sum ln(1 + | P_i - Q_i |)

Kulczynski = sum | P_i - Q_i | / sum min(P_i , Q_i)

Canberra = sum ( | P_i - Q_i | / (P_i + Q_i) )
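The members of this family differ mainly in where the normalisation happens. The base R sketch below evaluates each formula on a small example. It assumes the definitions of Cha (2007), in which Canberra is a sum of per-coordinate ratios while Sorensen, Soergel and Kulczynski are ratios of sums; all variable names are hypothetical and this is not the package's code.

```r
p_vec <- c(1, 2, 3)
q_vec <- c(4, 6, 3)
d <- abs(p_vec - q_vec)                        # 3, 4, 0

sorensen   <- sum(d) / sum(p_vec + q_vec)      # 7 / 19
soergel    <- sum(d) / sum(pmax(p_vec, q_vec)) # 7 / 13
kulczynski <- sum(d) / sum(pmin(p_vec, q_vec)) # 7 / 6
lorentzian <- sum(log(1 + d))                  # log(4) + log(5) + log(1)

# Canberra normalises each coordinate separately.
# A coordinate where both entries are zero would give 0/0 (NaN) here.
canberra <- sum(d / (p_vec + q_vec))           # 3/5 + 4/8 + 0/6
```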

Intersection family:

('non-intersection', 'wavehedges', 'czekanowski', 'motyka')

Non-intersection = 1 - sum min(P_i , Q_i)

Wave-Hedges = sum ( | P_i - Q_i | / max(P_i , Q_i) )

Czekanowski = sum | P_i - Q_i | / sum | P_i + Q_i |

Motyka = sum max(P_i , Q_i) / sum (P_i + Q_i)

Inner product family:

('cosine', 'jaccard')

Cosine = - ln( 0.5 (1 + sum (P_i Q_i) / ( sqrt(sum P_i^2) sqrt(sum Q_i^2) )) )

Jaccard = 1 - sum (P_i Q_i) / (sum P_i^2 + sum Q_i^2 - sum (P_i Q_i))

Squared-chord family:

('bhattacharyya', 'squared_chord')

Bhattacharyya = - ln sum sqrt(P_i Q_i)

Squared-chord = sum ( sqrt(P_i) - sqrt(Q_i) )^2

Squared Chi family:


Squared-Chi = sum ( (P_i - Q_i )^2 / (P_i + Q_i) )

Shannon's entropy family:

('kullback-leibler', 'jeffreys', 'jensen-shannon', 'jensen_difference')

Kullback-Leibler = sum P_i * log(P_i / Q_i)

Jeffreys = sum (P_i - Q_i) * log(P_i / Q_i)

Jensen-Shannon = 0.5(sum P_i ln(2P_i / (P_i + Q_i)) + sum Q_i ln(2Q_i / (P_i + Q_i)))

Jensen difference = sum ( 0.5 (P_i log(P_i) + Q_i log(Q_i)) - 0.5 (P_i + Q_i) ln(0.5 (P_i + Q_i)) )
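The entropy-based formulas above can be compared on two small probability vectors. The sketch below is written in base R under two assumptions: natural logarithms throughout (the formulas mix log and ln), and strictly positive entries, since a zero entry would produce NaN or Inf. The function names kl, jeffreys, js and jd are hypothetical and are not the package's code.

```r
p_vec <- c(0.5, 0.3, 0.2)
q_vec <- c(0.1, 0.6, 0.3)

kl       <- function(p, q) sum(p * log(p / q))
jeffreys <- function(p, q) sum((p - q) * log(p / q))
js <- function(p, q) {
  m <- p + q
  0.5 * (sum(p * log(2 * p / m)) + sum(q * log(2 * q / m)))
}
jd <- function(p, q) {
  sum(0.5 * (p * log(p) + q * log(q)) - 0.5 * (p + q) * log(0.5 * (p + q)))
}

kl_pq <- kl(p_vec, q_vec)   # not equal to kl(q_vec, p_vec): KL is asymmetric
kl_qp <- kl(q_vec, p_vec)
jf    <- jeffreys(p_vec, q_vec)   # equals kl_pq + kl_qp, hence symmetric
```

Expanding the logarithms shows that the Jensen-Shannon and Jensen difference formulas are algebraically the same quantity, so js and jd agree up to rounding on any valid input.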

Mismatch family:

('hamming', 'mismatch', 'mismatchZero', 'binary')

Hamming = (# coordinates where P_i != Q_i) / n

Mismatch = # coordinates where P_i != Q_i

MismatchZero = Same as mismatch but after removing the coordinates where both vectors have zero.

Binary = (# coordinates where a vector has 0 and the other has a non-zero value) / n.
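Three of these counts can be written in a line of base R each, following the descriptions above literally. This is a sketch with hypothetical variable names, not the package's implementation.

```r
p_vec <- c(0, 1, 0, 2, 3)
q_vec <- c(0, 1, 4, 0, 5)
n <- length(p_vec)

mismatch <- sum(p_vec != q_vec)   # 3: coordinates 3, 4 and 5 differ
hamming  <- mismatch / n          # 0.6

# Binary only looks at the zero / non-zero pattern, so coordinate 5
# (3 versus 5, both non-zero) does not count.
binary <- sum((p_vec == 0) != (q_vec == 0)) / n   # 2 / 5
```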

Combinations family:

('taneja', 'kumar-johnson', 'avg')

Taneja = sum ( ((P_i + Q_i) / 2) log( (P_i + Q_i) / (2 sqrt(P_i Q_i)) ) )

Kumar-Johnson = sum ( (P_i^2 - Q_i^2)^2 / (2 (P_i Q_i)^1.5) )

Avg = 0.5 (sum | P_i - Q_i| + max | P_i - Q_i |)


Value

A matrix with the computed dissimilarity values.


References

Sung-Hyuk Cha (2007). International Journal of Mathematical Models and Methods in Applied Sciences. Issue 4, vol. 1.

Luczak et al. (2019). Briefings in Bioinformatics 20: 1222-1237.

See Also

vcos(), vdis()


Examples

metrics(matrix(1:9, ncol = 3), 'cosine')
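For readers without the package at hand, the pairwise computation in this example can be approximated in base R. The sketch below assumes the result is an m x m matrix with one row and column per column of vset, and it uses a hypothetical cosine_dis helper that follows the 'cosine' formula from the details; the exact shape and labelling of the object returned by metrics() may differ.

```r
vset <- matrix(1:9, ncol = 3)   # three 3-dimensional column vectors

# Hypothetical helper (not part of EnvNJ).
cosine_dis <- function(p, q) {
  -log(0.5 * (1 + sum(p * q) / (sqrt(sum(p^2)) * sqrt(sum(q^2)))))
}

m <- ncol(vset)
dis <- matrix(0, nrow = m, ncol = m)
for (i in seq_len(m)) {
  for (j in seq_len(m)) {
    dis[i, j] <- cosine_dis(vset[, i], vset[, j])
  }
}
dis   # symmetric, with a (numerically) zero diagonal
```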

[Package EnvNJ version 0.1.3 Index]