dissimilarity {relations}    R Documentation

Dissimilarity Between Relations

Description

Compute the dissimilarity between (ensembles of) relations.

Usage

relation_dissimilarity(x, y = NULL, method = "symdiff", ...)

Arguments

x

an ensemble of relations (see relation_ensemble()), or something which can be coerced to such.

y

NULL (default), or as for x.

method

a character string specifying one of the built-in methods for computing dissimilarity, or a function to be taken as a user-defined method. If a character string, its lower-cased version is matched against the lower-cased names of the available built-in methods using pmatch(). See Details for available built-in methods.

...

further arguments to be passed to methods.

Details

Available built-in methods are as follows.

"symdiff"

symmetric difference distance. This computes the cardinality of the symmetric difference of two relations, i.e., the number of tuples contained in exactly one of the two relations. For preference relations, this coincides with the Kemeny-Snell metric (Kemeny and Snell, 1962). For linear orders, it gives Kendall's $\tau$ metric (Diaconis, 1988).

Can also be referred to as "SD".

Only applicable to crisp relations.
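As an illustration of the definition only (not the package's implementation), the count can be sketched in base R on 0/1 incidence matrices. The helper names `incidence_from_ranks` and `symdiff_distance` are hypothetical, and the reflexive `<=` encoding of a linear order is an assumption of this sketch.

```r
# Hypothetical helper: incidence matrix of the linear order given by the
# ranks r, with a "<=" encoding (entry [a, b] is 1 if r[a] <= r[b]).
incidence_from_ranks <- function(r) {
  I <- outer(r, r, "<=") * 1
  dimnames(I) <- list(names(r), names(r))
  I
}

I1 <- incidence_from_ranks(c(a = 1, b = 2, c = 3))  # a before b before c
I2 <- incidence_from_ranks(c(a = 3, b = 2, c = 1))  # the reversed order

# Symmetric difference distance: the number of tuples whose incidence
# differs, i.e. tuples contained in exactly one of the two relations.
symdiff_distance <- function(x, y) sum(x != y)

symdiff_distance(I1, I2)  # 6
```

With this encoding, each of the three discordant pairs contributes two tuples, (a, b) in one relation and (b, a) in the other, which is why two reversed orders on three objects are at distance 6.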

"manhattan"

the Manhattan distance between the incidences.

"euclidean"

the Euclidean distance between the incidences.
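A minimal sketch of these two distances, assuming they are taken entrywise over the incidence (membership) matrices. The matrices `X` and `Y` are made-up fuzzy incidences and the function names are hypothetical. The package may organize the computation differently.

```r
# Hypothetical fuzzy incidences (memberships in [0, 1]) on two objects.
X <- matrix(c(1.0, 0.8,
              0.2, 1.0), nrow = 2, byrow = TRUE)
Y <- matrix(c(1.0, 0.5,
              0.6, 1.0), nrow = 2, byrow = TRUE)

# Manhattan distance: sum of absolute membership differences.
manhattan_distance <- function(x, y) sum(abs(x - y))

# Euclidean distance: square root of the sum of squared differences.
euclidean_distance <- function(x, y) sqrt(sum((x - y)^2))

manhattan_distance(X, Y)  # 0.3 + 0.4
euclidean_distance(X, Y)  # sqrt(0.09 + 0.16)
```

For crisp (0/1) incidences every absolute difference is 0 or 1, so the Manhattan distance in this sketch reduces to the count of differing tuples.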

"CS"

Cook-Seiford distance, a generalization of the distance function of Cook and Seiford (1978). Let the generalized ranks of an object $a$ in the (first) domain of an endorelation $R$ be defined as the number of objects $b$ dominating $a$ (i.e., for which $a R b$ and not $b R a$), plus half the number of objects $b$ equivalent to $a$ (i.e., for which $a R b$ and $b R a$). For preference relations, this gives the usual Kendall ranks arranged according to decreasing preference (and averaged for ties). Then the generalized Cook-Seiford distance is defined as the $l_1$ distance between the generalized ranks. For linear orders, this gives Spearman's footrule metric (Diaconis, 1988).

Only applicable to crisp endorelations.
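The generalized ranks can be sketched directly from this definition in base R. This is an illustration under stated assumptions, not the package's code: the function names are hypothetical, incidences are 0/1 matrices with a `<=` encoding, and an object of a reflexive relation is counted as equivalent to itself, which shifts all ranks by the same constant and so does not affect the distance between two reflexive relations.

```r
# Generalized ranks of the objects of an endorelation with incidence I.
generalized_ranks <- function(I) {
  I <- I > 0
  dominating <- I & !t(I)  # [a, b]: a R b and not b R a
  equivalent <- I & t(I)   # [a, b]: a R b and b R a
  rowSums(dominating) + rowSums(equivalent) / 2
}

# Generalized Cook-Seiford distance: l_1 distance between the ranks.
cook_seiford_distance <- function(x, y) {
  sum(abs(generalized_ranks(x) - generalized_ranks(y)))
}

r1 <- c(a = 1, b = 2, c = 3)
r2 <- c(a = 3, b = 2, c = 1)
I1 <- outer(r1, r1, "<=") * 1  # linear order a, b, c
I2 <- outer(r2, r2, "<=") * 1  # reversed linear order

generalized_ranks(I1)          # a: 2.5, b: 1.5, c: 0.5
cook_seiford_distance(I1, I2)  # 4
```

The value 4 agrees with Spearman's footrule between the rank vectors (1, 2, 3) and (3, 2, 1).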

"CKS"

Cook-Kress-Seiford distance, a generalization of the distance function of Cook, Kress and Seiford (1986). For each pair of objects $a$ and $b$ in an endorelation $R$, we can have $a R b$ and not $b R a$ or vice versa (cases of “strict preference”), $a R b$ and $b R a$ (the case of “indifference”), or neither $a R b$ nor $b R a$ (the case of “incomparability”). (Only the last two are possible if $a = b$.) The distance by Cook, Kress and Seiford puts indifference as the metric centroid between both preference cases and incomparability (i.e., indifference is at distance one from the other three, and each of the other three is at distance two from the others). The generalized Cook-Kress-Seiford distance is the paired comparison distance (i.e., a metric) based on these distances between the four paired comparison cases. (Formula 3 in the reference must be slightly modified for the generalization from partial rankings to arbitrary endorelations.)

Only applicable to crisp endorelations.

"score"

score-based distance. This computes $\Delta(s(x), s(y))$ for suitable score and distance functions $s$ and $\Delta$, respectively. These can be specified by additional arguments score and Delta. If score is a character string, it is taken as the method for relation_scores. Otherwise, if given, it must be a function giving the score function itself. If Delta is a number $p \ge 1$, the usual $l_p$ distance is used. Otherwise, it must be a function giving the distance function. The defaults correspond to using the default relation scores and $p = 1$, which for linear orders gives Spearman's footrule distance.

Only applicable to endorelations.
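The structure $\Delta(s(x), s(y))$ can be sketched as follows, assuming a numeric Delta $p$ and a user-supplied score function. The names `lp_score_distance` and `in_degree` are hypothetical, and the score used here (column sums of the incidence matrix) is chosen only for illustration. It is not claimed to be the default of relation_scores.

```r
# Score-based distance: l_p distance between the score vectors.
lp_score_distance <- function(x, y, score, p = 1) {
  sum(abs(score(x) - score(y))^p)^(1 / p)
}

# Hypothetical score: for each object b, the number of objects a with a R b.
in_degree <- function(I) colSums(I)

r1 <- c(a = 1, b = 2, c = 3)
r2 <- c(a = 3, b = 2, c = 1)
I1 <- outer(r1, r1, "<=") * 1  # linear order a, b, c
I2 <- outer(r2, r2, "<=") * 1  # reversed linear order

lp_score_distance(I1, I2, score = in_degree)         # p = 1: 4
lp_score_distance(I1, I2, score = in_degree, p = 2)  # sqrt(8)
```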

"Jaccard"

Jaccard distance: 1 minus the ratio of the cardinalities of the intersection and the union of the relations.
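For crisp relations the ratio can be sketched on 0/1 incidence matrices, treating each relation as the set of tuples with incidence 1. The function name `jaccard_distance` is hypothetical, and the sketch does not handle the case where both relations are empty (the ratio is then undefined).

```r
# Jaccard distance: 1 - |intersection| / |union| of the tuple sets.
jaccard_distance <- function(x, y) {
  x <- x > 0
  y <- y > 0
  1 - sum(x & y) / sum(x | y)
}

r1 <- c(a = 1, b = 2, c = 3)
r2 <- c(a = 3, b = 2, c = 1)
I1 <- outer(r1, r1, "<=") * 1  # linear order a, b, c
I2 <- outer(r2, r2, "<=") * 1  # reversed linear order

# The two orders share only the three reflexive tuples, and their union
# contains all nine tuples, so the distance is 1 - 3/9.
jaccard_distance(I1, I2)
```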

"PC"

(generalized) paired comparison distance. This generalizes the symdiff and CKS distances to use a general set of discrepancies $\delta_{kl}$ between the possible paired comparison results with $a,b$/$b,a$ incidences 0/0, 1/0, 0/1, and 1/1 numbered from 1 to 4 (in a preference context with a $\le$ encoding, these correspond to incomparability, strict $<$ and $>$ preference, and indifference), with $\delta_{kl}$ the discrepancy between possible results $k$ and $l$. The distance is then obtained as the sum of the discrepancies from the paired comparisons of distinct objects, plus half the sum of discrepancies from the comparisons of identical objects (for which the only possible results are incomparability and indifference). The distance is a metric provided that the $\delta_{kl}$ satisfy the metric conditions (non-negativity and zero iff $k = l$, symmetry and sub-additivity).

The discrepancies can be specified via the additional argument delta, either as a numeric vector of length 6 with the non-redundant values $\delta_{21}, \delta_{31}, \delta_{41}, \delta_{32}, \delta_{42}, \delta_{43}$, or as a character string partially matching one of the following built-in discrepancies with corresponding parameter vector $\delta$:

"symdiff"

symmetric difference distance, with a discrepancy of two between opposite strict preferences and between indifference and incomparability, and of one between all other distinct results: $\delta = (1, 1, 2, 2, 1, 1)$ (default).

Can also be referred to as "SD".

"CKS"

Cook-Kress-Seiford distance, see above: $\delta = (2, 2, 1, 2, 1, 1)$.

"EM"

the distance obtained from the generalization of the Kemeny-Snell distance for complete rankings to partial rankings introduced in Emond and Mason (2000). This uses a discrepancy of two for opposite strict preferences, and one for all other distinct results: $\delta = (1, 1, 1, 2, 1, 1)$.

"JMB"

the distance with parameters as suggested by Jabeur, Martel and Ben Khélifa (2004): $\delta = (4/3, 4/3, 4/3, 5/3, 1, 1)$.

"discrete"

the discrete metric on the set of paired comparison results: $\delta = (1, 1, 1, 1, 1, 1)$.

Only applicable to crisp endorelations.
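The paired comparison construction can be sketched in base R as follows. This is an illustration of the description above, not the package's implementation: the function names are hypothetical, incidences are assumed to be 0/1 matrices, and "paired comparisons of distinct objects" is read as one comparison per unordered pair, which is the reading under which the "symdiff" discrepancies reproduce the symmetric difference count.

```r
# Symmetric 4 x 4 discrepancy matrix from the six non-redundant values
# (delta_21, delta_31, delta_41, delta_32, delta_42, delta_43).
discrepancy_matrix <- function(delta) {
  D <- matrix(0, 4, 4)
  D[lower.tri(D)] <- delta  # column-major fill matches the order above
  D + t(D)
}

# Result codes 1..4 for the a,b / b,a incidences 0/0, 1/0, 0/1, 1/1.
result_codes <- function(I) 1 + I + 2 * t(I)

pc_distance <- function(x, y, delta) {
  D <- discrepancy_matrix(delta)
  kx <- result_codes(x)
  ky <- result_codes(y)
  off <- upper.tri(kx)  # each unordered pair of distinct objects once
  sum(D[cbind(kx[off], ky[off])]) +
    sum(D[cbind(diag(kx), diag(ky))]) / 2  # identical objects count half
}

# x: linear order a, b, c.  y: a and b indifferent, c incomparable to both
# and not related to itself.
x <- matrix(c(1, 1, 1,
              0, 1, 1,
              0, 0, 1), nrow = 3, byrow = TRUE)
y <- matrix(c(1, 1, 0,
              1, 1, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)

pc_distance(x, y, delta = c(1, 1, 2, 2, 1, 1))  # "symdiff": 4, same as sum(x != y)
pc_distance(x, y, delta = c(2, 2, 1, 2, 1, 1))  # "CKS": 5.5
```

In the "CKS" case, the comparison of a and b (strict preference versus indifference) contributes 1, the two comparisons involving c (strict preference versus incomparability) contribute 2 each, and the comparison of c with itself (indifference versus incomparability) contributes half of 1.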

Methods "symdiff", "manhattan", "euclidean" and "Jaccard" take an additional logical argument na.rm: if true (default: false), tuples with missing memberships are excluded in the dissimilarity computations.

Value

If y is NULL, an object of class dist containing the dissimilarities between all pairs of elements of x. Otherwise, a matrix with the dissimilarities between the elements of x and the elements of y.
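A short usage sketch of the two return shapes, guarded so that it only runs when the relations package is installed. It assumes that as.relation() turns a named numeric vector without ties into the corresponding linear order on the names. If that assumption does not hold in your version, build the relations in any other supported way. No output values are shown because they have not been checked against the package.

```r
if (requireNamespace("relations", quietly = TRUE)) {
  # Three linear orders on the objects a, b, c (assumed coercion, see above).
  R1 <- relations::as.relation(c(a = 1, b = 2, c = 3))
  R2 <- relations::as.relation(c(a = 3, b = 2, c = 1))
  R3 <- relations::as.relation(c(a = 1, b = 3, c = 2))
  ens <- relations::relation_ensemble(R1, R2, R3)

  # y = NULL: pairwise dissimilarities within the ensemble.
  d <- relations::relation_dissimilarity(ens)

  # y given: dissimilarities between elements of x and elements of y.
  m <- relations::relation_dissimilarity(ens, relations::relation_ensemble(R1))

  # A built-in method with an additional argument passed via '...'.
  d_cks <- relations::relation_dissimilarity(ens, method = "PC", delta = "CKS")

  print(d)
  print(m)
}
```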

References

W. D. Cook, M. Kress and L. M. Seiford (1986), Information and preference in partial orders: a bimatrix representation. Psychometrika 51/2, 197–207. doi:10.1007/BF02293980.

W. D. Cook and L. M. Seiford (1978), Priority ranking and consensus formation. Management Science, 24/16, 1721–1732. doi:10.1287/mnsc.24.16.1721.

P. Diaconis (1988), Group Representations in Probability and Statistics. Institute of Mathematical Statistics: Hayward, CA.

E. J. Emond and D. W. Mason (2000), A new technique for high level decision support. Technical Report ORD Project Report PR2000/13, Operational Research Division, Department of National Defence, Canada.

K. Jabeur, J.-M. Martel and S. Ben Khélifa (2004), A distance-based collective preorder integrating the relative importance of the groups members. Group Decision and Negotiation, 13, 327–349. doi:10.1023/B:GRUP.0000042894.00775.75.

J. G. Kemeny and J. L. Snell (1962), Mathematical Models in the Social Sciences, chapter “Preference Rankings: An Axiomatic Approach”. MIT Press: Cambridge.


[Package relations version 0.6-13]