ser_dist {seriation} | R Documentation |
Dissimilarities and Correlations Between Seriation Orders
Description
Calculates dissimilarities/correlations between seriation orders in a list of type ser_permutation_vector.
Usage
ser_dist(x, y = NULL, method = "spearman", reverse = TRUE, ...)
ser_cor(x, y = NULL, method = "spearman", reverse = TRUE, test = FALSE)
ser_align(x, method = "spearman")
Arguments
x |
set of seriation orders as a list with elements which can be coerced into ser_permutation_vector objects. |
y |
if not |
method |
a character string with the name of the used measure.
Available measures are: |
reverse |
a logical indicating if the orders should also be checked in
reverse order and the best value (highest correlation, lowest distance) is
reported. This only affects ranking-based measures and not precedence
invariant measures (e.g., |
... |
Further arguments passed on to the method. |
test |
a logical indicating if a correlation test should be performed. |
Details
ser_cor() calculates the correlation between two sequences (orders).
Note that a seriation order and its reverse are considered identical; the
direction is purely an artifact of the method that creates the order. This
is a major difference to rankings. For ranking-based correlation measures
(Spearman and Kendall), the absolute value of the correlation is returned for
reverse = TRUE (in effect returning the correlation for the reversed order
when the original correlation is negative). If test = TRUE, then the
appropriate test for association is performed and a matrix with p-values is
returned as the attribute "p-value". Note that no correction for multiple
testing is performed.
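The effect of reverse = TRUE on a ranking-based correlation can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: the helper ranks() and the two example orders are made up for illustration, and it only assumes that an order is a permutation of 1..n.

```r
# Hypothetical example orders of the same five objects.
a <- c(3, 1, 4, 2, 5)
b <- c(5, 4, 2, 1, 3)

# Illustrative helper: position (rank) of each object within an order.
ranks <- function(o) order(o)

# Spearman correlation for the orders as given and with b reversed.
rho     <- cor(ranks(a), ranks(b),      method = "spearman")
rho_rev <- cor(ranks(a), ranks(rev(b)), method = "spearman")

# Reversing one order flips the sign of the correlation, so the better of
# the two directions is the absolute value.
best <- max(rho, rho_rev)
```

Because reversing an order maps each rank r to n + 1 - r, the two correlations always differ only in sign, which is why taking the absolute value is enough.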
For ser_dist(), the correlation coefficients (Kendall's tau and
Spearman's rho) are converted into a dissimilarity by taking one minus the
correlation value. Note that Manhattan distance between the ranks in a
linear order is equivalent to Spearman's footrule metric (Diaconis 1988).
reverse = TRUE returns the pairwise minima, also considering reversed
orders.
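The conversion described above can be sketched in base R. This is an illustration under assumptions, not the code used by ser_dist(): the helper names (ranks, d_spearman, d_footrule, with_reverse) and the example orders are hypothetical.

```r
# Illustrative helper: position (rank) of each object within an order.
ranks <- function(o) order(o)

# Hypothetical example orders of the same five objects.
a <- c(3, 1, 4, 2, 5)
b <- c(5, 4, 2, 1, 3)

# Correlation-based dissimilarity: one minus Spearman's rho.
d_spearman <- function(x, y) 1 - cor(ranks(x), ranks(y), method = "spearman")

# Manhattan distance between the rank vectors (Spearman's footrule).
d_footrule <- function(x, y) sum(abs(ranks(x) - ranks(y)))

# Pairwise minimum over the given and the reversed direction of y.
with_reverse <- function(d, x, y) min(d(x, y), d(x, rev(y)))

d_plain <- d_spearman(a, b)
d_min   <- with_reverse(d_spearman, a, b)
f_plain <- d_footrule(a, b)
f_min   <- with_reverse(d_footrule, a, b)
```

With the minimum taken over both directions, an order and its exact reverse end up at distance zero, which matches the view that direction carries no information.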
The positional proximity coefficient (ppc) is a precedence invariant measure based on the product of the squared positional distances in two permutations, defined as (see Goulermas et al. 2016):
d_{ppc}(R, S) = 1/h \sum_{j=2}^n \sum_{i=1}^{j-1}
(\pi_R(i)-\pi_R(j))^2 * (\pi_S(i)-\pi_S(j))^2,
where R and S are two seriation orders, \pi_R and \pi_S are the associated
permutation vectors and h is a normalization factor. The associated
generalized correlation coefficient is defined as 1-d_{ppc}. For this
precedence invariant measure reverse is ignored.
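The double sum in the ppc formula can be written compactly in base R. The sketch below computes only the unnormalized sum, because the text does not spell out the normalization factor h; the function name ppc_sum and the example orders are hypothetical.

```r
# Illustrative helper: position (rank) of each object within an order.
ranks <- function(o) order(o)

# Unnormalized double sum from the ppc formula (the factor 1/h is omitted
# here because its definition is not given in the text).
ppc_sum <- function(x, y) {
  # dist() on a vector returns |value_i - value_j| for all pairs i < j.
  dx <- as.vector(dist(ranks(x)))
  dy <- as.vector(dist(ranks(y)))
  sum(dx^2 * dy^2)
}

# Hypothetical example orders of the same five objects.
a <- c(3, 1, 4, 2, 5)
b <- c(5, 4, 2, 1, 3)

s_ab     <- ppc_sum(a, b)
s_ab_rev <- ppc_sum(a, rev(b))
```

Since only squared positional differences enter the sum, reversing either order leaves the value unchanged, which is why reverse has no effect for this measure.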
The absolute pairwise rank difference (aprd) is also precedence invariant and defined as a distance measure:
d_{aprd}(R, S) = \sum_{j=2}^n \sum_{i=1}^{j-1} | |\pi_R(i)-\pi_R(j)| -
|\pi_S(i)-\pi_S(j)| |^p,
where p is the power, which can be passed on as the parameter p and is set
to 2 by default. For this precedence invariant measure reverse is ignored.
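The aprd formula translates directly into a few lines of base R. This is a sketch written from the formula above, not the package's code; the function name aprd_sketch and the example orders are hypothetical.

```r
# Illustrative helper: position (rank) of each object within an order.
ranks <- function(o) order(o)

# Absolute pairwise rank difference, summed over all pairs i < j.
aprd_sketch <- function(x, y, p = 2) {
  dx <- as.vector(dist(ranks(x)))  # |pi_R(i) - pi_R(j)| for all i < j
  dy <- as.vector(dist(ranks(y)))  # |pi_S(i) - pi_S(j)| for all i < j
  sum(abs(dx - dy)^p)
}

# Hypothetical example orders of the same five objects.
a <- c(3, 1, 4, 2, 5)
b <- c(5, 4, 2, 1, 3)

d_ab     <- aprd_sketch(a, b)
d_ab_rev <- aprd_sketch(a, rev(b))
d_ab_p1  <- aprd_sketch(a, b, p = 1)
```

An order and its reverse have identical pairwise positional distances, so their aprd is zero and the direction of either argument does not matter.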
ser_align() tries to normalize the direction in a list of seriations
such that ranking-based methods can be used. For each permutation, the
reversed order is also added to the set, and a modified version of Prim's
algorithm for finding a minimum spanning tree (MST) is then used to choose
whether the original seriation order or its reverse should be used. The
orders first added to the MST are used: every time an order is added, its
reverse is removed from the possible remaining orders.
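The alignment idea can be sketched as a greedy, Prim-style selection in base R. This is only an illustration of the description above and not the implementation in ser_align(): the function name align_sketch, the choice of Spearman-based dissimilarity, starting from the first order as given, and the tie-breaking rule are all assumptions.

```r
# Illustrative helpers (hypothetical names).
ranks <- function(o) order(o)
d_spearman <- function(x, y) 1 - cor(ranks(x), ranks(y), method = "spearman")

align_sketch <- function(orders) {
  n <- length(orders)
  # Candidate set: every order plus its reverse.
  cand <- c(orders, lapply(orders, rev))
  src <- rep(seq_len(n), 2)  # input order each candidate was derived from
  idx <- seq_along(cand)
  D <- outer(idx, idx,
             Vectorize(function(i, j) d_spearman(cand[[i]], cand[[j]])))
  chosen <- 1L  # assumption: grow the tree from the first order as given
  while (length(chosen) < n) {
    # Candidates whose input order has not been fixed yet; once a direction
    # is chosen, its reverse is no longer available.
    open <- which(!(src %in% src[chosen]))
    sub <- D[chosen, open, drop = FALSE]
    # Prim step: add the open candidate closest to any chosen one.
    best <- open[which.min(apply(sub, 2, min))]
    chosen <- c(chosen, best)
  }
  out <- cand[chosen[order(src[chosen])]]  # restore the input sequence
  names(out) <- names(orders)
  out
}

# Hypothetical example: B and D run in the opposite direction to A and C.
os_demo <- list(A = c(1, 2, 3, 4, 5), B = c(5, 4, 3, 2, 1),
                C = c(2, 1, 3, 4, 5), D = c(5, 4, 3, 1, 2))
aligned <- align_sketch(os_demo)
```

In this small example the sketch flips B and D so that all four orders point in the same direction, after which rank-based measures can be applied with reverse = FALSE.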
Value
- ser_dist() returns an object of class stats::dist.
- ser_align() returns a new list with elements of class ser_permutation.
Author(s)
Michael Hahsler
References
P. Diaconis (1988): Group Representations in Probability and Statistics, Institute of Mathematical Statistics, Hayward, CA.
J.Y. Goulermas, A. Kostopoulos, and T. Mu (2016): A New Measure for Analyzing and Fusing Sequences of Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(5):833-48. doi:10.1109/TPAMI.2015.2470671
See Also
Other permutation: get_order(), permutation_vector2matrix(), permute(), ser_permutation(), ser_permutation_vector()
Examples
library(seriation)
set.seed(1234)
## seriate dist of 50 flowers from the iris data set
data("iris")
x <- as.matrix(iris[-5])
x <- x[sample(1:nrow(x), 50), ]
rownames(x) <- 1:50
d <- dist(x)
## Create a list of different seriations
methods <- c("HC_complete", "OLO", "GW", "VAT",
"TSP", "Spectral", "MDS", "Identity", "Random")
os <- sapply(methods, function(m) {
  cat("Doing", m, "... ")
  tm <- system.time(o <- seriate(d, method = m))
  cat("took", tm[3], "s.\n")
  o
})
## Compare the methods using distances. Default is based on
## Spearman's rank correlation coefficient where reverse orders are
## also considered.
ds <- ser_dist(os)
hmap(ds, margin = c(7,7))
## Compare using correlation between orders. Reversed orders have
## negative correlation!
cs <- ser_cor(os, reverse = FALSE)
hmap(cs, margin = c(7,7))
## Compare orders by allowing orders to be reversed.
## Now all but random and identity are highly positively correlated.
cs2 <- ser_cor(os, reverse = TRUE)
hmap(cs2, margin = c(7,7))
## A better approach is to align the direction of the orders first
## and then calculate correlation.
os_aligned <- ser_align(os)
cs3 <- ser_cor(os_aligned, reverse = FALSE)
hmap(cs3, margin = c(7,7))
## Compare the orders using clustering. We use Spearman's footrule
## (Manhattan distance of ranks). In order to use a rank-based method,
## we first align the direction of the orders.
os_aligned <- ser_align(os)
ds <- ser_dist(os_aligned, method = "manhattan")
plot(hclust(ds))