distanceMatrix {distantia} | R Documentation |
Computes distance matrices among the samples of two or more multivariate time-series.
Description
Computes distance matrices among the samples of two or more multivariate time-series provided in a single dataframe (generally produced by prepareSequences), identified by a grouping column (argument grouping.column). Distances can be computed with the methods "manhattan", "euclidean", "chi", and "hellinger", and are implemented in the function distance. The function uses the packages parallel, foreach, and doParallel to compute the distance matrices of different sequences in parallel. It is configured to use all available processors minus one.
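The parallel setup described above can be sketched as follows. This is a minimal illustration, not the package's internal code: the toy dataframe, the object names (toy, pairs, toy.matrices), and the "A|B" slot naming are assumptions made for the example, and the packages foreach and doParallel are assumed to be installed.

```r
library(foreach)
library(doParallel)

# toy data: two sequences, "A" and "B", in a single dataframe (invented values)
toy <- data.frame(
  id = c("A", "A", "A", "B", "B"),
  v1 = c(1, 2, 3, 2, 4),
  v2 = c(4, 3, 1, 5, 1)
)

# all available processors minus one, but never fewer than one
n.cores <- max(1, parallel::detectCores() - 1, na.rm = TRUE)
cl <- parallel::makeCluster(n.cores)
doParallel::registerDoParallel(cl)

# every possible pair of sequence identifiers
pairs <- combn(unique(toy$id), 2, simplify = FALSE)

# one distance matrix per pair, each pair handled by a worker
toy.matrices <- foreach(p = pairs) %dopar% {
  a <- as.matrix(toy[toy$id == p[1], c("v1", "v2")])
  b <- as.matrix(toy[toy$id == p[2], c("v1", "v2")])
  # manhattan distance between every sample of a and every sample of b
  d <- matrix(NA, nrow = nrow(a), ncol = nrow(b))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      d[i, j] <- sum(abs(a[i, ] - b[j, ]))
    }
  }
  d
}

# hypothetical naming scheme for the list slots
names(toy.matrices) <- vapply(pairs, paste, character(1), collapse = "|")

# release the workers
parallel::stopCluster(cl)
```

Leaving one processor free keeps the machine responsive while the workers run; with only a few short sequences, the cost of starting the cluster can outweigh the gain, which is when parallel.execution = FALSE is the more sensible choice.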
Usage
distanceMatrix(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  parallel.execution = TRUE
)
Arguments
sequences |
dataframe with multiple sequences identified by a grouping column. Generally the output of |
grouping.column |
character string, name of the column in |
time.column |
character string, name of the column with time/depth/rank data. The data in this column is not modified. |
exclude.columns |
character string or character vector with column names in |
method |
character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error. |
parallel.execution |
boolean, if |
Details
Distances are computed as:
- manhattan: d <- sum(abs(x - y))
- euclidean: d <- sqrt(sum((x - y)^2))
- chi: xy <- x + y; y. <- y / sum(y); x. <- x / sum(x); d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
- hellinger: d <- sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)
Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
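As an illustration, the four formulas can be gathered into one small base R function. This is a sketch under stated assumptions, not the package's distance function: the name sketch_distance is hypothetical, and replacing zeroes before any other computation is an assumption about the order of operations. The hellinger line is transcribed as written in this page, which squares the sum of the differences; the textbook Hellinger distance squares each difference before summing.

```r
# Hypothetical helper written for illustration only.
sketch_distance <- function(x, y, method = "manhattan") {
  valid.methods <- c("manhattan", "euclidean", "chi", "hellinger")
  if (!(method %in% valid.methods)) {
    stop("method must be one of: ", paste(valid.methods, collapse = ", "))
  }
  # assumption: zeroes are replaced before anything else is computed
  if (method %in% c("chi", "hellinger")) {
    x[x == 0] <- 0.00001
    y[y == 0] <- 0.00001
  }
  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      xy <- x + y
      y. <- y / sum(y)
      x. <- x / sum(x)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    # transcribed as documented: the sum is squared, not each difference
    hellinger = sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)
  )
}

# two samples described by three variables (invented values)
x <- c(1, 2, 3)
y <- c(2, 2, 5)
sketch_distance(x, y, method = "manhattan")
sketch_distance(x, y, method = "euclidean")
sketch_distance(x, y, method = "chi")
sketch_distance(x, y, method = "hellinger")
```

Here x and y stand for two samples (rows) taken from different sequences; a distance matrix holds one such value for every pair of samples.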
Value
A list with named slots containing the distance matrices of every possible combination of sequences according to grouping.column.
Author(s)
Blas Benito <blasbenito@gmail.com>
See Also
Examples
# loading data
data(sequenceA)
data(sequenceB)

# preparing datasets
AB.sequences <- prepareSequences(
  sequence.A = sequenceA,
  sequence.A.name = "A",
  sequence.B = sequenceB,
  sequence.B.name = "B",
  merge.mode = "complete",
  if.empty.cases = "zero",
  transformation = "hellinger"
)

# computing distance matrix
AB.distance.matrix <- distanceMatrix(
  sequences = AB.sequences,
  grouping.column = "id",
  method = "manhattan",
  parallel.execution = FALSE
)

# plot
plotMatrix(distance.matrix = AB.distance.matrix)