
distanceMatrix {distantia}

R Documentation

Computes distance matrices among the samples of two or more multivariate time-series.

Description

Computes distance matrices among the samples of two or more multivariate time-series provided in a single dataframe (generally produced by prepareSequences) and identified by a grouping column (argument grouping.column). Distances can be computed with the methods "manhattan", "euclidean", "chi", and "hellinger", which are implemented in the function distance. The function uses the packages parallel, foreach, and doParallel to compute the distance matrices among different sequences in parallel. It is configured to use all available processors minus one.

Usage

distanceMatrix(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  parallel.execution = TRUE
  )

Arguments

`sequences`	dataframe with multiple sequences identified by a grouping column. Generally the output of `prepareSequences`.
`grouping.column`	character string, name of the column in `sequences` used to identify separate sequences within the file. This argument is ignored if `sequence.A` and `sequence.B` are provided.
`time.column`	character string, name of the column with time/depth/rank data. The data in this column is not modified.
`exclude.columns`	character string or character vector with column names in `sequences`, or in `sequence.A` and `sequence.B`, to be excluded from the analysis.
`method`	character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.
`parallel.execution`	boolean, if `TRUE` (default), execution is parallelized; if `FALSE`, it runs serially.
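
As a rough illustration of what a `parallel.execution` switch with "all processors minus one" can look like, the sketch below uses only the base package parallel. It is not the package's own code (which relies on foreach and doParallel); the helper name `run_tasks` and its arguments are hypothetical.

```r
library(parallel)

# Hypothetical helper (not part of distantia): applies `fun` to every
# element of `tasks`, either serially or on a local cluster.
run_tasks <- function(tasks, fun, parallel.execution = TRUE) {
  if (!parallel.execution) {
    # serial execution
    return(lapply(tasks, fun))
  }
  # all available processors minus one, with a floor of one
  n.cores <- detectCores() - 1
  if (is.na(n.cores) || n.cores < 1) n.cores <- 1
  cl <- makeCluster(n.cores)
  # make sure the cluster is released even if `fun` fails
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, tasks, fun)
}

# serial call on a toy task list
squares <- run_tasks(list(1, 2, 3), function(x) x^2, parallel.execution = FALSE)
```

Serial execution is usually the safer choice for small inputs, where the cost of starting a cluster can outweigh the gain.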

Details

Distances are computed as:

manhattan: d <- sum(abs(x - y))
euclidean: d <- sqrt(sum((x - y)^2))
chi: xy <- x + y; y. <- y / sum(y); x. <- x / sum(x); d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger: d <- sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)

Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
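
The expressions above can be tried on small vectors with a minimal sketch like the following. The helper `pair_distance` is hypothetical and is not the package's `distance` function; it covers "manhattan", "euclidean", and "chi" only, and applies the zero replacement for "chi" as described in the note.

```r
# Hypothetical helper (not part of distantia): distance between two
# numeric vectors of equal length.
pair_distance <- function(x, y, method = "manhattan") {
  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      # zeroes are replaced by a small constant before computing "chi"
      x[x == 0] <- 0.00001
      y[y == 0] <- 0.00001
      xy <- x + y
      y. <- y / sum(y)
      x. <- x / sum(x)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    stop("method must be 'manhattan', 'euclidean', or 'chi'")
  )
}

x <- c(1, 0, 3)
y <- c(2, 2, 1)

d.manhattan <- pair_distance(x, y, "manhattan")  # 1 + 2 + 2 = 5
d.euclidean <- pair_distance(x, y, "euclidean")  # sqrt(1 + 4 + 4) = 3
d.chi <- pair_distance(x, y, "chi")
```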

Value

A list with named slots containing the distance matrices of every possible combination of sequences according to grouping.column.
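
To make the shape of such a result concrete, the sketch below builds a comparable list in base R from a toy dataframe with three sequences. It is an illustration under assumptions, not the package's implementation: the helper `manhattan_matrix`, the toy data, and the `"A|B"` naming of the slots are all hypothetical.

```r
# Toy long-format dataframe: three sequences identified by the column "id"
sequences <- data.frame(
  id = c("A", "A", "B", "B", "B", "C", "C"),
  v1 = c(1, 2, 1, 0, 2, 3, 1),
  v2 = c(0, 2, 0, 1, 2, 1, 1)
)

# Hypothetical helper: manhattan distances between all rows of a and b
manhattan_matrix <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  d <- matrix(NA_real_, nrow = nrow(a), ncol = nrow(b))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      d[i, j] <- sum(abs(a[i, ] - b[j, ]))
    }
  }
  d
}

# one dataframe of numeric columns per sequence
groups <- split(sequences[, c("v1", "v2")], sequences$id)

# every possible pair of sequences
pairs <- combn(names(groups), 2, simplify = FALSE)

# one distance matrix per pair, stored in a named list
result <- lapply(pairs, function(p) manhattan_matrix(groups[[p[1]]], groups[[p[2]]]))
names(result) <- vapply(pairs, paste, character(1), collapse = "|")

names(result)
dim(result[["A|B"]])
```

Each matrix has one row per sample of the first sequence and one column per sample of the second, so sequences of different lengths yield rectangular matrices.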

Author(s)

Blas Benito <blasbenito@gmail.com>

Examples


#loading data
data(sequenceA)
data(sequenceB)

#preparing datasets
AB.sequences <- prepareSequences(
 sequence.A = sequenceA,
 sequence.A.name = "A",
 sequence.B = sequenceB,
 sequence.B.name = "B",
 merge.mode = "complete",
 if.empty.cases = "zero",
 transformation = "hellinger"
 )

#computing distance matrix
AB.distance.matrix <- distanceMatrix(
 sequences = AB.sequences,
 grouping.column = "id",
 method = "manhattan",
 parallel.execution = FALSE
 )


#plot
plotMatrix(distance.matrix = AB.distance.matrix)

[Package distantia version 1.0.2 Index]