distancePairedSamples {distantia}

R Documentation

Computes distance among pairs of aligned samples in two or more multivariate time-series.

Description

Computes the distance (one of: "manhattan", "euclidean", "chi", or "hellinger") between pairs of aligned samples (same order/depth/age) in two or more multivariate time-series.

Usage

distancePairedSamples(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  same.time = FALSE,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = TRUE
  )

Arguments

`sequences`	dataframe with multiple sequences identified by a grouping column. Generally the output of `prepareSequences`.
`grouping.column`	character string, name of the column in `sequences` to be used to identify separate sequences within the file. This argument is ignored if `sequence.A` and `sequence.B` are provided.
`time.column`	character string, name of the column with time/depth/rank data. The data in this column is not modified.
`exclude.columns`	character string or character vector with column names in `sequences`, or `sequence.A` and `sequence.B`, to be excluded from the analysis.
`same.time`	boolean. If `TRUE`, the samples in the sequences to compare are tested to check whether they have the same time/age/depth according to `time.column`. This argument is only useful when the user needs to compare two sequences taken at different sites but over the same time frame.
`method`	character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.
`sum.distances`	boolean, if `TRUE`, the distances between samples are summed, and the output of the function (now a list with a single number in each slot) can be used directly as input for the argument `least.cost` of the function `psi`. Default is `FALSE`.
`parallel.execution`	boolean, if `TRUE` (default), execution is parallelized; if `FALSE`, it runs serially.
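
As an illustration of what the `same.time` check amounts to, here is a minimal base-R sketch. It is not the package's internal code: the data frame and its column names (`id`, `time`, `v1`, `v2`) are hypothetical, and the comparison rule (identical time values in the same order) is an assumption about what "same time" means.

```r
# Toy data (hypothetical): two sequences sampled at the same three times.
sequences <- data.frame(
  id   = rep(c("A", "B"), each = 3),
  time = rep(c(10, 20, 30), times = 2),
  v1   = c(1, 2, 3, 2, 2, 5),
  v2   = c(4, 0, 1, 3, 1, 1)
)

# Split the time column by sequence.
times.by.sequence <- split(sequences$time, sequences$id)

# TRUE when every sequence has exactly the same time values, in the same order.
reference <- times.by.sequence[[1]]
same.times <- all(vapply(
  times.by.sequence,
  function(x) identical(x, reference),
  logical(1)
))
same.times
```

If the sequences were sampled at different times, `same.times` would be `FALSE`, which is the situation `same.time = TRUE` is meant to detect.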

Details

Distances are computed as:

manhattan: d <- sum(abs(x - y))
euclidean: d <- sqrt(sum((x - y)^2))
chi: xy <- x + y; y. <- y / sum(y); x. <- x / sum(x); d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger: d <- sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)

Note that zeroes are replaced by 0.00001 when `method` is "chi" or "hellinger".
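
The expressions above can be collected into a single helper for experimenting with individual pairs of samples. This is a sketch only: the function name `sample_distance` is hypothetical and is not part of distantia, and each formula is transcribed as written in this section. In particular, the "hellinger" expression as written sums the differences before squaring, whereas the textbook Hellinger distance squares each difference before summing.

```r
# Sketch of the documented formulas; not the package's internal code.
# x and y are numeric vectors of equal length (one sample each).
sample_distance <- function(x, y, method = "manhattan") {
  # Invalid method names raise an error, as described for `method`.
  method <- match.arg(method, c("manhattan", "euclidean", "chi", "hellinger"))

  # Zero replacement documented for "chi" and "hellinger".
  if (method %in% c("chi", "hellinger")) {
    x[x == 0] <- 0.00001
    y[y == 0] <- 0.00001
  }

  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      xy <- x + y
      y. <- y / sum(y)
      x. <- x / sum(x)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    # Transcribed as written above (sum first, then square).
    hellinger = sqrt(1/2 * sum(sqrt(x) - sqrt(y))^2)
  )
}

x <- c(1, 2, 3)
y <- c(2, 4, 1)
sample_distance(x, y, "manhattan") # 1 + 2 + 2 = 5
sample_distance(x, y, "euclidean") # sqrt(1 + 4 + 4) = 3
```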

Value

A list with named slots (the names of the two sequences separated by a vertical bar, as in "A|B"), each containing a numeric vector with the distances between paired samples, for every possible combination of sequences defined by `grouping.column`.
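
To make the shape of the returned list concrete, the following base-R sketch builds a similar structure by hand for the "manhattan" method. It is an illustration under assumptions, not the package's implementation: the toy data frame and its column names are hypothetical, and the samples are assumed to be already aligned row by row within each sequence.

```r
# Toy data (hypothetical): three aligned sequences with two variables each.
sequences <- data.frame(
  id   = rep(c("A", "B", "C"), each = 3),
  time = rep(1:3, times = 3),
  v1   = c(1, 2, 3, 2, 2, 5, 0, 1, 1),
  v2   = c(4, 0, 1, 3, 1, 1, 4, 2, 0)
)

# One numeric matrix per sequence, dropping the id and time columns.
by.sequence <- lapply(
  split(sequences[, c("v1", "v2")], sequences$id),
  as.matrix
)

# Every unordered pair of sequence names.
pairs <- combn(names(by.sequence), 2, simplify = FALSE)

# Manhattan distance between samples sharing the same row position.
paired.distances <- lapply(pairs, function(p) {
  unname(rowSums(abs(by.sequence[[p[1]]] - by.sequence[[p[2]]])))
})
names(paired.distances) <- vapply(pairs, paste, character(1), collapse = "|")

# Distances between the paired samples of sequences A and B.
paired.distances[["A|B"]]

# Analogue of sum.distances = TRUE: a single number per slot.
summed.distances <- lapply(paired.distances, sum)
```

Each slot holds one distance per pair of aligned samples, and summing within slots gives the single-number-per-slot form described for `sum.distances`.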

Author(s)

Blas Benito <blasbenito@gmail.com>

Examples



#loading the package and the example data
library(distantia)
data(climate)

#preparing sequences
#notice the argument paired.samples
climate.prepared <- prepareSequences(
  sequences = climate,
  grouping.column = "sequenceId",
  time.column = "time",
  paired.samples = TRUE
  )

#compute pairwise distances between paired samples
climate.prepared.distances <- distancePairedSamples(
  sequences = climate.prepared,
  grouping.column = "sequenceId",
  time.column = "time",
  exclude.columns = NULL,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = FALSE
  )

[Package distantia version 1.0.2 Index]