distancePairedSamples {distantia}    R Documentation

Computes distance among pairs of aligned samples in two or more multivariate time-series.

Description

Computes the distance (one of: "manhattan", "euclidean", "chi", or "hellinger") between pairs of aligned samples (same order/depth/age) in two or more multivariate time-series.

Usage

distancePairedSamples(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  same.time = FALSE,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = TRUE
  )

Arguments

sequences

dataframe with multiple sequences identified by a grouping column. Generally the output of prepareSequences.

grouping.column

character string, name of the column in sequences used to identify separate sequences within the dataframe.

time.column

character string, name of the column with time/depth/rank data. The data in this column is not modified.

exclude.columns

character string or character vector with the names of the columns in sequences to be excluded from the analysis.

same.time

boolean. If TRUE, the samples of the sequences being compared are checked to confirm that they have the same time/age/depth according to time.column. This argument is only useful when the user needs to compare sequences taken at different sites over the same time frame.

method

character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.

sum.distances

boolean, if TRUE, the distances between samples are summed, and the output of the function (then a list with a single number in each slot) can be used directly as input for the argument least.cost of the function psi. Default is FALSE.

parallel.execution

boolean, if TRUE (default), execution is parallelized; if FALSE, it runs sequentially.
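
The kind of alignment check that same.time = TRUE describes can be sketched in base R as follows. How the package performs the check internally, and what it does when the check fails, is not stated on this page, so the helper same_times and the toy data frames below are hypothetical and only illustrate the idea.

```r
# Hypothetical helper (not part of distantia): TRUE when two sequences
# have the same number of samples and identical values in the time
# column, in the same order.
same_times <- function(a, b, time.column = "time") {
  nrow(a) == nrow(b) && all(a[[time.column]] == b[[time.column]])
}

# Toy sequences invented for the example.
site.1 <- data.frame(time = c(10, 20, 30), temperature = c(5.1, 6.3, 7.0))
site.2 <- data.frame(time = c(10, 20, 30), temperature = c(4.8, 6.9, 7.4))
site.3 <- data.frame(time = c(10, 25, 30), temperature = c(5.0, 6.0, 7.1))

same_times(site.1, site.2) # TRUE: samples are aligned in time
same_times(site.1, site.3) # FALSE: the second sample differs in time
```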

Details

Distances are computed as:

Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
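
As a minimal sketch, the four metrics can be written in base R as shown below. These are common textbook forms, assumed here for illustration; the package's internal implementation may differ in detail (for instance in how the "chi" and "hellinger" distances are normalized). The helpers replace_zeros and paired_distance are hypothetical and not part of distantia.

```r
# Hypothetical helper: replace zeroes with a small constant, as the
# note above describes for "chi" and "hellinger".
replace_zeros <- function(v, pseudo = 0.00001) {
  v[v == 0] <- pseudo
  v
}

# Hypothetical helper: distance between two numeric vectors x and y
# (one sample from each sequence), using common textbook definitions.
paired_distance <- function(x, y, method = "manhattan") {
  if (method %in% c("chi", "hellinger")) {
    x <- replace_zeros(x)
    y <- replace_zeros(y)
  }
  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      # squared differences of relative values, weighted by the
      # relative column totals of both samples
      xy <- x + y
      sqrt(sum((x / sum(x) - y / sum(y))^2 / (xy / sum(xy))))
    },
    hellinger = sqrt(0.5 * sum((sqrt(x) - sqrt(y))^2)),
    stop("invalid method: ", method)
  )
}

x <- c(1, 2, 3)
y <- c(2, 4, 3)
paired_distance(x, y, method = "manhattan") # 3
paired_distance(x, y, method = "euclidean") # sqrt(5)
```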

Value

A list with named slots (names of the sequences separated by a vertical line, as in "A|B") containing numeric vectors with the distance between paired samples of every possible combination of sequences according to grouping.column.
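
To make the shape of the output concrete, the base R sketch below builds a list with the same layout from a toy dataframe. It is an illustration under assumptions, not the package's code: the toy data, the column names id and time, and the choice of Manhattan distance are invented for the example.

```r
# Toy data: three sequences (A, B, C) with two aligned samples each.
toy <- data.frame(
  id   = rep(c("A", "B", "C"), each = 2),
  time = rep(c(1, 2), times = 3),
  v1   = c(1, 2, 2, 4, 0, 1),
  v2   = c(5, 5, 3, 6, 5, 2)
)

# One numeric matrix per sequence, without the id and time columns.
by.id <- lapply(
  split(toy[, c("v1", "v2")], toy$id),
  as.matrix
)

# Every unordered pair of sequences.
pairs <- combn(names(by.id), 2, simplify = FALSE)

# Manhattan distance between paired samples (row i of one sequence
# against row i of the other), one numeric vector per pair.
distances <- lapply(pairs, function(p) {
  unname(rowSums(abs(by.id[[p[1]]] - by.id[[p[2]]])))
})
names(distances) <- vapply(pairs, paste, character(1), collapse = "|")

names(distances)   # "A|B" "A|C" "B|C"
distances[["A|B"]] # 3 3

# Summing each vector leaves one number per slot, the layout
# described for sum.distances = TRUE.
lapply(distances, sum)
```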

Author(s)

Blas Benito <blasbenito@gmail.com>

See Also

distance

Examples



#loading data
data(climate)

#preparing sequences
#notice the argument paired.samples
climate.prepared <- prepareSequences(
  sequences = climate,
  grouping.column = "sequenceId",
  time.column = "time",
  paired.samples = TRUE
  )

#compute pairwise distances between paired samples
climate.prepared.distances <- distancePairedSamples(
  sequences = climate.prepared,
  grouping.column = "sequenceId",
  time.column = "time",
  exclude.columns = NULL,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = FALSE
  )



[Package distantia version 1.0.2 Index]