distancePairedSamples {distantia} | R Documentation
Computes the distance (one of: "manhattan", "euclidean", "chi", or "hellinger") between pairs of aligned samples (same order/depth/age) in two or more multivariate time-series.
distancePairedSamples(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  same.time = FALSE,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = TRUE
)
sequences |
dataframe with multiple sequences identified by a grouping column. Generally the output of |
grouping.column |
character string, name of the column in |
time.column |
character string, name of the column with time/depth/rank data. The data in this column is not modified. |
exclude.columns |
character string or character vector with column names in |
same.time |
boolean. If |
method |
character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error. |
sum.distances |
boolean, if |
parallel.execution |
boolean, if |
Distances are computed as:
manhattan
: d <- sum(abs(x - y))
euclidean
: d <- sqrt(sum((x - y)^2))
chi
:
xy <- x + y
y. <- y / sum(y)
x. <- x / sum(x)
d <- sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
hellinger
: d <- sqrt(1/2 * sum((sqrt(x) - sqrt(y))^2))
Note that zeroes are replaced by 0.00001 when method equals "chi" or "hellinger".
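The formulas above can be collected into a single helper for experimentation. The following is a minimal base R sketch, not the package's own code: the function name paired_distance and its zero.replacement argument are hypothetical, and the "hellinger" branch assumes the standard definition, in which each difference is squared before summing.

```r
# Hypothetical helper (not part of distantia): distance between two numeric
# vectors x and y that hold the same variables in the same order.
paired_distance <- function(x, y, method = "manhattan", zero.replacement = 0.00001) {

  # Invalid method names raise an error here.
  method <- match.arg(method, c("manhattan", "euclidean", "chi", "hellinger"))

  # Zeroes are replaced only for "chi" and "hellinger", as noted above.
  if (method %in% c("chi", "hellinger")) {
    x[x == 0] <- zero.replacement
    y[y == 0] <- zero.replacement
  }

  switch(
    method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    chi = {
      xy <- x + y
      x. <- x / sum(x)
      y. <- y / sum(y)
      sqrt(sum(((x. - y.)^2) / (xy / sum(xy))))
    },
    # Assumption: standard Hellinger form, squaring inside the sum.
    hellinger = sqrt(1/2 * sum((sqrt(x) - sqrt(y))^2))
  )
}

x <- c(1, 0, 3)
y <- c(2, 2, 3)
paired_distance(x, y, method = "manhattan")  # 3
paired_distance(x, y, method = "euclidean")  # sqrt(5)
```

Because the helper takes one pair of samples at a time, applying it row by row to two aligned sequences yields one distance per paired sample.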
A list with named slots (names of the sequences separated by a vertical line, as in "A|B") containing numeric vectors with the distance between paired samples of every possible combination of sequences according to grouping.column.
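To make the shape of the returned list concrete, here is a minimal base R sketch that builds a comparable structure by hand for Manhattan distances. It is an illustration under stated assumptions, not the package's implementation: the toy data frame toy, its column names, and the helper paired_manhattan are all hypothetical, and every sequence is assumed to have the same number of samples in the same order.

```r
# Toy data (hypothetical): three sequences with three aligned samples each.
toy <- data.frame(
  id   = rep(c("A", "B", "C"), each = 3),
  time = rep(1:3, times = 3),
  v1   = c(1, 2, 3, 2, 2, 5, 0, 1, 1),
  v2   = c(4, 4, 4, 1, 4, 6, 4, 4, 0)
)

# Hypothetical helper: Manhattan distance between paired samples for every
# combination of sequences, returned as a list named "A|B", "A|C", ...
paired_manhattan <- function(sequences, grouping.column, time.column) {

  # Every column other than the grouping and time columns is a variable.
  variables <- setdiff(colnames(sequences), c(grouping.column, time.column))

  # One data frame of variables per sequence.
  by.group <- split(sequences[, variables, drop = FALSE], sequences[[grouping.column]])

  # All unordered pairs of sequence names.
  pairs <- combn(names(by.group), 2, simplify = FALSE)

  distances <- lapply(pairs, function(p) {
    a <- as.matrix(by.group[[p[1]]])
    b <- as.matrix(by.group[[p[2]]])
    # Row i of one sequence is compared with row i of the other.
    unname(rowSums(abs(a - b)))
  })

  names(distances) <- vapply(pairs, paste, character(1), collapse = "|")
  distances
}

toy.distances <- paired_manhattan(toy, grouping.column = "id", time.column = "time")
names(toy.distances)    # "A|B" "A|C" "B|C"
toy.distances[["A|B"]]  # 4 0 4
```

Summing each vector, for example with sapply(toy.distances, sum), collapses the per-sample distances into a single value per pair of sequences.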
Blas Benito <blasbenito@gmail.com>
# loading the package and the example data
library(distantia)
data(climate)

# preparing sequences
# notice the argument paired.samples
climate.prepared <- prepareSequences(
  sequences = climate,
  grouping.column = "sequenceId",
  time.column = "time",
  paired.samples = TRUE
)

# compute pairwise distances between paired samples
climate.prepared.distances <- distancePairedSamples(
  sequences = climate.prepared,
  grouping.column = "sequenceId",
  time.column = "time",
  exclude.columns = NULL,
  method = "manhattan",
  sum.distances = FALSE,
  parallel.execution = FALSE
)