leastCostMatrix {distantia}R Documentation

Computes a least cost matrix from a distance matrix.

Description

Computes the constrained (by sample order) minimum sum of distances among samples in two multivariate time-series by finding the least cost path between the first and last samples in a distance matrix computed by distanceMatrix. The minimum distance is found trough an efficient dynamic programming algorithm that first solves the local least cost path between adjacent samples, and uses the partial solutions to find the global solution.

The algorithm is based on the sequence slotting algorithm described by Birks and Gordon (1985). In its original version, the algorithm searches for the least cost path between a given sample of one sequence (A) and the samples of the other sequence (B) in orthogonal directions (either one step in the x axis or one step in the y axis), which allows to locate the two samples in B between which the target sample in A "belongs" (has the least distance to). Therefore, the algorithm is in fact ordering the samples in both sequences to virtually create a single sequence (as in B1, A1, A2, B2, etc) with the samples ordered in the way that minimizes the global distance among them.

This function provides an additional option that allows to include the diagonals in the search of the least cost path through the diagonal argument (which is FALSE by default). This modification allows to find, for each sample in A, the most similar sample in B, and align them together, if the distance among them is lower than the one found in the orthogonal directions. Both options give highly correlated least cost distances for the same matrices, but have different applications.

Usage

leastCostMatrix(
  distance.matrix = NULL,
  diagonal = FALSE,
  parallel.execution = TRUE
  )

Arguments

distance.matrix

numeric matrix or list of numeric matrices, a distance matrix produced by distanceMatrix.

diagonal

boolean, if TRUE, diagonals are included in the computation of the least cost path. Defaults to FALSE, as the original algorithm did not include diagonals in the computation of the least cost path.

parallel.execution

boolean, if TRUE (default), execution is parallelized, and serialized if FALSE.

Value

A list of matrices with the same dimensions as distance.matrix with the cumulative least cost among samples. The value of the lower-right cell (in the actual data matrix, not in the plotted version!) represents the sum of the least cost path across all samples.

Examples




#loading data
data(sequenceA)
data(sequenceB)

#preparing datasets
AB.sequences <- prepareSequences(
 sequence.A = sequenceA,
 sequence.A.name = "A",
 sequence.B = sequenceB,
 sequence.B.name = "B",
 merge.mode = "complete",
 if.empty.cases = "zero",
 transformation = "hellinger"
 )

#computing distance matrix
AB.distance.matrix <- distanceMatrix(
 sequences = AB.sequences,
 grouping.column = "id",
 method = "manhattan",
 parallel.execution = FALSE
 )

#computing least cost matrix
AB.least.cost.matrix <- leastCostMatrix(
 distance.matrix = AB.distance.matrix,
 diagonal = FALSE,
 parallel.execution = FALSE
 )

#plot
par(mfrow=c(1,2))
plotMatrix(distance.matrix = AB.distance.matrix)
plotMatrix(distance.matrix = AB.least.cost.matrix)
dev.off()




[Package distantia version 1.0.2 Index]