workflowSlotting {distantia} | R Documentation
Generates a composite sequence, constrained by sample order, from two sequences by minimizing the dissimilarity between adjacent samples of each input sequence. The algorithm computes the distance matrix, the least cost matrix, and the least cost path of the two sequences, and uses the least cost path to find the slotting that best minimizes the dissimilarity between adjacent samples. The algorithm assumes that the samples of the two sequences are not aligned or paired.
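The slotting idea can be illustrated with a minimal base R sketch. It is not distantia's implementation: instead of deriving the order from a least cost path over the distance matrix, it uses an exact dynamic-programming formulation in which each state records how many samples have been taken from each sequence and which sequence supplied the last one, and each step adds the dissimilarity between the newly placed sample and the previous one. The function name slot_sequences(), its arguments, and the Manhattan default are assumptions made for illustration.

```r
# Minimal sketch of exact slotting by dynamic programming (NOT distantia's code).
# A, B: numeric matrices with the same columns; rows are samples in order.
# Returns the slotted order as labels ("A1", "B1", ...) and its total cost,
# i.e. the summed dissimilarity between consecutive samples of the composite.
slot_sequences <- function(A, B, dist_fun = function(x, y) sum(abs(x - y))) {
  stopifnot(nrow(A) >= 1, nrow(B) >= 1, ncol(A) == ncol(B))
  n <- nrow(A)
  m <- nrow(B)
  # cost[i + 1, j + 1, k]: best cost after placing i samples of A and j of B,
  # with the last placed sample taken from A (k = 1) or from B (k = 2)
  cost <- array(Inf, dim = c(n + 1, m + 1, 2))
  prev <- array(NA_integer_, dim = c(n + 1, m + 1, 2))  # k of the previous state
  cost[2, 1, 1] <- 0  # composite starts with A1
  cost[1, 2, 2] <- 0  # composite starts with B1
  for (i in 0:n) {
    for (j in 0:m) {
      for (k in 1:2) {
        current <- cost[i + 1, j + 1, k]
        if (!is.finite(current)) next
        last <- if (k == 1) A[i, ] else B[j, ]
        if (i < n) {  # place the next sample of A
          candidate <- current + dist_fun(last, A[i + 1, ])
          if (candidate < cost[i + 2, j + 1, 1]) {
            cost[i + 2, j + 1, 1] <- candidate
            prev[i + 2, j + 1, 1] <- k
          }
        }
        if (j < m) {  # place the next sample of B
          candidate <- current + dist_fun(last, B[j + 1, ])
          if (candidate < cost[i + 1, j + 2, 2]) {
            cost[i + 1, j + 2, 2] <- candidate
            prev[i + 1, j + 2, 2] <- k
          }
        }
      }
    }
  }
  # Backtrack from the cheaper of the two end states
  k <- if (cost[n + 1, m + 1, 1] <= cost[n + 1, m + 1, 2]) 1L else 2L
  total <- cost[n + 1, m + 1, k]
  i <- n
  j <- m
  slots <- character(0)
  while (i + j > 0) {
    slots <- c(if (k == 1) paste0("A", i) else paste0("B", j), slots)
    k_prev <- prev[i + 1, j + 1, k]
    if (k == 1) i <- i - 1 else j <- j - 1
    k <- k_prev
  }
  list(order = slots, cost = total)
}

# Toy usage: one variable, two interleaved sequences
res <- slot_sequences(matrix(c(1, 3, 5), ncol = 1), matrix(c(2, 4), ncol = 1))
res$order  # "A1" "B1" "A2" "B2" "A3"
res$cost   # 4
```

Tracking which sequence supplied the last sample is what makes this recursion exact: the cost of the next step depends on that particular sample, not only on how many samples of each sequence have been placed.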
workflowSlotting(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  plot = TRUE
)
sequences | dataframe with two sequences identified by a grouping column, generated by prepareSequences. |
grouping.column | character string, name of the column in sequences that identifies each sequence. |
time.column | character string, name of the column with time/depth/rank data. |
exclude.columns | character string or character vector with column names in sequences to be excluded from the analysis. |
method | character string naming a distance metric. Valid entries are "manhattan", "euclidean", "chi", and "hellinger". Invalid entries throw an error. |
plot | boolean; if TRUE, the slotting solution is plotted. |
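The method argument selects how the dissimilarity between two samples is measured. The sketch below uses common textbook definitions of three of the four metrics; the function names sample_distance() and distance_matrix() are hypothetical, distantia may scale these metrics differently, and "chi" is left out because its definition varies between sources. As with workflowSlotting(), an unrecognized method raises an error, here through match.arg().

```r
# Sketch of three of the distance metrics named by `method`, for two samples
# x and y (numeric vectors of equal length, e.g. pollen proportions).
# ASSUMPTION: common textbook definitions; distantia's own implementation may
# differ in scaling. "chi" is omitted because its definition varies.
sample_distance <- function(x, y, method = c("manhattan", "euclidean", "hellinger")) {
  method <- match.arg(method)  # errors on any other entry
  stopifnot(length(x) == length(y))
  switch(method,
    manhattan = sum(abs(x - y)),
    euclidean = sqrt(sum((x - y)^2)),
    # Hellinger: Euclidean distance between square-rooted values
    # (some texts also divide by sqrt(2))
    hellinger = sqrt(sum((sqrt(x) - sqrt(y))^2))
  )
}

# Distance matrix: every sample of A (rows) against every sample of B (columns)
distance_matrix <- function(A, B, method = "manhattan") {
  D <- matrix(0, nrow = nrow(A), ncol = nrow(B))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(nrow(B))) {
      D[i, j] <- sample_distance(A[i, ], B[j, ], method)
    }
  }
  D
}

# Toy usage: two samples in A, one in B, two proportions per sample
A <- rbind(c(0.25, 0.75), c(0.5, 0.5))
B <- rbind(c(0.75, 0.25))
distance_matrix(A, B, method = "hellinger")
```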
A dataframe with the same number of rows as sequences, ordered according to the best solution found by the least-cost algorithm.
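Because the result is constrained by sample order, each input sequence should keep its original order inside the composite. A hypothetical helper such as keeps_sequence_order() below can check this, assuming that each input sequence is sorted by its time column and that the result keeps the grouping and time columns (an assumption about the output, not stated above).

```r
# Hypothetical helper: TRUE if every sequence in a slotted data frame is still
# in its original order. ASSUMPTION: each input sequence was sorted by
# `time.column` (e.g. age increasing with depth), and the result keeps the
# grouping and time columns.
keeps_sequence_order <- function(combined, grouping.column, time.column) {
  # split() keeps the row order within each group
  groups <- split(combined[[time.column]], combined[[grouping.column]])
  all(vapply(groups, function(t) !is.unsorted(t), logical(1)))
}

# Toy slotted result with two interleaved sequences "A" and "B"
toy <- data.frame(id = c("A", "B", "A", "B", "A"), age = c(1, 2, 3, 4, 5))
keeps_sequence_order(toy, grouping.column = "id", time.column = "age")  # TRUE
```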
Blas Benito <blasbenito@gmail.com>
# loading the package and the example data
library(distantia)
data(pollenGP)

# getting the first 20 samples
pollenGP <- pollenGP[1:20, ]

# sampling indices
set.seed(10) # to get the same result every time
sampling.indices <- sort(sample(1:20, 10))

# subsetting the sequence
A <- pollenGP[sampling.indices, ]
B <- pollenGP[-sampling.indices, ]

# preparing the sequences
AB <- prepareSequences(
  sequence.A = A,
  sequence.A.name = "A",
  sequence.B = B,
  sequence.B.name = "B",
  grouping.column = "id",
  exclude.columns = c("depth", "age"),
  transformation = "hellinger"
)

# slotting the two sequences
AB.combined <- workflowSlotting(
  sequences = AB,
  grouping.column = "id",
  time.column = "age",
  exclude.columns = "depth",
  method = "manhattan",
  plot = TRUE
)
AB.combined