workflowNullPsi {distantia}    R Documentation

Computes the dissimilarity measure psi on restricted permutations of two or more sequences.


The function first computes psi on the observed sequences, and then computes it on permuted versions of the input sequences, as many times as set by the repetitions argument. The data are randomized as follows: within each column, each data point can be: 1) left as is; 2) replaced by the previous case; 3) replaced by the next case. The action applied to each data point is selected randomly, and independently of the actions applied to other data points. This type of randomization generates versions of the dataset that keep the general structure of the original one, with only small, local, and independent changes occurring within the immediate neighborhood (one row up or down) of each case in the table. The method should therefore generate very conservative random values of psi.
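
The randomization rule described above can be sketched in a few lines of R. This is only an illustration, not the code used by the package: restricted_permutation is a hypothetical helper, it works on a single sequence held in a numeric matrix, and it assumes that the first and last rows simply keep their own value when the "previous" or "next" case does not exist.

```r
# Hypothetical helper, NOT part of distantia: restricted permutation of one sequence.
restricted_permutation <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  out <- x
  for (j in seq_len(ncol(x))) {
    # For every cell, draw one action independently:
    # -1 = take the previous case, 0 = leave as is, +1 = take the next case.
    offset <- sample(c(-1L, 0L, 1L), size = n, replace = TRUE)
    # Assumption: at the first and last rows, an out-of-range pick
    # falls back to the row itself.
    source.row <- pmin(pmax(seq_len(n) + offset, 1L), n)
    out[, j] <- x[source.row, j]
  }
  out
}

# Toy sequence with two columns and six cases
set.seed(1)
original <- cbind(a = 1:6, b = seq(10, 60, by = 10))
permuted <- restricted_permutation(original)
permuted
```

Every value in permuted comes from the same column of original, at most one row above or below its own position, which is why the permuted datasets stay close to the observed one.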


workflowNullPsi(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  method = "manhattan",
  diagonal = FALSE,
  paired.samples = FALSE,
  same.time = FALSE,
  ignore.blocks = FALSE,
  parallel.execution = TRUE,
  repetitions = 9
)



sequences
dataframe with multiple sequences identified by a grouping column, generated by prepareSequences.


grouping.column
character string, name of the column in sequences used to identify separate sequences within the file.


time.column
character string, name of the column with time/depth/rank data.


exclude.columns
character string or character vector with column names in sequences to be excluded from the analysis.


method
character string naming a distance metric. Valid entries are: "manhattan", "euclidean", "chi", and "hellinger". Invalid entries will throw an error.


diagonal
boolean, if TRUE, diagonals are included in the computation of the least-cost path. Defaults to FALSE, as the original algorithm did not include diagonals in the computation of the least-cost path. If paired.samples is TRUE, then diagonal is irrelevant.


paired.samples
boolean, if TRUE, the sequences are assumed to be aligned, and distances are computed for paired samples only (no distance matrix required). Default value is FALSE.


same.time
boolean, if TRUE, samples in the sequences to compare will be tested to check if they have the same time/age/depth according to time.column. This argument is only useful when the user needs to compare two sequences taken at different sites but within the same time frames.


ignore.blocks
boolean, if TRUE, the function leastCostPathNoBlocks analyzes the least-cost path of the best solution and removes blocks (straight-orthogonal sections of the least-cost path), which occur in highly dissimilar sections of the sequences and inflate output psi values.


parallel.execution
boolean, if TRUE (default), execution is parallelized; if FALSE, it runs serially.


repetitions
integer, number of null psi values to obtain.
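

The difference between paired.samples = TRUE and the default can be illustrated with Manhattan distances on two toy sequences. This is a minimal sketch under assumptions, not the package's internal code: seq.a and seq.b are made-up matrices with the same number of rows and columns, and the distance is computed directly with base R rather than with any distantia function.

```r
# Two hypothetical aligned sequences: same number of cases, same columns
seq.a <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
seq.b <- matrix(c(2, 2, 5, 4, 4, 9), ncol = 2)

# Paired samples: one Manhattan distance per pair of aligned rows
paired.distance <- rowSums(abs(seq.a - seq.b))

# Unpaired samples: every row of seq.a against every row of seq.b,
# giving the full distance matrix that a least-cost path would traverse
distance.matrix <- outer(
  seq_len(nrow(seq.a)),
  seq_len(nrow(seq.b)),
  Vectorize(function(i, j) sum(abs(seq.a[i, ] - seq.b[j, ])))
)

paired.distance
distance.matrix
```

The paired distances are the diagonal of the full matrix, so in the paired case no matrix needs to be built, and the diagonal argument has nothing to act on.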


A list with two slots:


Blas Benito


#load data
library(distantia)
data(sequencesMIS)

#prepare sequences
MIS.sequences <- prepareSequences(
  sequences = sequencesMIS,
  grouping.column = "MIS",
  transformation = "hellinger"
)

#execute workflow to compute psi
#(few repetitions and serial execution keep the example fast)
MIS.null.psi <- workflowNullPsi(
  sequences = MIS.sequences[MIS.sequences$MIS %in% c("MIS-1", "MIS-2"), ],
  grouping.column = "MIS",
  method = "manhattan",
  repetitions = 3,
  parallel.execution = FALSE
)

[Package distantia version 1.0.2 Index]