R: Computes the dissimilarity measure psi

workflowNullPsiHP {distantia}

R Documentation

Computes the dissimilarity measure psi on restricted permutations of two or more sequences. High-performance version with limited options.

Description

The function first computes psi on the observed sequences, and then recomputes it on permuted versions of the input sequences, as many times as set by the `repetitions` argument. The data are randomized as follows: within each column, each data point can be 1) left as is; 2) replaced by the previous case; or 3) replaced by the next case. The action applied to each data point is selected randomly and independently of the actions applied to other data points. This type of randomization generates versions of the dataset that keep the general structure of the original, with only small, local, independent changes occurring within the immediate neighborhood (one row up or down) of each case in the table. The method should therefore generate very conservative random values of psi.
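The restricted permutation described above can be sketched in a few lines of base R. This is only an illustration, not the package's internal code: the helper name `restricted_permutation` is hypothetical, and the handling of the first and last rows (which have no previous or next case) is an assumption, resolved here by keeping the shifted index inside the table.

```r
# Hypothetical helper, not part of distantia: restricted permutation of one
# sequence stored as a data frame with rows in time/depth order.
restricted_permutation <- function(x) {
  n <- nrow(x)
  out <- x
  for (j in seq_len(ncol(x))) {
    # One independent action per cell:
    # -1 = take the previous case, 0 = leave as is, 1 = take the next case.
    shift <- sample(c(-1L, 0L, 1L), size = n, replace = TRUE)
    # Assumption: at the edges of the table the index is clamped to 1..n,
    # so the first row cannot look further up and the last cannot look down.
    source.row <- pmin(pmax(seq_len(n) + shift, 1L), n)
    out[[j]] <- x[[j]][source.row]
  }
  out
}

# Toy data (made up for illustration only).
set.seed(1)
toy <- data.frame(a = 1:6, b = seq(10, 60, by = 10))
restricted_permutation(toy)
```

Because every value can only move one row up or down, each permuted column stays close to the original one, which is why the resulting null values of psi are expected to be conservative.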

Usage

workflowNullPsiHP(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  parallel.execution = TRUE,
  repetitions = 9
  )

Arguments

`sequences`	dataframe with multiple sequences identified by a grouping column, as generated by `prepareSequences`.
`grouping.column`	character string, name of the column in `sequences` to be used to identify separate sequences within the dataframe.
`time.column`	character string, name of the column with time/depth/rank data.
`exclude.columns`	character string or character vector with column names in `sequences` to be excluded from the analysis.
`parallel.execution`	boolean, if `TRUE` (default), execution is parallelized; if `FALSE`, it runs serially.
`repetitions`	integer, number of null psi values to obtain.

Value

A list with two slots:

psi: a dataframe. The first two columns contain the names of the sequences being compared, the third column contains the real psi value, and the remaining columns contain psi values computed on permuted versions of the datasets.
p: a dataframe. The first two columns are as above, the third column contains the probability of obtaining a random psi lower than the real psi by chance.
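To show how the two slots relate, here is a minimal sketch that derives a `p` table from a `psi` table with the layout described above. The values are made up, the column names are hypothetical, and the formula is an assumption: it takes the proportion of null psi values strictly lower than the real psi, which may differ from what the package does (for example in how ties are counted).

```r
# Hypothetical table mimicking the layout of the `psi` slot: two name columns,
# the real psi, then one column per repetition. All values are made up.
psi.df <- data.frame(
  A = "MIS-1",
  B = "MIS-2",
  psi = 1.2,
  r1 = 1.5,
  r2 = 0.9,
  r3 = 1.3
)

# Null psi values: every column after the third.
null.values <- as.matrix(psi.df[, -(1:3), drop = FALSE])

# Assumption: p is the fraction of null values strictly lower than the real psi.
p.df <- data.frame(
  A = psi.df$A,
  B = psi.df$B,
  p = rowMeans(null.values < psi.df$psi)
)

p.df
```

With only a handful of repetitions, as in this toy table, the resulting proportion is very coarse; more repetitions give a finer-grained estimate.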

Author(s)

Blas Benito <blasbenito@gmail.com>

Examples



#load data
data("sequencesMIS")

#prepare sequences
MIS.sequences <- prepareSequences(
  sequences = sequencesMIS,
  grouping.column = "MIS",
  transformation = "hellinger"
  )

#execute workflow to compute psi
MIS.null.psi <- workflowNullPsiHP(
 sequences = MIS.sequences[MIS.sequences$MIS %in% c("MIS-1", "MIS-2"), ],
 grouping.column = "MIS",
 repetitions = 3,
 parallel.execution = FALSE
 )

MIS.null.psi

[Package distantia version 1.0.2 Index]