workflowPsiHP {distantia}    R Documentation

A refactored, higher-performance version of workflowPsi (hence the suffix HP).

Description

Ideal for large analyses with hundreds to thousands of sequences. Several options available in workflowPsi have been removed from this function in order to simplify the code as much as possible. Psi is computed with the options diagonal = TRUE, ignore.blocks = TRUE, and method = "euclidean".

Usage

workflowPsiHP(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  parallel.execution = TRUE
  )

Arguments

sequences

dataframe generated by prepareSequences, containing multiple sequences identified by a grouping column.

grouping.column

character string, name of the column in sequences used to identify the separate sequences within the dataframe.

time.column

character string, name of the column with time/depth/rank data.

exclude.columns

character string or character vector with column names in sequences to be excluded from the analysis.

parallel.execution

boolean; if TRUE (default), execution is parallelized; if FALSE, it runs serially.

Details

Due to limitations of the function permutations, the maximum number of groups (according to grouping.column) is around 30000. In addition, a combinations table of this size takes roughly 7 GB of memory.
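
The memory figure above can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below is not part of distantia: estimate_pairs_memory is a hypothetical helper, and it assumes that the combinations table holds one row per unordered pair of groups, with two columns stored as 8-byte numeric values. The actual internal layout may differ.

```r
# Hypothetical helper (not a distantia function): rough size of a table
# holding every unordered pair of n.groups groups.
# Assumptions: two columns per pair, 8 bytes per value (R numeric).
estimate_pairs_memory <- function(n.groups, bytes.per.value = 8, columns = 2) {
  # number of unordered pairs: n * (n - 1) / 2
  n.pairs <- n.groups * (n.groups - 1) / 2
  # total bytes converted to gigabytes (1 GB = 1e9 bytes)
  gigabytes <- n.pairs * columns * bytes.per.value / 1e9
  list(pairs = n.pairs, gigabytes = gigabytes)
}

estimate <- estimate_pairs_memory(30000)
estimate$pairs                # 449985000 pairs
round(estimate$gigabytes, 1)  # 7.2
```

Under these assumptions, 30000 groups yield about 450 million pairs and roughly 7.2 GB, which is in line with the figure quoted above. Running the helper with a smaller number of groups gives a rough idea of whether an analysis is likely to fit in the available memory.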

Value

A dataframe with sequence names and psi values.

Author(s)

Blas Benito <blasbenito@gmail.com>

Examples



#load the package and the example data
library(distantia)
data("sequencesMIS")
#prepare sequences
MIS.sequences <- prepareSequences(
  sequences = sequencesMIS[sequencesMIS$MIS %in% c("MIS-1", "MIS-2"), ],
  grouping.column = "MIS",
  if.empty.cases = "zero",
  transformation = "hellinger"
  )

#execute workflow to compute psi
MIS.psi <- workflowPsiHP(
  sequences = MIS.sequences,
  grouping.column = "MIS",
  parallel.execution = FALSE
  )

MIS.psi




[Package distantia version 1.0.2 Index]