bigdata_tdv {diffval}R Documentation

The Total Differential Value of a big phytosociological data set

Description

Given a big phytosociological data set represented as a list, and a partition of the relevés in that list, this function calculates the respective Total Differential Value (TDV).

Usage

bigdata_tdv(
  phyto_list,
  p,
  n_rel,
  output_type = "normal",
  parallel = FALSE,
  mc_cores = getOption("mc.cores", 2L)
)

Arguments

phyto_list

A list. This is a very light representation of what could be a usual phytosociological table, registering only taxa presences. Each component should uniquely represent a taxon and should contain a vector (of numeric values) with the relevé(s) id(s) where that taxon was observed. Relevé's ids are expected to be represented by consecutive integers, starting with 1. The components of the list might be named (e.g. using the taxon name) or empty (decreasing further memory burden). However, for output_type == "normal" taxa names are useful for output interpretation.

p

A vector of integer numbers with the partition of the relevés (i.e., a k-partition, consisting in a vector with values from 1 to k, with length equal to the number of relevés in phyto_list, ascribing each relevé to one of the k groups).

n_rel

The number of relevés in the phyto_list, obtained e.g. with length(unique(unlist(phyto_list))).

output_type

A character determining the amount of information returned by the function and also the amount of pre-validations. Possible values are "normal" (the default) and "fast".

parallel

Logical. Should function parallel::mclapply()) be used to improve computation time by forking? Not available on Windows. Refer to that function manual for more information. Defaults to FALSE.

mc_cores

The number of cores to be passed to parallel::mclapply() if parallel = TRUE. See parallel::mclapply() for more information.

Details

This function accepts a list (phyto_list) representing a phytosociological data set, as well as a k-partition of its relevés (p), returning the corresponding TDV (see tdv() for an explanation on TDV). Partition p gives the group to which each relevé is ascribed, by increasing order of relevé id. Big phytosociological tables can occupy a significant amount of computer memory, which mostly relate to the fact that the absences (usually more frequent than presences) are also recorded in memory. The use of a list, focusing only on presences, reduces significantly the amount of needed memory to store all the information that a phytosociological table contains and also the computation time of TDV, allowing computations for big data sets.

Value

If output_type = "normal" (the default) pre-validations are done (which can take some time) and a list is returned, with the following components (see tdv() for the mathematical notation):

ifp

A matrix with the \frac{a}{b} values for each taxon in each group, for short called the 'inner frequency of presences'.

ofda

A matrix with the \frac{c}{d} values for each taxon in each group, for short called the 'outer frequency of differentiating absences'.

e

A vector with the e values for each taxon, i.e., the number of groups containing that taxon.

diffval

A matrix with the DiffVal for each taxon.

tdv

A numeric with the TDV of matrix ⁠m_bin,⁠ given the partition p.

If output_type = "fast", only TDV is returned and no pre-validations are done.

Author(s)

Tiago Monteiro-Henriques. E-mail: tmh.dev@icloud.com.

Examples

# Getting the Taxus baccata forests data set
data(taxus_bin)

# Creating a group partition, as the one presented in the original article of
# the data set
groups <- rep(c(1, 2, 3), c(3, 11, 19))

# Removing taxa occurring in only one relevé, in order to reproduce exactly
# the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]

# Calculating TDV using tdv()
tdv(taxus_bin_wmt, groups)$tdv

# Converting from the phytosociologic matrix format to the list format
taxus_phyto_list <- apply(taxus_bin_wmt, 1, function(x) which(as.logical(x)))

# Getting the number of relevés in the list
n_rel <- length(unique(unlist(taxus_phyto_list)))

# Calculating TDV using bigdata_tdv(), even if this is not a big matrix
bigdata_tdv(
  phyto_list = taxus_phyto_list,
  p = groups,
  n_rel = n_rel,
  output_type = "normal"
)$tdv


[Package diffval version 1.1.0 Index]