bigdata_tdv {diffval} | R Documentation |
The Total Differential Value of a big phytosociological data set
Description
Given a big phytosociological data set represented as a list, and a partition of the relevés in that list, this function calculates the respective Total Differential Value (TDV).
Usage
bigdata_tdv(
  phyto_list,
  p,
  n_rel,
  output_type = "normal",
  parallel = FALSE,
  mc_cores = getOption("mc.cores", 2L)
)
Arguments
phyto_list |
A list. This is a very light representation of what could
be a usual phytosociological table, registering only taxa presences. Each
component should uniquely represent a taxon and should contain a vector (of
numeric values) with the id(s) of the relevé(s) where that taxon was observed.
Relevé ids are expected to be represented by consecutive integers,
starting with 1. The components of the list might be named (e.g. using the
taxon name) or empty (decreasing further memory burden). However, for
|
p |
A vector of integer numbers with the partition of the relevés (i.e.,
a k-partition, consisting of a vector with values from 1 to k, with length
equal to the number of relevés in |
n_rel |
The number of relevés in the |
output_type |
A character determining the amount of information returned by the function and also the amount of pre-validations. Possible values are "normal" (the default) and "fast". |
parallel |
Logical. Should function |
mc_cores |
The number of cores to be passed to |
Details
This function accepts a list (phyto_list) representing a
phytosociological data set, as well as a k-partition of its relevés (p),
returning the corresponding TDV (see tdv() for an explanation
on TDV).
Partition p gives the group to which each relevé is ascribed, by
increasing order of relevé id.
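As a minimal sketch of these two inputs, the base R snippet below builds a small list from long-format observations and looks up the group of each relevé through the partition. The data frame obs, its column names, and the objects toy_list, groups_per_taxon and toy_list_unnamed are illustrative assumptions and are not part of the package.

```r
# Toy observations in long format (illustrative data, not from the package)
obs <- data.frame(
  taxon  = c("A", "A", "A", "B", "B", "B", "C", "C"),
  releve = c(1, 2, 3, 4, 5, 6, 1, 4)
)

# One component per taxon, holding the ids of the relevés where it occurs
toy_list <- split(obs$releve, obs$taxon)

# A 2-partition of the 6 relevés: p[i] is the group of relevé i
p <- c(1, 1, 1, 2, 2, 2)
n_rel <- length(p)

# Relevé ids are expected to be consecutive integers starting with 1
all_ids <- sort(unique(unlist(toy_list, use.names = FALSE)))
stopifnot(identical(all_ids, as.numeric(seq_len(n_rel))))

# Groups of the relevés where each taxon occurs
groups_per_taxon <- lapply(toy_list, function(ids) p[ids])

# Dropping the names gives the lighter, unnamed form of the list
toy_list_unnamed <- unname(toy_list)
```

Because p is ordered by relevé id, indexing it with the ids stored in a component directly gives the groups in which that taxon occurs.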
Big phytosociological tables can occupy a significant amount of computer
memory, mostly because the absences (usually more frequent than presences)
are also recorded in memory. The use of a list, focusing only on presences,
significantly reduces both the memory needed to store all the information
that a phytosociological table contains and the computation time of TDV,
allowing computations for big data sets.
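The memory argument can be illustrated with a rough base R comparison on simulated data. The dimensions, the 2% presence rate and the object names below are arbitrary assumptions chosen for the sketch; actual savings depend on how sparse a real table is.

```r
set.seed(1)
n_taxa <- 500
n_rel <- 2000

# Simulated sparse binary table (taxa in rows, relevés in columns),
# with roughly 2% of presences
m_bin <- matrix(
  rbinom(n_taxa * n_rel, size = 1, prob = 0.02),
  nrow = n_taxa,
  ncol = n_rel
)

# List representation: for each taxon, the ids of the relevés where it occurs
phyto_list <- lapply(seq_len(n_taxa), function(i) which(m_bin[i, ] == 1L))

# Memory used by each representation
size_matrix <- object.size(m_bin)
size_list <- object.size(phyto_list)
print(size_matrix, units = "Kb")
print(size_list, units = "Kb")
```

The list stores one integer per presence plus a small per-taxon overhead, whereas the matrix stores one value per taxon and relevé combination, presences and absences alike.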
Value
If output_type = "normal" (the default), pre-validations are done
(which can take some time) and a list is returned, with the following
components (see tdv() for the mathematical notation):
- ifp
A matrix with the \frac{a}{b} values for each taxon in each group, for short called the 'inner frequency of presences'.
- ofda
A matrix with the \frac{c}{d} values for each taxon in each group, for short called the 'outer frequency of differentiating absences'.
- e
A vector with the e values for each taxon, i.e., the number of groups containing that taxon.
- diffval
A matrix with the DiffVal for each taxon.
- tdv
A numeric with the TDV of the data set in phyto_list, given the partition p.
If output_type = "fast", only TDV is returned and no pre-validations are
done.
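To make the components above more concrete, the following base R sketch computes them from a list and a partition. It rests on our reading of the notation in tdv(): a is the number of relevés of a group containing the taxon, b the size of that group, c the number of relevés outside the group that belong to groups from which the taxon is absent, d the number of relevés outside the group, and e the number of groups containing the taxon. The helper tdv_from_list() is hypothetical, is not the package's implementation, skips all pre-validations, and returns diffval as a plain vector.

```r
# Hypothetical helper (not part of diffval): TDV components from a presence
# list, assuming at least two groups and at least one presence per taxon
tdv_from_list <- function(phyto_list, p, n_rel) {
  k <- max(p)
  stopifnot(length(p) == n_rel, k >= 2)
  b <- tabulate(p, nbins = k) # size of each group
  # a: presences of each taxon in each group (taxa in rows, groups in columns)
  a <- t(vapply(
    phyto_list,
    function(ids) tabulate(p[ids], nbins = k),
    integer(k)
  ))
  ifp <- sweep(a, 2, b, "/") # a / b
  present <- a > 0 # is the taxon present in the group?
  e <- rowSums(present) # number of groups containing the taxon
  # Size of each group from which the taxon is absent (0 otherwise)
  abs_size <- sweep(!present, 2, b, "*")
  # c: relevés outside group g belonging to groups without the taxon
  c_mat <- rowSums(abs_size) - abs_size
  d <- n_rel - b # relevés outside each group
  ofda <- sweep(c_mat, 2, d, "/") # c / d
  diffval <- rowSums(ifp * ofda) / e
  list(ifp = ifp, ofda = ofda, e = e, diffval = diffval, tdv = mean(diffval))
}

# Toy data: 3 taxa, 6 relevés, two groups of 3 relevés each
toy_list <- list(A = c(1, 2, 3), B = c(4, 5), C = c(1, 4))
res <- tdv_from_list(toy_list, p = c(1, 1, 1, 2, 2, 2), n_rel = 6)
res$diffval
res$tdv # 5/9 for this toy data set
```

In this toy data set, taxon A occupies all of group 1 and nothing else (DiffVal of 1), taxon B occupies two of the three relevés of group 2 (DiffVal of 2/3), and taxon C occurs in both groups, so it differentiates nothing (DiffVal of 0).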
Author(s)
Tiago Monteiro-Henriques. E-mail: tmh.dev@icloud.com.
Examples
# Getting the Taxus baccata forests data set
data(taxus_bin)
# Creating a group partition, as the one presented in the original article of
# the data set
groups <- rep(c(1, 2, 3), c(3, 11, 19))
# Removing taxa occurring in only one relevé, in order to reproduce exactly
# the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]
# Calculating TDV using tdv()
tdv(taxus_bin_wmt, groups)$tdv
# Converting from the phytosociological matrix format to the list format
taxus_phyto_list <- apply(taxus_bin_wmt, 1, function(x) which(as.logical(x)))
# Getting the number of relevés in the list
n_rel <- length(unique(unlist(taxus_phyto_list)))
# Calculating TDV using bigdata_tdv(), even if this is not a big matrix
bigdata_tdv(
  phyto_list = taxus_phyto_list,
  p = groups,
  n_rel = n_rel,
  output_type = "normal"
)$tdv