R: Diversity Metrics of Simulated and Original Data

datquality {SSP}

R Documentation

Diversity Metrics of Simulated and Original Data

Description

The function estimates the average number of species and the Simpson diversity index per sampling unit, as well as the total multivariate dispersion, for both the pilot data and the simulated data.

Usage

datquality(data, dat.sim, Par, transformation, method)

Arguments

`data`	Data frame with species as columns and samples as rows. The first column should indicate the site to which each sample belongs, even if only a single site was sampled.
`dat.sim`	List of simulated data generated by `simdata`.
`Par`	List of parameters generated by `assempar`.
`transformation`	Mathematical function used to reduce the weight of dominant species: 'square root', 'fourth root', 'Log (X+1)', 'P/A', 'none'.
`method`	The distance/dissimilarity metric to use, for example "jaccard" or "bray". The function `vegdist` is called to compute it.

Details

The quality of the simulated data sets is quantified by their statistical similarity to the pilot data, using the following estimators: (i) the average number of species per sampling unit, (ii) diversity, defined as the average Simpson diversity index per sampling unit, and (iii) the multivariate dispersion (MVD), measured as the average dissimilarity from all sampling units to the main centroid in the space of the dissimilarity measure used (Anderson 2006). For the simulated data, the overall mean and standard deviation of (i) and (ii) are reported. In addition, to assess the magnitude of variability in the simulated data, the 0.95 quantiles of the MVD across all simulated data sets are reported.
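The three estimators can be illustrated on a small matrix using base R only. This is a minimal sketch, not the package's implementation. The toy matrix `x` and all object names are hypothetical. The Simpson index is assumed to take the form 1 - sum(p^2). The MVD is simplified to the Euclidean case, whereas `datquality` works in the space of the dissimilarity chosen in `method`.

```r
# Toy abundance matrix (made-up values): 4 sampling units (rows) x 3 species (columns)
x <- matrix(c(2, 0, 1,
              0, 3, 1,
              1, 1, 0,
              4, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("su", 1:4), c("sp1", "sp2", "sp3")))

# (i) Richness: number of species present in each sampling unit
richness <- rowSums(x > 0)

# (ii) Simpson diversity per sampling unit, assumed here as 1 - sum(p^2)
p <- x / rowSums(x)
simpson <- 1 - rowSums(p^2)

# (iii) MVD, simplified to Euclidean distance: mean distance to the centroid
centroid <- colMeans(x)
mvd <- mean(sqrt(rowSums(sweep(x, 2, centroid)^2)))

# Summary comparable in spirit to one row of the datquality output
c(mean.richness = mean(richness), sd.richness = sd(richness),
  mean.simpson = mean(simpson), sd.simpson = sd(simpson),
  MVD = mvd)
```

Applying the same summary to each simulated data set and to the pilot data gives the quantities that are compared when judging simulation quality.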

Value

divmetrics

A data frame that includes the mean and standard deviation of richness and diversity per sampling unit, the MVD of the original data, and the 0.95 quantiles of the MVD of the simulated data.

Note

Ideally, the simulated data should resemble the observed data in terms of species richness and diversity per sampling unit.

Author(s)

Edlin Guerra-Castro (edlinguerra@gmail.com), Juan Carlos Cajas, Juan Jose Cruz-Motta, Nuno Simoes and Maite Mascaro (mmm@ciencias.unam.mx).

References

Anderson, M. J. (2006). Distance-based tests for homogeneity of multivariate dispersions. Biometrics, 62, 245-253.

Guerra-Castro, E. J., J. C. Cajas, F. N. Dias Marques Simoes, J. J. Cruz-Motta, and M. Mascaro. (2020). SSP: An R package to estimate sampling effort in studies of ecological communities. bioRxiv:2020.03.19.996991.

Examples

###To speed up these examples, the values of cases, sites and N are kept small.

##Single site: micromollusks from Cayo Nuevo (Yucatan, Mexico)
library(SSP)
data(micromollusk)

#Estimation of parameters of pilot data
par.mic <- assempar(data = micromollusk,
                    type = "P/A",
                    Sest.method = "average")

#Simulation of 3 data sets, each one with 10 potential sampling units from a single site
sim.mic <- simdata(par.mic, cases = 3, N = 10, sites = 1)

#Estimation of diversity metrics of original and simulated data
qua.mic <- datquality(data = micromollusk,
                      dat.sim = sim.mic,
                      Par = par.mic,
                      transformation = "none",
                      method = "jaccard")
qua.mic

##Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico).
data(sponges)

#Estimation of parameters of pilot data
par.spo <- assempar(data = sponges,
                    type = "counts",
                    Sest.method = "average")

#Simulation of 3 data sets, each one with 10 potential sampling units in 3 sites.
sim.spo <- simdata(par.spo, cases = 3, N = 10, sites = 3)

#Estimation of diversity metrics of original and simulated data
qua.spo <- datquality(data = sponges,
                      dat.sim = sim.spo,
                      Par = par.spo,
                      transformation = "square root",
                      method = "bray")
qua.spo

[Package SSP version 1.0.1 Index]