sampsd {SSP} | R Documentation |
Sampling Simulated Data and Estimates of Multivariate Standard Errors
Description
Each simulated data set is sampled many times at each level of sampling effort, from 2 sampling units up to the maximum set by the arguments n and m. Distance-based multivariate standard errors (MultSE) are then estimated using the pseudo-variance (single-site evaluation) or mean squares estimates from a linear model (multi-site evaluation).
Usage
sampsd(dat.sim, Par, transformation, method, n, m, k)
Arguments
dat.sim |
A list of data sets generated by simdata |
Par |
A list of parameters estimated by assempar |
transformation |
Mathematical function to reduce the weight of very dominant species: 'square root', 'fourth root', 'Log (X+1)', 'P/A', 'none' |
method |
The distance/dissimilarity metric to use (e.g. Gower, Bray–Curtis, Jaccard). The function vegdist is called to compute it |
n |
Maximum number of samples to take at each site. Must be less than or equal to N |
m |
Maximum number of sites to sample in each data set. Must be less than or equal to sites |
k |
Number of repetitions of each sampling effort (samples and sites) for each data set |
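The pre-treatments named in the transformation argument can be illustrated with a minimal sketch. The helper apply_transformation below is hypothetical and is not part of SSP; it only shows what each option does to a matrix of abundances, assuming the natural logarithm for 'Log (X+1)'.

```r
# Hypothetical helper (not part of SSP): applies one of the pre-treatments
# named in the 'transformation' argument to a matrix of abundances.
apply_transformation <- function(x, transformation = "none") {
  switch(transformation,
    "square root" = sqrt(x),
    "fourth root" = x^0.25,
    "Log (X+1)"   = log(x + 1),    # assumption: natural logarithm
    "P/A"         = (x > 0) * 1,   # presence/absence coded as 0/1
    "none"        = x,
    stop("Unknown transformation: ", transformation)
  )
}

# Toy abundance matrix: 3 sampling units (rows) by 3 species (columns)
counts <- matrix(c(0, 1, 16,
                   4, 0, 81,
                   9, 0,  0),
                 nrow = 3, byrow = TRUE)

sq <- apply_transformation(counts, "square root")
pa <- apply_transformation(counts, "P/A")
```

Stronger transformations (fourth root, P/A) give rare species relatively more weight in the resulting distance matrix.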
Details
If several virtual sites have been generated, subsets of sites of size 2 to m are sampled, followed by the selection of sampling units (from 2 to n) using inclusion probabilities and self-weighted two-stage sampling (Tillé, 2006). Each combination of sampling effort (number of sampling units and sites) is repeated several times (e.g. k = 100) for all simulated matrices. If the simulated data correspond to a single site, sampling without replacement is performed several times (e.g. k = 100) for each sample size (from 2 to n) within each simulated matrix. This approach is computationally intensive, especially when k is high (> 10), so expect longer run times as k increases. For each sample, suitable pre-treatments are applied and distance/similarity matrices are constructed using the appropriate coefficient. When simulations are done for a single site, the MultSE is estimated as \sqrt{V/n}, where V is the pseudo-variance measured in each sample of size n (Anderson & Santana-Garcon, 2015). When several sites are generated, MultSE is estimated using the residual mean squares and the sites mean squares from a PERMANOVA model (Anderson & Santana-Garcon, 2015).
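The single-site case can be sketched in base R. This is an illustration under stated assumptions, not SSP's internal code: the helper multse_single is hypothetical, the pseudo-variance is computed as V = SS/(n - 1) with SS equal to the sum of squared pairwise distances divided by n (following Anderson & Santana-Garcon, 2015), and Euclidean distance from stats::dist stands in for the metric that would be chosen through the method argument.

```r
# Hypothetical helper (not SSP's internal code): MultSE from a 'dist' object.
multse_single <- function(d) {
  n  <- attr(d, "Size")           # number of sampling units in the sample
  ss <- sum(as.vector(d)^2) / n   # squared distances (each pair once) over n
  v  <- ss / (n - 1)              # pseudo-variance V
  sqrt(v / n)                     # MultSE = sqrt(V / n)
}

set.seed(42)
# Toy simulated site: 20 potential sampling units (rows) by 8 species (columns)
pool <- matrix(rpois(20 * 8, lambda = 3), nrow = 20)

n_max <- 10   # plays the role of argument 'n'
k     <- 3    # plays the role of argument 'k'

# One row per repetition and sample size (2 to n_max)
res <- expand.grid(rep_id = seq_len(k), n = 2:n_max)
res$mSE <- mapply(function(r, size) {
  rows <- sample(nrow(pool), size = size)   # sampling without replacement
  multse_single(dist(pool[rows, , drop = FALSE]))
}, res$rep_id, res$n)

# Mean MultSE for each sample size
mean_by_n <- aggregate(mSE ~ n, data = res, FUN = mean)
```

With Euclidean distance on a single variable, this definition reduces to the classical standard error, sd(x)/sqrt(n), which gives a quick sanity check of the helper.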
Value
mse.results |
A matrix including all estimated MultSE for each simulated data set and each combination of sample replicates and sites, for each of the k repetitions. This matrix is used by summary_ssp |
Note
For quick exploratory analyses, keep the number of repetitions small. Once you have explored the behavior of the MultSE, you can repeat the process with a large k (e.g. 100). This will take some time, depending on the power of your computer.
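A rough way to anticipate the workload is to count how many resampling events a call implies. The helper n_resamples below is hypothetical and not part of SSP; it assumes one resampling event per simulated data set, repetition, sample size (2 to n) and number of sites (2 to m, or a single site when m = 1), as described in Details.

```r
# Hypothetical helper (not part of SSP): rough count of resampling events.
n_resamples <- function(cases, n, m, k) {
  n_sizes <- n - 1                     # sample sizes 2, 3, ..., n
  m_sizes <- if (m > 1) m - 1 else 1   # site subsets 2, ..., m, or one site
  cases * k * n_sizes * m_sizes
}

quick <- n_resamples(cases = 3, n = 10, m = 1, k = 3)     # exploratory run
full  <- n_resamples(cases = 3, n = 10, m = 3, k = 100)   # larger k, 3 sites
```

Under these assumptions the exploratory call implies 81 events and the larger one 5400, so run time should grow roughly in proportion to k.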
Author(s)
Edlin Guerra-Castro (edlinguerra@gmail.com), Juan Carlos Cajas, Juan Jose Cruz-Motta, Nuno Simoes and Maite Mascaro (mmm@ciencias.unam.mx).
References
Anderson, M.J. & Santana-Garcon, J. (2015). Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters, 18, 66-73.
Guerra-Castro, E. J., J. C. Cajas, F. N. Dias Marques Simoes, J. J. Cruz-Motta, and M. Mascaro. (2020). SSP: An R package to estimate sampling effort in studies of ecological communities. bioRxiv:2020.2003.2019.996991.
Tillé, Y. (2006). Sampling algorithms. Springer, New York, NY.
See Also
assempar, simdata, summary_ssp, vegdist
Examples
### To speed up these examples, cases, sites and n are kept small.
library(SSP)
## Single site: micromollusks from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)
# Estimate parameters from the pilot data
par.mic <- assempar(data = micromollusk,
                    type = "P/A",
                    Sest.method = "average")
# Simulate 3 data sets, each with 20 potential sampling units from a single site
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)
# Sample and estimate MultSE for each sample size
# (few repetitions to speed up the example)
sam.mic <- sampsd(dat.sim = sim.mic,
                  Par = par.mic,
                  transformation = "P/A",
                  method = "jaccard",
                  n = 10,
                  m = 1,
                  k = 3)
## Multiple sites: sponges from Alacranes National Park (Yucatan, Mexico)
data(sponges)
# Estimate parameters from the pilot data
par.spo <- assempar(data = sponges,
                    type = "counts",
                    Sest.method = "average")
# Simulate 3 data sets, each with 20 potential sampling units in 3 sites
sim.spo <- simdata(par.spo, cases = 3, N = 20, sites = 3)
# Sample and estimate MultSE for each sampling design
# (few repetitions to speed up the example)
sam.spo <- sampsd(dat.sim = sim.spo,
                  Par = par.spo,
                  transformation = "square root",
                  method = "bray",
                  n = 10,
                  m = 3,
                  k = 3)