get_mob_stats {mobr}

R Documentation

Calculate sample based and group based biodiversity statistics.

Description

Calculate sample based and group based biodiversity statistics.

Usage

get_mob_stats(
  mob_in,
  group_var,
  ref_level = NULL,
  index = c("N", "S", "S_n", "S_PIE"),
  effort_samples = NULL,
  effort_min = 5,
  extrapolate = TRUE,
  return_NA = FALSE,
  rare_thres = 0.05,
  n_perm = 199,
  boot_groups = FALSE,
  conf_level = 0.95,
  cl = NULL,
  ...
)

Arguments

`mob_in`	an object of class mob_in created by make_mob_in()
`group_var`	String that specifies which field in `mob_in$env` the data should be grouped by
`ref_level`	String that defines the reference level of `group_var` to which all other groups are compared. Defaults to `NULL`, in which case the default contrasts of `group_var` are used.
`index`	The calculated biodiversity indices. The options are:
	`N` ... Number of individuals (total abundance)
	`S` ... Number of species
	`S_n` ... Rarefied or extrapolated number of species for n individuals
	`S_asymp` ... Estimated asymptotic species richness
	`f_0` ... Estimated number of undetected species
	`pct_rare` ... The percent of rare species as defined by `rare_thres`
	`PIE` ... Hurlbert's PIE (Probability of Interspecific Encounter)
	`S_PIE` ... Effective number of species based on PIE
	If `index` is not specified then `N`, `S`, `S_n`, and `S_PIE` are computed by default (see Usage). See Details for additional information on the biodiversity statistics.
`effort_samples`	The standardized number of individuals used for the calculation of rarefied species richness at the alpha-scale. This can be a single value or an integer vector. By default the minimum number of individuals found across the samples is used, provided this is not smaller than `effort_min`.
`effort_min`	The minimum number of individuals considered for the calculation of rarefied richness (default value of 5). Samples with fewer individuals than `effort_min` are excluded from the analysis with a warning. Accordingly, when `effort_samples` is set by the user it must not be smaller than `effort_min`.
`extrapolate`	Boolean which specifies whether richness should be extrapolated using the Chao1 method when `effort_samples` is larger than the number of individuals. Defaults to TRUE.
`return_NA`	Boolean, defaults to FALSE, in which case the rarefaction function returns the observed S when `effort` is larger than the number of individuals. If set to TRUE then NA is returned. Note that this argument is only relevant when `extrapolate = FALSE`.
`rare_thres`	The threshold that determines how pct_rare is computed. It must lie in the interval (0, 1] and defaults to 0.05, which specifies that any species with less than or equal to 5% of the total abundance in a sample is considered rare. It can also be specified as "N/S", which results in using average abundance as the threshold, which McGill (2011) found to have the best small sample behavior.
`n_perm`	The number of permutations to use for testing for treatment effects. Defaults to 199.
`boot_groups`	Use bootstrap resampling within groups to derive gamma-scale confidence intervals for all biodiversity indices. Default is `FALSE`. See Details for information on the bootstrap approach.
`conf_level`	Confidence level used for the calculation of gamma-scale bootstrapped confidence intervals. Only used when `boot_groups = TRUE`.
`cl`	A cluster object created by `makeCluster`, or an integer to indicate number of child-processes (integer values are ignored on Windows) for parallel evaluations (see Details on performance).
`...`	Optional arguments to `FUN`.

Details

BIODIVERSITY INDICES

S_n: Rarefied species richness is the expected number of species, given a defined number of sampled individuals (n) (Gotelli & Colwell 2001). Rarefied richness at the alpha-scale is calculated for the values provided in effort_samples as long as these values are not smaller than the user-defined minimum value effort_min. If a value is smaller, effort_min is used instead and samples with fewer individuals are discarded. When no values for effort_samples are provided, the observed minimum number of individuals across the samples is used, which is the standard in rarefaction analysis (Gotelli & Colwell 2001). Because the number of individuals is expected to scale linearly with sample area or effort, at the gamma-scale the number of individuals for rarefaction (effort_groups) is calculated as the minimum number of samples within groups multiplied by effort_samples. For example, when there are 10 samples within each group, effort_groups equals 10 * effort_samples. If n is larger than the number of individuals in a sample and extrapolate = TRUE, then the Chao1 method (Chao 1984, Chao 1987) is used to extrapolate the rarefaction curve.
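The expectation behind S_n can be illustrated with the standard individual-based rarefaction formula for sampling without replacement. The following is a minimal sketch in base R, not the code mobr uses internally; `rarefy_sketch` and the abundance vector are hypothetical names introduced only for illustration.

```r
# Expected number of species in a random subsample of n individuals.
# `rarefy_sketch` is a hypothetical helper, not a mobr function.
rarefy_sketch <- function(abund, n) {
  abund <- abund[abund > 0]
  N <- sum(abund)
  stopifnot(n >= 0, n <= N)
  # P(species i is absent from the subsample) =
  #   choose(N - n_i, n) / choose(N, n), computed on the log scale
  p_missed <- exp(lchoose(N - abund, n) - lchoose(N, n))
  sum(1 - p_missed)
}

abund <- c(10, 5, 3, 1, 1)          # 5 species, N = 20 individuals
s_all <- rarefy_sketch(abund, 20)   # all individuals: the observed S
s_one <- rarefy_sketch(abund, 1)    # a single individual: one species
s_5   <- rarefy_sketch(abund, 5)    # somewhere between 1 and 5

# Gamma-scale effort as described above (values are made up)
n_samples_min  <- 10
effort_samples <- 5
effort_groups  <- n_samples_min * effort_samples
```

This sketch covers interpolation only; extrapolation beyond the observed number of individuals is not shown.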

pct_rare: Percent of rare species is the ratio of the number of rare species to the number of observed species x 100 (McGill 2011). Species are considered rare in a particular sample if they have fewer individuals than rare_thres * N, where rare_thres can be set by the user and N is the total number of individuals in the sample. The default value of rare_thres of 0.05 is arbitrary and was chosen because McGill (2011) found this metric of rarity performed well and was generally less correlated with other common metrics of biodiversity. Essentially this metric attempts to estimate what proportion of the species in the sample occur in the tail of the species abundance distribution, and it is therefore sensitive to the presence of rare species.
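As a rough illustration of the pct_rare calculation, the sketch below counts the species at or below the threshold. `pct_rare_sketch` is a hypothetical helper, and the use of `<=` follows the description of the `rare_thres` argument; that choice is an assumption here, and the example data are chosen so that `<` would give the same result.

```r
# Percent of rare species in one sample (hypothetical helper).
pct_rare_sketch <- function(abund, rare_thres = 0.05) {
  abund <- abund[abund > 0]
  N <- sum(abund)       # total number of individuals
  S <- length(abund)    # observed number of species
  # "N/S" uses the average abundance as the threshold
  threshold <- if (identical(rare_thres, "N/S")) N / S else rare_thres * N
  100 * sum(abund <= threshold) / S
}

abund <- c(60, 25, 10, 3, 1, 1)                   # N = 100, S = 6
pct_default <- pct_rare_sketch(abund)             # threshold 5: 3 of 6 species
pct_avg     <- pct_rare_sketch(abund, "N/S")      # threshold 16.7: 4 of 6 species
```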

S_asymp: Asymptotic species richness is the expected number of species given complete sampling. Here it is calculated using the Chao1 estimator (Chao 1984, Chao 1987); see calc_chao1. Note: this metric is typically highly correlated with S (McGill 2011).

f_0: Undetected species richness is the number of undetected species, i.e. the number of species observed 0 times, which is an indicator of the degree of rarity in the community. If there is greater rarity then f_0 is expected to increase. This metric is calculated as S_asymp - S and is less correlated with S than the raw S_asymp metric.
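The relationship between S, S_asymp and f_0 can be sketched with the classic form of the Chao1 estimator, which uses the number of singletons (f1) and doubletons (f2). This is an illustration only: `chao1_sketch` is a hypothetical helper, and calc_chao1 in mobr may implement a different (for example bias-corrected) variant.

```r
# Classic Chao1: S_asymp = S + f1^2 / (2 * f2), with the usual
# fallback f1 * (f1 - 1) / 2 when there are no doubletons.
chao1_sketch <- function(abund) {
  abund <- abund[abund > 0]
  S  <- length(abund)
  f1 <- sum(abund == 1)   # species observed exactly once
  f2 <- sum(abund == 2)   # species observed exactly twice
  f_0 <- if (f2 > 0) f1^2 / (2 * f2) else f1 * (f1 - 1) / 2
  c(S = S, f_0 = f_0, S_asymp = S + f_0)
}

abund <- c(20, 8, 4, 2, 2, 1, 1, 1, 1)   # S = 9, f1 = 4, f2 = 2
est <- chao1_sketch(abund)               # f_0 = 16 / 4 = 4, S_asymp = 13
```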

PIE: Probability of interspecific encounter represents the probability that two randomly drawn individuals belong to different species. Here we use the definition of Hurlbert (1971), which considers sampling without replacement. PIE is closely related to the well-known Simpson diversity index, but the latter assumes sampling with replacement.
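Hurlbert's PIE can be written as N / (N - 1) * (1 - sum(p_i^2)), where p_i is the relative abundance of species i. The sketch below compares that formula with a brute-force enumeration of all pairs of individuals; `pie_sketch` and the data are hypothetical and serve only to illustrate the definition.

```r
# Hurlbert's PIE for sampling without replacement (hypothetical helper).
pie_sketch <- function(abund) {
  N <- sum(abund)
  p <- abund / N
  N / (N - 1) * (1 - sum(p^2))
}

abund <- c(5, 3, 2)                      # N = 10 individuals
pie_formula <- pie_sketch(abund)

# Brute force: label every individual with its species, enumerate all
# unordered pairs, and count the pairs that mix two species.
ids   <- rep(seq_along(abund), abund)
pairs <- combn(ids, 2)
pie_brute <- mean(pairs[1, ] != pairs[2, ])   # 31 of 45 pairs
```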

S_PIE: Effective number of species for PIE represents the effective number of species derived from the PIE. It is calculated using the asymptotic estimator for Hill numbers of diversity order 2 (Chao et al. 2014). S_PIE represents the species richness of a hypothetical community with equally-abundant species and infinitely many individuals corresponding to the same value of PIE as the real community. An intuitive interpretation of S_PIE is that it corresponds to the number of dominant (highly abundant) species in the species pool.
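A minimal sketch of this estimator, assuming the usual form of the asymptotic order-2 Hill number, 1 / sum(n_i * (n_i - 1) / (N * (N - 1))), which is algebraically the same as 1 / (1 - PIE) with Hurlbert's PIE. `s_pie_sketch` is a hypothetical helper, not the mobr implementation.

```r
# Effective number of species of order 2 (hypothetical helper).
# Returns Inf when every species is a singleton.
s_pie_sketch <- function(abund) {
  N <- sum(abund)
  1 / sum(abund * (abund - 1) / (N * (N - 1)))
}

even      <- c(25, 25, 25, 25)   # four equally abundant species
dominated <- c(97, 1, 1, 1)      # one dominant species

s_even <- s_pie_sketch(even)        # 4.125, close to the 4 species present
s_dom  <- s_pie_sketch(dominated)   # close to 1

# Same value obtained via 1 / (1 - PIE)
N   <- sum(dominated)
pie <- N / (N - 1) * (1 - sum((dominated / N)^2))
s_dom_via_pie <- 1 / (1 - pie)
```

The even community yields a value near its richness, while the dominated one yields a value near 1, which matches the "number of dominant species" interpretation.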

For species richness S, rarefied richness S_n, undetected richness f_0, and the effective number of species S_PIE we also calculate beta-diversity using multiplicative partitioning (Whittaker 1972, Jost 2007). That means for these indices we estimate beta-diversity as the ratio of gamma-diversity (total diversity across all plots) to alpha-diversity (i.e., average plot diversity).
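For species richness, the multiplicative partition can be illustrated on a small made-up community matrix (plots in rows, species in columns). The matrix and variable names are hypothetical.

```r
# Three plots, four species (abundances are invented for illustration)
comm <- rbind(
  plot1 = c(4, 2, 0, 0),
  plot2 = c(0, 3, 5, 0),
  plot3 = c(1, 0, 0, 6)
)

S_alpha <- rowSums(comm > 0)           # richness per plot: 2, 2, 2
S_gamma <- sum(colSums(comm) > 0)      # richness across all plots: 4
beta_S  <- S_gamma / mean(S_alpha)     # gamma / average alpha = 2
```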

PERMUTATION TESTS AND BOOTSTRAP

For both the alpha and gamma scale analyses we summarize effect size in each biodiversity index by computing D_bar: the average absolute difference between the groups. At the alpha scale the indices are averaged before D_bar is computed.
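One way to read D_bar is as the mean of the absolute differences over all pairs of groups. The sketch below follows that reading, which is an assumption about the exact definition; the group means are invented.

```r
# Hypothetical group-level values of one index (e.g. mean S per group)
group_means <- c(control = 12, low = 15, high = 21)

# dist() on a vector returns all pairwise absolute differences:
# |12 - 15| = 3, |12 - 21| = 9, |15 - 21| = 6
d_bar <- mean(as.vector(dist(group_means)))   # (3 + 9 + 6) / 3 = 6
```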

We use permutation tests for testing differences of the biodiversity statistics among the groups (Legendre & Legendre 1998). At the alpha-scale, a one-way ANOVA (i.e. F-test) is implemented by shuffling treatment group labels across samples. The test statistic for this test is the F-statistic, which is a pivotal statistic (Legendre & Legendre 1998). At the gamma-scale we carry out the permutation test by shuffling the treatment group labels and using D_bar as the test statistic. We cannot use the F-statistic as the test statistic at the gamma scale because at this scale there are no replicates and therefore the F-statistic is undefined.
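The alpha-scale test can be sketched in base R as follows. The data are simulated, `f_stat` is a hypothetical helper, and the p-value convention (counting the observed statistic as one of the permutations) is an assumption rather than a statement about how mobr computes it.

```r
set.seed(1)

# Simulated sample-level richness for two groups of 10 samples each
group <- factor(rep(c("a", "b"), each = 10))
S     <- c(rpois(10, 8), rpois(10, 14))

# F-statistic of a one-way ANOVA (hypothetical helper)
f_stat <- function(y, g) anova(lm(y ~ g))[1, "F value"]

f_obs  <- f_stat(S, group)
n_perm <- 199

# Null distribution: shuffle the group labels across samples
f_null <- replicate(n_perm, f_stat(S, sample(group)))

# Proportion of statistics at least as large as the observed one
p_value <- (sum(f_null >= f_obs) + 1) / (n_perm + 1)
```

With n_perm = 199 the smallest attainable p-value under this convention is 1 / 200 = 0.005.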

A bootstrap approach can also be used to test differences at the gamma-scale. When boot_groups = TRUE, the gamma-scale permutation test is replaced by resampling of samples within groups to derive gamma-scale confidence intervals for all biodiversity indices. The function output includes lower and upper confidence bounds and the median of the bootstrap samples. Please note that for the richness indices sampling with replacement corresponds to rarefaction to ca. 2/3 of the individuals, because the same samples occur several times in the resampled data sets.
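The resampling idea can be sketched for a single group and for gamma-scale richness. Everything here is illustrative: the community matrix is simulated, `gamma_S` is a hypothetical helper, and the use of simple percentile bounds is an assumption about how the interval is formed.

```r
set.seed(42)

# One hypothetical group: 8 samples (rows) by 12 species (columns)
comm <- matrix(rpois(8 * 12, lambda = 0.7), nrow = 8)

# Gamma-scale richness: species present in the pooled samples
gamma_S <- function(m) sum(colSums(m) > 0)

n_boot <- 500
boot_S <- replicate(n_boot, {
  rows <- sample(nrow(comm), replace = TRUE)   # resample samples
  gamma_S(comm[rows, , drop = FALSE])
})

conf_level <- 0.95
alpha <- 1 - conf_level
ci <- quantile(boot_S, probs = c(alpha / 2, 0.5, 1 - alpha / 2))

# Expected fraction of distinct samples in one resample; it approaches
# 1 - exp(-1) (about 0.63) as the number of samples grows
frac_distinct <- 1 - (1 - 1 / nrow(comm))^nrow(comm)
```

Because each resample contains only a subset of the distinct samples, the bootstrapped richness can never exceed the observed gamma-scale richness, which is the effect noted above.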

Value

A list of class mob_stats that contains alpha-scale and gamma-scale biodiversity statistics, as well as the p-values for permutation tests at both scales.

When boot_groups = TRUE there are no p-values at the gamma-scale. Instead, a lower bound, median, and upper bound are returned for each biodiversity index, derived from the bootstrap within groups.

Author(s)

Felix May and Dan McGlinn

References

Chiu, C.-H., Wang, Y.-T., Walther, B.A. & Chao, A. (2014) An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics, 70, 671-682.

Gotelli, N.J. & Colwell, R.K. (2001) Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters, 4, 379-391.

Hurlbert, S.H. (1971) The Nonconcept of Species Diversity: A Critique and Alternative Parameters. Ecology, 52, 577-586.

Jost, L. (2006) Entropy and diversity. Oikos, 113, 363-375.

Jost, L. (2007) Partitioning Diversity into Independent Alpha and Beta Components. Ecology, 88, 2427-2439.

Legendre, P. & Legendre, L.F.J. (1998) Numerical Ecology, Volume 24, 2nd Edition. Elsevier, Amsterdam; Boston.

McGill, B.J. (2011) Species abundance distributions. Pages 105-122 in Biological Diversity: Frontiers in Measurement and Assessment (eds. A.E. Magurran & B.J. McGill).

Whittaker, R.H. (1972) Evolution and Measurement of Species Diversity. Taxon, 21, 213-251.

Examples

# a binary grouping variable (uninvaded or invaded)
data(inv_comm)
data(inv_plot_attr)
inv_mob_in = make_mob_in(inv_comm, inv_plot_attr, c('x', 'y'))
inv_stats = get_mob_stats(inv_mob_in, group_var = "group", ref_level = 'uninvaded',
                          n_perm = 19, effort_samples = c(5,10))
plot(inv_stats)

# parallel evaluation using the parallel package
library(parallel)
cl = makeCluster(2L)                  # start two worker processes
clusterEvalQ(cl, library(mobr))       # load mobr on each worker
clusterExport(cl, 'inv_mob_in')       # copy the input object to the workers
inv_mob_stats = get_mob_stats(inv_mob_in, 'group', ref_level = 'uninvaded',
                              n_perm = 999, cl = cl)
stopCluster(cl)                       # release the workers when done

[Package mobr version 2.0.2 Index]