
calc_metrics {ecospace}

R Documentation

Calculate Ecological Disparity (Functional Diversity) Dynamics as Function of Sample Size.

Description

Wrapper to FD::dbFD that calculates common ecological disparity and functional diversity statistics. When used with species-wise simulations of community assembly or ecological diversification (and the default increm = TRUE), calculates statistical dynamics incrementally as a function of species richness. Avoids file-sharing errors so that it can be used in 'embarrassingly parallel' implementations in a high-performance computing environment.

Usage

calc_metrics(
  nreps = 1,
  samples = NA,
  Smax = NA,
  Model = "",
  Param = "",
  m = 3,
  corr = "lingoes",
  method = "Euclidean",
  increm = TRUE,
  ...
)

Arguments

`nreps`	Sample number to calculate statistics for. Default is the first sample `nreps = 1`, but statistics can be calculated for other samples (e.g., the second sample if `nreps = 2`), or for multiple samples if assigned a vector (sequence) of integers and the function is applied within `lapply` or a related function.
`samples`	Data frame (if `nreps = 1`) or list of data frames (if `nreps = seq()` or `nreps != 1`), with each data frame a species-by-trait matrix with species as rows and traits as columns. Traits can be binary, numeric, ordered numeric, factor, or ordered factor types. Each sample is converted to a distance matrix (see `method` below) before calculating statistics.
`Smax`	Maximum number of `samples` rows (species) to include in calculations, incremented starting with the first row. Default (`NA`) is to increment to the maximum number of `samples` rows (calculated separately for each data frame sample, if a list of data frames). If `Smax` is greater than the size of a sample, calculation stops after the last row of that sample and a warning is issued.
`Model`	Optional character string or numeric value naming the simulation model. A warning is issued if left blank.
`Param`	Optional numeric value or character string naming the strength parameter used in the simulation. A warning is issued if left blank.
`m`	The number of PCoA axes to keep as 'traits' for calculating FRic and FDiv in `FD::dbFD`. Default `m = 3` is justified below, but any integer value greater than 1 is possible. See Details for more information.
`corr`	Character string specifying the correction method to use in `FD::dbFD` when the species-by-species distance matrix cannot be represented in a Euclidean space. Default `corr = 'lingoes'` is justified below, but see `FD::dbFD` for other possible values.
`method`	Distance measure to use when calculating functional distances between species. Default is `method = 'Euclidean'` using `stats::dist`. `method = 'Gower'` or any other value uses Gower distance (using `FD::gowdis`). Presence of factor or ordered factor character types forces the use of Gower distance; a warning notifies the user when this change is made internally.
`increm`	Default `increm = TRUE` calculates statistics incrementally as a function of species richness. `increm = FALSE` calculates only a single set of statistics for the entire sample.
`...`	Additional parameters for controlling `FD::dbFD`. Common uses include setting `calc.FRic = FALSE` or `calc.FDiv = FALSE` to exclude calculation of FRic and FDiv. Note that the arguments `m`, `corr`, and `method` above have different defaults than those used in `FD::dbFD`, and that `w.abun = FALSE` and `messages = FALSE` are also set internally as different defaults. These and other arguments can be controlled here.
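The difference between the two `method` options can be sketched in base R. `stats::dist` gives the Euclidean distances used by default; the hypothetical helper `gower_numeric()` below is a simplified Gower distance for all-numeric traits only (each trait's absolute difference divided by that trait's range, averaged over traits). It is an illustrative assumption, not `FD::gowdis`, which also handles binary, factor, and ordered traits and missing values.

```r
# Hypothetical, simplified Gower distance for all-numeric traits
# (illustrative only; FD::gowdis also handles factors, ordered factors and NAs).
gower_numeric <- function(x) {
  x <- as.matrix(x)
  rng <- apply(x, 2, function(col) diff(range(col)))
  rng[rng == 0] <- 1  # assumption: a constant trait contributes zero difference
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # mean over traits of range-standardized absolute differences
      d[i, j] <- mean(abs(x[i, ] - x[j, ]) / rng)
    }
  }
  as.dist(d)
}

traits <- data.frame(a = c(0, 1, 2), b = c(0, 0, 4))  # hypothetical traits
euclid <- dist(traits, method = "euclidean")  # analogous to the default method
gower <- gower_numeric(traits)                # every distance lies in [0, 1]
```

Because Gower distances are range-standardized and bounded by 1, statistics such as M and D are not directly comparable between the two methods.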

Details

The primary goal of this function is to describe the statistical dynamics of common ecological disparity (functional diversity) metrics as a function of species richness (sample size). Statistics are calculated incrementally within samples: first for the first row (species), then for the first and second rows, and so on, ending with the entire sample (by default) or with Smax total species. The function assumes that supplied samples are ecologically or evolutionarily cohesive assemblages in which there is a logical order to the rows (such that the sixth row is the sixth species added to the assemblage) and that such incremental calculations are sensible. See Novack-Gottshall (2016a,b) for additional context. Samples must have species as rows and traits as columns (of many allowed character types), and must be of class data.frame or a list of such data frames, with each data frame a separate sample.
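The incremental scheme can be sketched in base R with a hypothetical helper, `incremental_stat()`, that applies a statistic to the first 1, 2, ..., Smax rows of a species-by-trait data frame. This is an illustration of the idea under the assumption of all-numeric traits and Euclidean distance, not the package's internal code; `mean_dist()` stands in for the D statistic.

```r
# Hypothetical helper (not part of ecospace): evaluate stat_fun on the
# first 1, 2, ..., Smax rows (species) of a species-by-trait data frame.
incremental_stat <- function(x, stat_fun, Smax = nrow(x)) {
  if (Smax > nrow(x)) {
    warning("Smax exceeds the sample size; stopping at the last row")
    Smax <- nrow(x)
  }
  vapply(seq_len(Smax),
         function(i) stat_fun(x[seq_len(i), , drop = FALSE]),
         numeric(1))
}

# Stand-in for D: mean pairwise Euclidean distance (undefined for 1 species)
mean_dist <- function(x) if (nrow(x) < 2) NA_real_ else mean(dist(x))

traits <- data.frame(t1 = c(0, 1, 1, 0), t2 = c(0, 0, 1, 1))  # row order = order of addition
D_by_S <- incremental_stat(traits, mean_dist)
```

Each element of `D_by_S` corresponds to one species richness S, which is the shape of the incremental output described under Value.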

Statistics calculated include four widely used in ecological disparity studies (adapted from studies of morphological disparity) and four used in functional diversity studies. See Foote (1993), Ciampaglio et al. (2001), and Wills (2001) for definitions and details on morphological disparity measures, and Novack-Gottshall (2007; 2016a,b) for their implementation as measures of ecological disparity. See Mason et al. (2005), Anderson et al. (2006), Villeger et al. (2008), Laliberte and Legendre (2010), Mouchet et al. (2010), and Mouillot et al. (2013) for definitions and details on functional diversity statistics. For computational details of the functional diversity metrics, see the FD package (Laliberte and Shipley 2014), and especially FD::dbFD, which this function wraps to calculate the functional diversity statistics.

S is species (taxonomic) richness, or sample size.

When increm = FALSE, the function calculates statistics for the entire sample(s) instead of incrementally. In this case, the implementation is essentially the same as FD::dbFD, but with default arguments (e.g., m, corr) chosen to reduce common calculation errors, plus the addition of common morphological disparity statistics.

Statistics that measure diversity (unique number of life habits / trait combinations within ecospace / functional-trait space):

H: Life habit richness, the number of functionally unique trait combinations.

Statistics that measure disparity (or dispersion of species within ecospace / functional-trait space) (Note these statistics are sensitive to outliers and sample size):

M: Maximum pairwise distance between species in functional-trait space, measured using the distance method specified above.
FRic: Functional richness, the minimal convex-hull volume in multidimensional principal coordinates analysis (PCoA) trait-space ordination.
FDiv: Functional divergence, the mean distance of species from the PCoA trait-space centroid.

Statistics that measure internal structure (i.e., clumping or inhomogeneities within the trait-space):

D: Mean pairwise distance between species in functional-trait space, measured using the distance method specified above.
V: Total variance, the sum of variances for each functional trait across species; when using factor or ordered factor character types, this statistic cannot be measured and is left blank, with a warning.
FDis: Functional dispersion, the total deviance of species from the circle with radius equal to mean distance from PCoA trait-space centroid.

Statistics that measure spacing among species within the trait-space:

FEve: Functional evenness, the evenness of minimum-spanning-tree lengths between species in PCoA trait-space.
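The four statistics that do not require PCoA (H, D, M, and V) can be sketched directly in base R for a single sample. The helper `disparity_stats()` is hypothetical and assumes all-numeric traits with raw Euclidean distances; FRic, FEve, FDiv, and FDis come from FD::dbFD and are not reproduced here.

```r
# Hypothetical helper: H, D, M and V for one all-numeric species-by-trait sample
disparity_stats <- function(x) {
  d <- dist(x)                       # raw (unstandardized) Euclidean distances
  c(S = nrow(x),                     # species richness
    H = nrow(unique(x)),             # functionally unique trait combinations
    D = mean(d),                     # mean pairwise distance
    M = max(d),                      # maximum pairwise distance
    V = sum(apply(x, 2, var)))       # total variance: sum of per-trait variances
}

# Five hypothetical species; the last two share the same life habit
traits <- data.frame(t1 = c(0, 1, 1, 0, 0), t2 = c(0, 0, 1, 1, 1))
stats5 <- disparity_stats(traits)
```

Note how the duplicated life habit lowers H relative to S while still contributing a zero distance to D.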

The default number of PCoA axes used in calculating FRic and FDiv is m = 3. Because their calculation requires more species than traits (here the m = 3 PCoA axes), the four functional diversity statistics are calculated only when a sample contains a minimum of m species (S) or unique life habits (H). qual.FRic is appended to the output to record the proportion ('quality') of PCoA space retained after this reduction in dimensionality. Although including more PCoA axes allows greater statistical power (Villeger et al. 2011, Maire et al. 2015), the use of m = 3 here is computationally manageable and ecologically meaningful, and it allows standardized measurement of statistical dynamics across the wide range of sample sizes typically involved in simulations of ecological/evolutionary assemblages, especially when functionally redundant data occur. Other integers greater than 1 can also be specified. See the help file for FD::dbFD for additional information.
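The reduction to m PCoA axes, and one plausible way to express its 'quality', can be sketched with `stats::cmdscale`. The helper `pcoa_quality()` is hypothetical: it takes quality as the sum of the first m eigenvalues divided by the sum of all positive eigenvalues, which is an assumption and may differ in detail from how FD::dbFD computes qual.FRic.

```r
# Hypothetical helper: keep m PCoA axes and report the share of positive
# eigenvalue mass they retain (assumed definition of 'quality').
pcoa_quality <- function(d, m = 3) {
  n <- attr(d, "Size")
  # cmdscale() needs k <= n - 1; this mirrors "more species than axes"
  if (n <= m) stop("need more species than retained PCoA axes")
  p <- stats::cmdscale(d, k = m, eig = TRUE)
  list(axes = p$points,
       qual = sum(p$eig[seq_len(m)]) / sum(p$eig[p$eig > 0]))
}

set.seed(1)
traits <- matrix(rnorm(6 * 4), nrow = 6)  # 6 hypothetical species, 4 numeric traits
res <- pcoa_quality(dist(traits), m = 3)
```

With four independent traits the first three axes cannot capture all of the variation, so `res$qual` falls below 1; with fewer traits than axes it would approach 1.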

The Lingoes correction (corr = 'lingoes'), as recommended by Legendre and Anderson (1999), is applied when the species-by-species distance matrix cannot be represented in a Euclidean space. See the help file for FD::dbFD for additional information.
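The Lingoes correction can be sketched in base R: a distance matrix is Euclidean only if the Gower-centered matrix -0.5 H D^2 H has no negative eigenvalues, and the correction adds 2c to every off-diagonal squared distance, where c is the absolute value of the most negative eigenvalue. The helpers `min_gower_eigen()` and `lingoes_correct()` are hypothetical illustrations, not the code FD::dbFD calls.

```r
# Smallest eigenvalue of the Gower-centered matrix -0.5 * H %*% D^2 %*% H;
# a negative value means the distances cannot be embedded in Euclidean space.
min_gower_eigen <- function(d) {
  D2 <- as.matrix(d)^2
  n <- nrow(D2)
  H <- diag(n) - matrix(1 / n, n, n)  # centering matrix
  G <- -0.5 * H %*% D2 %*% H
  min(eigen(G, symmetric = TRUE, only.values = TRUE)$values)
}

# Lingoes correction: d'_ij = sqrt(d_ij^2 + 2c) for i != j, c = |lambda_min|
lingoes_correct <- function(d, tol = 1e-10) {
  lam <- min_gower_eigen(d)
  if (lam > -tol) return(d)            # already (numerically) Euclidean
  D2 <- as.matrix(d)^2 + 2 * abs(lam)
  diag(D2) <- 0                        # self-distances stay zero
  as.dist(sqrt(D2))
}

# Three hypothetical species violating the triangle inequality (3 > 1 + 1)
d <- as.dist(matrix(c(0, 1, 3,
                      1, 0, 1,
                      3, 1, 0), nrow = 3, byrow = TRUE))
d_corr <- lingoes_correct(d)
```

Adding the constant shifts every nonzero eigenvalue of the centered matrix up by c, so the most negative one becomes zero and the corrected distances can be ordinated without imaginary axes.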

Note that the ecological disparity statistics are calculated on the raw (unstandardized) distance matrix, whereas the functional diversity statistics are calculated on data standardized by FD::dbFD. If all traits are numeric, by default they are standardized to mean 0 and unit variance. If not all traits are numeric, Gower's (1971) standardization by the range is used automatically.
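Why the distinction matters can be shown with a small base-R sketch on hypothetical traits measured on very different scales: the same maximum-distance statistic (M) changes substantially after z-standardization with `scale()`. This illustrates the general effect only and is not the package's code.

```r
# Hypothetical traits on very different scales
traits <- data.frame(size = c(1, 10, 100), depth = c(0, 1, 2))
z <- scale(traits)             # each trait rescaled to mean 0, unit variance
M_raw <- max(dist(traits))     # disparity-style statistic on raw traits
M_std <- max(dist(z))          # same statistic on standardized traits
comparison <- c(raw = M_raw, standardized = M_std)
```

On the raw scale, `size` dominates every distance; after standardization both traits contribute comparably, which is why the disparity and functional diversity statistics can respond differently to the same sample.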

Value

Returns a data frame (if nreps is a single integer or samples is a single data frame) or a list of data frames. Each returned data frame has one row per incremental species richness (sample size), up to Smax rows (a single row when increm = FALSE), and 12 columns, corresponding to:

`Model`	(optional) `Model` name
`Param`	(optional) strength parameter
`S`	Species richness (sample size)
`H`	Number of functionally unique life habits
`D`	Mean pairwise distance
`M`	Maximum pairwise distance
`V`	Total variance
`FRic`	Functional richness
`FEve`	Functional evenness
`FDiv`	Functional divergence
`FDis`	Functional dispersion
`qual.FRic`	Proportion ('quality') of total PCoA trait-space used when calculating FRic and FDiv

Note

A bug exists within FD::gowdis in which nearest-neighbor distances cannot be calculated when certain characters (especially numeric characters with values other than 0 and 1) share identical traits across species. The nature of the bug is under investigation, but the current implementation is reliable under most uses. If you encounter problems because of this bug, a work-around is to modify the function manually to call cluster::daisy with metric = "gower" instead.
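A minimal illustration of the suggested work-around's core call, `cluster::daisy(..., metric = "gower")`, on hypothetical mixed-type traits. How to splice this into calc_metrics is not shown, and whether its output matches FD::gowdis exactly for every trait type is an assumption worth checking on your own data.

```r
library(cluster)  # recommended package distributed with R

# Hypothetical mixed-type traits: one numeric, one factor
traits <- data.frame(size = c(1, 2, 3),
                     mobile = factor(c("yes", "no", "yes")))
d <- daisy(traits, metric = "gower")  # numeric scaled by range; factor scored 0/1
gower_mat <- as.matrix(d)
```

For example, species 1 and 2 differ by half the size range and in mobility, giving a Gower distance of (0.5 + 1) / 2 = 0.75.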

When calculating statistics for more than several hundred samples, a parallel-computing environment is recommended. The function has been written to allow use (via lapply or another list-apply function) in 'embarrassingly parallel' implementations in such HPC environments. Most importantly, overwriting errors during calculation of the convex-hull volume in FRic are avoided by creating CPU-specific, temporarily stored vertices files.
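The general pattern, a list-apply over samples in which each worker writes intermediate files whose names include its process ID, can be sketched with base R's `parallel::mclapply`. The worker `per_sample_worker()` is hypothetical: it writes and re-reads a temporary file only as a stand-in for the vertices files mentioned above, and computes a simple mean distance in place of calc_metrics. Forking is unavailable on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical worker: the temporary file name embeds the process ID (and the
# sample index), so concurrently running workers never share a file.
per_sample_worker <- function(i, samples) {
  tmp <- tempfile(pattern = paste0("vert_", Sys.getpid(), "_", i, "_"),
                  fileext = ".txt")
  on.exit(unlink(tmp))                 # remove the file when the worker finishes
  utils::write.table(samples[[i]], tmp)  # stand-in for an intermediate file
  y <- utils::read.table(tmp)
  mean(dist(y))                        # stand-in for the per-sample statistics
}

set.seed(42)
samples <- lapply(1:4, function(k) data.frame(a = rnorm(5), b = rnorm(5)))
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L  # mclapply forks on Unix-alikes only
res <- mclapply(seq_along(samples), per_sample_worker,
                samples = samples, mc.cores = n_cores)
```

Because each sample is processed independently and no file is shared, the same worker could be dispatched by `lapply`, `parLapply`, or a cluster job array without changing its body.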

See Novack-Gottshall (2016b) for recommendations for using random forest classification trees to conduct multi-model inference.

Author(s)

Phil Novack-Gottshall pnovack-gottshall@ben.edu

References

Anderson, M. J., K. E. Ellingsen, and B. H. McArdle. 2006. Multivariate dispersion as a measure of beta diversity. Ecology Letters 9(6):683-693.

Ciampaglio, C. N., M. Kemp, and D. W. McShea. 2001. Detecting changes in morphospace occupation patterns in the fossil record: characterization and analysis of measures of disparity. Paleobiology 27(4):695-715.

Foote, M. 1993. Discordance and concordance between morphological and taxonomic diversity. Paleobiology 19:185-204.

Gower, J. C. 1971. A general coefficient of similarity and some of its properties. Biometrics 27:857-871.

Laliberte, E., and P. Legendre. 2010. A distance-based framework for measuring functional diversity from multiple traits. Ecology 91(1):299-305.

Laliberte, E., and B. Shipley. 2014. FD: Measuring functional diversity from multiple traits, and other tools for functional ecology. R package version 1.0-12.

Legendre, P., and M. J. Anderson. 1999. Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69(1):1-24.

Maire, E., G. Grenouillet, S. Brosse, and S. Villeger. 2015. How many dimensions are needed to accurately assess functional diversity? A pragmatic approach for assessing the quality of functional spaces. Global Ecology and Biogeography 24(6):728-740.

Mason, N. W. H., D. Mouillot, W. G. Lee, and J. B. Wilson. 2005. Functional richness, functional evenness and functional divergence: the primary components of functional diversity. Oikos 111(1):112-118.

Mouchet, M. A., S. Villeger, N. W. H. Mason, and D. Mouillot. 2010. Functional diversity measures: an overview of their redundancy and their ability to discriminate community assembly rules. Functional Ecology 24(4):867-876.

Mouillot, D., N. A. J. Graham, S. Villeger, N. W. H. Mason, and D. R. Bellwood. 2013. A functional approach reveals community responses to disturbances. Trends in Ecology and Evolution 28(3):167-177.

Novack-Gottshall, P. M. 2007. Using a theoretical ecospace to quantify the ecological diversity of Paleozoic and modern marine biotas. Paleobiology 33:274-295.

Novack-Gottshall, P. M. 2016a. General models of ecological diversification. I. Conceptual synthesis. Paleobiology 42:185-208.

Novack-Gottshall, P. M. 2016b. General models of ecological diversification. II. Simulations and empirical applications. Paleobiology 42:209-239.

Villeger, S., N. W. H. Mason, and D. Mouillot. 2008. New multidimensional functional diversity indices for a multifaceted framework in functional ecology. Ecology 89(8):2290-2301.

Villeger, S., P. M. Novack-Gottshall, and D. Mouillot. 2011. The multidimensionality of the niche reveals functional diversity changes in benthic marine biotas across geological time. Ecology Letters 14(6):561-568.

Wills, M. A. 2001. Morphological disparity: a primer. Pp. 55-143. In J. M. Adrain, G. D. Edgecombe, and B. S. Lieberman, eds. Fossils, phylogeny, and form: an analytical approach. Kluwer Academic/Plenum Publishers, New York.

Examples

# Build ecospace framework and a random 50-species sample using neutral rule:
ecospace <- create_ecospace(nchar = 15, char.state = rep(3, 15), char.type = rep("numeric", 15))
sample <- neutral(Sseed = 5, Smax = 50, ecospace = ecospace)
# Using Smax = 10 here for fast example
metrics <- calc_metrics(samples = sample, Smax = 10, Model = "Neutral", Param = "NA")
metrics

# Plot statistical dynamics as function of species richness
# Save the old settings while changing them, so par(op) restores them cleanly
op <- par(mfrow = c(2, 4), mar = c(4, 4, 1, .3))
with(metrics, {
  plot(S, H, type = "l", cex = .5)
  plot(S, D, type = "l", cex = .5)
  plot(S, M, type = "l", cex = .5)
  plot(S, V, type = "l", cex = .5)
  plot(S, FRic, type = "l", cex = .5)
  plot(S, FEve, type = "l", cex = .5)
  plot(S, FDiv, type = "l", cex = .5)
  plot(S, FDis, type = "l", cex = .5)
})
par(op)

# Argument 'increm' switches between incremental and entire-sample calculation
metrics2 <- calc_metrics(samples = sample, Smax = 10, Model = "Neutral",
                         Param = "NA", increm = FALSE)
metrics2
identical(tail(metrics, 1), metrics2) # These are identical

# ... can further control 'FD::dbFD', here turning off calculation of FRic and FDiv
metrics3 <- calc_metrics(samples = sample, Smax = 10, Model = "Neutral",
                         Param = "NA", calc.FRic = FALSE, calc.FDiv = FALSE)
metrics3
rbind(metrics[10, ], metrics3[10, ])

## Not run: 
# Can take a few minutes to run to completion
# Calculate for 5 samples
nreps <- 1:5
samples <- lapply(X = nreps, FUN = neutral, Sseed = 5, Smax = 50, ecospace = ecospace)
metrics <- lapply(X = nreps, FUN = calc_metrics, samples = samples,
                  Model = "Neutral", Param = "NA")
alarm()
str(metrics)

## End(Not run)

[Package ecospace version 1.4.2 Index]