cpr_rand_test {canaper}    R Documentation

Run a randomization analysis for one or more biodiversity metrics


The observed value of the biodiversity metric(s) will be calculated for the input community data, then compared against a set of random communities. Various statistics are calculated from the comparison (see Value below).


cpr_rand_test(
  comm,
  phy,
  null_model,
  n_reps = 100,
  n_iterations = 10000,
  thin = 1,
  metrics = c("pd", "rpd", "pe", "rpe"),
  site_col = "site",
  tbl_out = tibble::is_tibble(comm),
  quiet = FALSE
)



comm: Dataframe, tibble, or matrix; input community data with sites (communities) as rows and species as columns. Either presence-absence data (values only 0s or 1s) or abundance data (values >= 0) are accepted, but calculations do not use abundance-weighting, so results from abundance data will be the same as if the data were converted to presence-absence before analysis.


phy: List of class phylo; input phylogeny.


null_model: Character vector of length 1 or object of class commsim; either the name of the model to use for generating random communities (null model), or a custom null model. For the full list of available predefined null models, see the help file of vegan::commsim(), or run vegan::make.commsim(). An object of class commsim can be generated with vegan::commsim() (see Examples in cpr_rand_comm()).


n_reps: Numeric vector of length 1; number of random communities to replicate.


n_iterations: Numeric vector of length 1; number of iterations to use for sequential null models; ignored for non-sequential models.


thin: Numeric vector of length 1; thinning parameter used by some null models in vegan (e.g., quasiswap); ignored for other models.


metrics: Character vector; names of biodiversity metrics to calculate. May include one or more of: pd, rpd, pe, rpe (case-sensitive).


site_col: Character vector of length 1; name of the column in comm that contains the site names; only used if comm is a tibble (object of class tbl_df).


tbl_out: Logical vector of length 1; should the output be returned as a tibble? If FALSE, will return a dataframe. Defaults to TRUE if comm is a tibble.


quiet: Logical vector of length 1; if TRUE, suppress all warnings and messages that would be emitted by this function.


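The biodiversity metrics (metrics) available for analysis include:

pd: Phylogenetic diversity (Faith 1992)
rpd: Relative phylogenetic diversity (Mishler et al. 2014)
pe: Phylogenetic endemism (Rosauer et al. 2009)
rpe: Relative phylogenetic endemism (Mishler et al. 2014)

(pe and rpe are needed for CANAPE with cpr_classify_endem())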

The choice of a randomization algorithm (null_model) is not trivial, and may strongly affect results. cpr_rand_test() uses null models provided by vegan; for a complete list, see the help file of vegan::commsim() or run vegan::make.commsim(). One frequently used null model is swap (Gotelli & Entsminger 2003), which randomizes the community matrix while preserving column and row sums (marginal sums). For a review of various null models, see Strona et al. (2018); swap is an "FF" model in the sense of Strona et al. (2018).
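As a quick check before choosing a model, the names of the predefined null models can be listed directly. The sketch below assumes the vegan package is installed and relies on vegan::make.commsim() returning the algorithm names when called without arguments; the variable name available_models is illustrative only.

```r
# List the predefined null model names (requires the vegan package).
# If vegan is not installed, fall back to an empty character vector.
available_models <- if (requireNamespace("vegan", quietly = TRUE)) {
  vegan::make.commsim()
} else {
  character(0)
}

# Check that a candidate model name is recognized before using it
"swap" %in% available_models
```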

Instead of using one of the predefined null models in vegan::commsim(), it is also possible to define a custom null model; see Examples in cpr_rand_comm().
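A custom null model might look roughly like the following. This is a minimal sketch, not the example from cpr_rand_comm(): the randomizer shuffle_cells() and the object name cs_object are hypothetical, and the model simply shuffles all cells of the matrix, so it preserves only the total number of presences and not the marginal sums.

```r
# Hypothetical randomizer: shuffles all cells of the community matrix.
# x is the community matrix, n the number of random matrices to generate.
# Returns an array with dimensions (sites, species, n).
shuffle_cells <- function(x, n, ...) {
  array(replicate(n, sample(x)), c(dim(x), n))
}

# Wrap the randomizer as a commsim object (requires the vegan package)
if (requireNamespace("vegan", quietly = TRUE)) {
  cs_object <- vegan::commsim(
    "shuffle_cells",
    fun = shuffle_cells,
    binary = TRUE,   # the model works on presence-absence data
    isSeq = FALSE,   # non-sequential: each matrix is generated independently
    mode = "integer"
  )
  # cs_object could then be supplied as the null model, e.g.
  # cpr_rand_test(comm, phy, null_model = cs_object)
}
```

Because this model ignores row and column sums, it is far less constrained than swap, which is one reason the choice of null model can change results so strongly.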

Note that the pre-defined models in vegan include binary models (designed for presence-absence data) and quantitative models (designed for abundance data). Although the binary models will accept abundance data, they treat it as binary and always return a binary (presence-absence) matrix. The PD and PE calculations in canaper are not abundance-weighted, so they return the same result regardless of whether the input is presence-absence or abundance. In that sense, binary null models are appropriate for cpr_rand_test(). The quantitative models could also be used for abundance data, but the output will be treated as binary anyway when calculating PD and PE. The effects of using binary vs. quantitative null models for cpr_rand_test() have not been investigated.
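To make the equivalence concrete, the base R sketch below binarizes an abundance matrix by hand. The matrix comm_abund is made-up illustration data, and the conversion shown is only a conceptual equivalent, not necessarily how canaper or vegan perform it internally.

```r
# Hypothetical abundance matrix: 2 sites (rows) x 3 species (columns)
comm_abund <- matrix(
  c(0, 3, 1,
    2, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("site1", "site2"), c("sp1", "sp2", "sp3"))
)

# Any value > 0 becomes 1 (present); 0 stays 0 (absent).
# Row and column names are kept.
comm_pa <- (comm_abund > 0) * 1L
```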

At least 5 species and 5 sites are required as input; fewer than that is likely to cause some randomization algorithms (e.g., swap) to enter an infinite loop. In any case, inference from very small numbers of species and/or sites is generally not recommended.
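A pre-flight check along these lines can catch undersized input before a long run. The helper check_min_dims() is hypothetical (it is not part of canaper) and assumes comm is a matrix or dataframe with sites as rows and species as columns.

```r
# Hypothetical helper: TRUE if comm has at least min_n sites (rows)
# and at least min_n species (columns).
# For tibble input, drop the site-name column before counting species.
check_min_dims <- function(comm, min_n = 5) {
  nrow(comm) >= min_n && ncol(comm) >= min_n
}

check_min_dims(matrix(0, nrow = 5, ncol = 5))
check_min_dims(matrix(0, nrow = 4, ncol = 6))
```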

The following rules apply to comm input:

The results are identical regardless of whether the input for comm is abundance or presence-absence data (i.e., abundance weighting is not used).


Dataframe (or tibble if tbl_out is TRUE). For each of the biodiversity metrics, the following 9 columns will be produced:

For example, if metrics includes pd, the output columns will include pd_obs, pd_obs_c_lower, and so on.


Faith, D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation, 61: 1–10. doi:10.1016/0006-3207(92)91201-3

Gotelli, N.J. and Entsminger, N.J. (2003) Swap algorithms in null model analysis. Ecology, 84: 532–535.

Mishler, B., Knerr, N., González-Orozco, C. et al. (2014) Phylogenetic measures of biodiversity and neo- and paleo-endemism in Australian Acacia. Nat Commun, 5: 4473. doi:10.1038/ncomms5473

Rosauer, D., Laffan, S.W., Crisp, M.D., Donnellan, S.C. and Cook, L.G. (2009) Phylogenetic endemism: a new approach for identifying geographical concentrations of evolutionary history. Molecular Ecology, 18: 4061–4072. doi:10.1111/j.1365-294X.2009.04311.x

Strona, G., Ulrich, W. and Gotelli, N.J. (2018) Bi-dimensional null model analysis of presence-absence binary matrices. Ecology, 99: 103–115. doi:10.1002/ecy.2043


# Returns a dataframe by default
cpr_rand_test(
  phylocom$comm, phylocom$phy,
  null_model = "curveball", metrics = "pd", n_reps = 10
)

# Tibbles may be preferable because of the large number of columns
cpr_rand_test(
  phylocom$comm, phylocom$phy,
  null_model = "curveball", tbl_out = TRUE, n_reps = 10
)

[Package canaper version 1.0.0 Index]