R: Synchronize Community and Environmental Datasets

removeNAcomm {BiodiversityR}

R Documentation

Synchronize Community and Environmental Datasets

Description

These functions help to ensure that the sites of the community dataset are the same as the sites of the environmental dataset, which the BiodiversityR and vegan packages assume to be the case.

Usage

same.sites(x, y)
check.datasets(x, y)
check.ordiscores(x, ord, check.species = TRUE)
removeNAcomm(x, y, variable)
removeNAenv(x, variable)
removezerospecies(x)
subsetcomm(x, y, factor, level, returncomm = TRUE)

import.with.readxl(file = file.choose(), data.type = "community", sheet = NULL, 
    sitenames = "sites", column = "species", value = "abundance", 
    factor = "", level = "", cepnames = FALSE,
    write.csv = FALSE, csv.file = paste(data.type, ".csv", sep=""))

Arguments

`x`	Data frame assumed to be the community dataset with variables corresponding to species.
`y`	Data frame assumed to be the environmental dataset with variables corresponding to descriptors of sites.
`ord`	Ordination result.
`check.species`	Should the species scores be checked (TRUE) or not.
`variable`	Name of the variable from the environmental dataset with NA values that indicate those sites that should be removed.
`factor`	Variable of the environmental data frame that defines subsets for the data frame.
`level`	Level of the variable to create the subsets for the data frame.
`returncomm`	For the selected sites, return the community dataset (TRUE) or the environmental dataset.
`file`	Location of the Excel (or Access) file.
`data.type`	Type of the data set to be imported: one of "community", "environmental" or "stacked".
`sheet`	Name of the sheet of the Excel file to import from (if missing, then `data.type` is used).
`sitenames`	Name of categorical variable that provides the names for the sites.
`column`	Name of the categorical variable for the columns of the crosstabulation (typically indicating species); passed to `makecommunitydataset`.
`value`	Name of numerical variable for the cells of the crosstabulation (typically indicating abundance). The cells provide the sum of all values in the data frame; passed to `makecommunitydataset`.
`cepnames`	Should the names of columns be abbreviated via `make.cepnames` (TRUE) or not (FALSE).
`write.csv`	Create a comma-delimited text file in the working directory (if `TRUE`).
`csv.file`	Name of the comma-delimited text file to be created.

Details

Function same.sites provides a new data frame that has the same row names as the row names of the environmental data set and the same (species) variables as the original community data set. Sites from the original community data set that have no corresponding sites in the environmental data set are not included in the new community data set. (Hint: this function can be especially useful when some sites do not contain any species and where a community dataset was generated by the makecommunitydataset function.)
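The alignment described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frames `comm` and `env` are hypothetical, and filling sites that occur only in the environmental data set with zero abundances is an assumption inferred from the hint about sites without species.

```r
# Hypothetical toy data; row names are the site identifiers.
comm <- data.frame(sp1 = c(3, 0, 5), sp2 = c(1, 2, 0),
                   row.names = c("A", "B", "D"))
env <- data.frame(moisture = c(1, 2, 4),
                  row.names = c("A", "B", "C"))

# Start from an all-zero table with the sites of env and the species of comm.
aligned <- as.data.frame(matrix(0, nrow = nrow(env), ncol = ncol(comm),
                                dimnames = list(rownames(env), colnames(comm))))

# Copy abundances for sites that occur in both data sets. Site "D" is dropped
# and site "C" keeps zero abundances (an assumption of this sketch).
shared <- intersect(rownames(env), rownames(comm))
aligned[shared, ] <- comm[shared, ]

aligned
```

With the package loaded, `same.sites(comm, env)` would be the call to use in place of the manual steps.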

Function check.datasets checks whether the community and environmental data sets have the same number of rows and, if so, whether the row names of both data sets are the same. The function also returns the dimensions of both data sets.
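The checks described above amount to a few comparisons, sketched here in base R under the assumption of two hypothetical toy data frames. The variable names are illustrative only and do not reflect what check.datasets prints or returns.

```r
# Hypothetical toy data with one mismatched site name ("C" versus "D").
comm <- data.frame(sp1 = c(3, 0, 5), row.names = c("A", "B", "C"))
env <- data.frame(moisture = c(1, 2, 4), row.names = c("A", "B", "D"))

# Dimensions of both data sets.
dim(comm)
dim(env)

# First check the number of rows, then (only if equal) the row names.
same_rows <- nrow(comm) == nrow(env)
same_names <- same_rows && all(rownames(comm) == rownames(env))

# Positions where the row names differ.
mismatched <- if (same_rows) which(rownames(comm) != rownames(env)) else integer(0)
```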

Function check.ordiscores checks whether the community data set and the ordination result have the same number of rows (sites) and columns (species; checked only when check.species = TRUE) and, if so, whether the row and column names of both are the same. Site and species scores for the ordination result are obtained via function scores (vegan).

Functions removeNAcomm and removeNAenv provide a new data frame that does not contain NA for the specified variable. The specified variable is part of the environmental data set. These functions are particularly useful when community and environmental datasets are used together, as new community and environmental datasets can be created that contain information from the same sample plots (sites). An additional result of removeNAenv is that factor levels of any categorical variable that no longer occur in the new data set are removed from the levels of that variable.
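A base R sketch of the same idea is given below, assuming hypothetical toy data in which site "B" has a missing value for `moisture`. It only illustrates the filtering and the dropping of unused factor levels; it is not the package code.

```r
# Hypothetical toy data; site "B" has NA for moisture.
comm <- data.frame(sp1 = c(3, 0, 5, 2), row.names = c("A", "B", "C", "D"))
env <- data.frame(moisture = factor(c("wet", NA, "dry", "wet")),
                  management = factor(c("HF", "NM", "SF", "HF")),
                  row.names = c("A", "B", "C", "D"))

# Sites to keep: those without NA for the chosen environmental variable.
keep <- !is.na(env[["moisture"]])

# Apply the same filter to both data sets so that the sites stay in sync.
comm_clean <- comm[keep, , drop = FALSE]

# droplevels() removes factor levels that no longer occur (here "NM").
env_clean <- droplevels(env[keep, , drop = FALSE])
```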

Function replaceNAcomm substitutes cells containing NA with 0 in the community data set.

Function removezerospecies removes species from a community dataset that have a total abundance smaller than or equal to zero.
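The two clean-up steps above (replacing NA with 0 and removing species without any abundance) can be sketched in base R as follows. The toy data frame is hypothetical and the code is an illustration under that assumption, not the package's implementation.

```r
# Hypothetical toy community data with missing cells and an empty species.
comm <- data.frame(sp1 = c(3, NA, 5), sp2 = c(0, 0, 0), sp3 = c(NA, 2, 0),
                   row.names = c("A", "B", "C"))

# Replace NA cells with 0.
comm[is.na(comm)] <- 0

# Keep only species (columns) with a total abundance above zero.
comm_nonzero <- comm[, colSums(comm) > 0, drop = FALSE]
```

Replacing NA first matters in this sketch: `colSums` would otherwise return NA for any species with a missing cell.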

Function subsetcomm makes a subset of sites that contain a specified level of a categorical variable from the environmental data set. The same functionality of selecting subsets of the community or environmental data sets is implemented in various functions of BiodiversityR (for example diversityresult, renyiresult and accumresult), which has the advantage that it is not necessary to create a new data set. If a community dataset is returned, species that do not contain any individuals are removed from the data set. If an environmental dataset is returned, factor levels that do not occur are removed from the data set.
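The subsetting described above can be approximated in base R as shown below. The toy data and the chosen level "HF" are hypothetical, and the sketch only mirrors the documented behaviour (dropping empty species or unused factor levels).

```r
# Hypothetical toy data.
comm <- data.frame(sp1 = c(3, 0, 5, 2), sp2 = c(0, 4, 0, 0),
                   row.names = c("A", "B", "C", "D"))
env <- data.frame(management = factor(c("HF", "NM", "HF", "SF")),
                  row.names = c("A", "B", "C", "D"))

# Select sites where the factor has the requested level.
sel <- env[["management"]] == "HF"

# Community subset: drop species without individuals in the selected sites.
comm_sub <- comm[sel, , drop = FALSE]
comm_sub <- comm_sub[, colSums(comm_sub) > 0, drop = FALSE]

# Environmental subset: drop factor levels that no longer occur.
env_sub <- droplevels(env[sel, , drop = FALSE])
```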

Function import.with.readxl provides methods of importing community or environmental datasets through read_excel.

For stacked datasets, a community data set is created with function makecommunitydataset. For community data with more species than the limited number of columns in Excel, this may be the only option for importing a community dataset.
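To illustrate what converting a stacked data set involves, the following base R sketch cross-tabulates a hypothetical stacked data frame with `xtabs`, summing abundance per site and species. It uses the default column names from the Usage section ("sites", "species", "abundance") and stands in for makecommunitydataset only as an approximation.

```r
# Hypothetical stacked data: one row per site-species record.
stacked <- data.frame(sites = c("A", "A", "B", "B", "B"),
                      species = c("sp1", "sp2", "sp1", "sp1", "sp3"),
                      abundance = c(2, 1, 4, 3, 5))

# Cross-tabulate: rows are sites, columns are species, cells are summed values.
comm <- as.data.frame.matrix(xtabs(abundance ~ sites + species, data = stacked))

comm
```

Site-species combinations that do not occur in the stacked data receive 0, and repeated records (site "B", species "sp1") are summed.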

An additional advantage of the function is that the community and environmental data can be stored in the same file.

You may want to check compatibility of the community and environmental datasets with functions check.datasets and modify the community dataset through same.sites.

Value

The functions return a data frame or results of tests on the correspondence between community and environmental data sets.

Author(s)

Roeland Kindt (World Agroforestry Centre)

References

Kindt, R. & Coe, R. (2005) Tree diversity analysis: A manual and software for common statistical methods for ecological and biodiversity studies.

https://www.worldagroforestry.org/output/tree-diversity-analysis

Examples

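library(vegan)
library(BiodiversityR)
data(dune.env)
data(dune)

# Set Moisture to NA for the first four sites
dune.env2 <- dune.env
dune.env2[1:4, "Moisture"] <- NA

# Remove the sites with NA for Moisture from both data sets
dune2 <- removeNAcomm(dune, dune.env2, "Moisture")
dune.env2 <- removeNAenv(dune.env2, "Moisture")

# Alternative: match the community data to the sites of the environmental data
dune3 <- same.sites(dune, dune.env2)

# Compare the original and the synchronized community data sets
check.datasets(dune, dune.env2)
check.datasets(dune2, dune.env2)
check.datasets(dune3, dune.env2)

# Subset to the sites where Management is "NM"
dune4 <- subsetcomm(dune, dune.env, "Management", "NM", returncomm = TRUE)
dune.env4 <- subsetcomm(dune, dune.env, "Management", "NM", returncomm = FALSE)
dune5 <- same.sites(dune, dune.env4)
check.datasets(dune4, dune5)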

[Package BiodiversityR version 2.16-1 Index]