StandardizeNomenclature {zoolog} | R Documentation |
Standardize Nomenclature
Description
Functions to map user-provided nomenclature onto a standard one, as defined in a thesaurus.
Usage
StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)
StandardizeDataSet(data, thesaurusSet = zoologThesaurus)
Arguments
x | Character vector. |
thesaurus | A thesaurus object. |
mark.unknown | Logical. If FALSE (default), names not found in the thesaurus are kept unchanged; if TRUE, they are set to NA. |
data | A data frame. |
thesaurusSet | A thesaurus set. |
Details
StandardizeNomenclature standardizes a character vector according to a given thesaurus.
StandardizeDataSet standardizes column names and values of a data frame according to a thesaurus set.
Value
StandardizeNomenclature returns a vector of the same length as the input vector x. Names present in the thesaurus are set to their corresponding category. Names not in the thesaurus are kept unchanged if mark.unknown=FALSE (default) and set to NA if mark.unknown=TRUE.
StandardizeDataSet returns a data frame with the same structure as the input data, but with its nomenclature standardized according to a thesaurus set, which includes appropriate thesauri for the column names and for the values of a set of columns.
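The behaviour described above can be illustrated with a minimal base R sketch. It does not use zoolog and is not the package's implementation: the list-based toy thesauri, the helper toyStandardize, and the toy data frame are all hypothetical, and the sketch assumes case-insensitive matching.

```r
# Hypothetical toy thesauri: each standard category lists the names it accepts.
# This list structure is an assumption for illustration, not zoolog's format.
toyTaxonThesaurus <- list(
  "Bos taurus"     = c("bos taurus", "bota", "cattle"),
  "Sus domesticus" = c("sus domesticus", "pig")
)
toyColumnThesaurus <- list(Taxon = c("taxon", "especie"))

# Hypothetical helper: map each name to its category, ignoring case.
toyStandardize <- function(x, thesaurus, mark.unknown = FALSE) {
  # Flatten the thesaurus into a lookup: accepted name -> category.
  lookup <- rep(names(thesaurus), lengths(thesaurus))
  names(lookup) <- tolower(unlist(thesaurus, use.names = FALSE))
  res <- unname(lookup[tolower(x)])
  unknown <- is.na(res)
  # Unknown names are kept as given unless mark.unknown = TRUE.
  if (!mark.unknown) res[unknown] <- x[unknown]
  res
}

taxa <- c("bota", "giraffe", "pig", "cattle")
kept   <- toyStandardize(taxa, toyTaxonThesaurus)
marked <- toyStandardize(taxa, toyTaxonThesaurus, mark.unknown = TRUE)

# Data-frame analogue: standardize the column names first, then the values
# of the columns that have a thesaurus of their own.
toyData <- data.frame(Especie = c("BOTA", "pig", "giraffe"),
                      stringsAsFactors = FALSE)
names(toyData) <- toyStandardize(names(toyData), toyColumnThesaurus)
toyData$Taxon  <- toyStandardize(toyData$Taxon, toyTaxonThesaurus)
```

In this sketch the output always has the same length as the input, and only the treatment of unmatched names ("giraffe") differs between the two calls.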
See Also
zoologThesaurus
for a description of the thesaurus and
thesaurus set structure,
ThesaurusReaderWriter
, ThesaurusManagement
Examples
## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus
## Standardize a heterodox vector of taxa:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus)
## Observe that "giraffe" is kept unchanged since it is not included in
## any thesaurus category.
## But if mark.unknown is set to TRUE, it is marked as NA:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus, mark.unknown = TRUE)
## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") # == FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"),
                        thesaurus)
## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile,
                                na.strings = "",
                                encoding = "UTF-8")
## Observe mainly the first columns:
head(dataExample[,1:5])
## Standardize the dataset:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[,1:5])