harmoniseR {BeeBDC} | R Documentation |
Harmonise taxonomy of bee occurrence data
Description
Uses the Discover Life taxonomy to harmonise bee occurrences and flag those that do not match
the checklist. harmoniseR() prefers to use the names_clean column that is generated
by bdc::bdc_clean_names(). While this is not required, you may get better results by running
that function on your dataset first.
This function could be adapted to other taxa if a user supplies a taxonomy that matches the format of the
beesTaxonomy() file.
Usage
harmoniseR(
data = NULL,
path = NULL,
taxonomy = BeeBDC::beesTaxonomy(),
speciesColumn = "scientificName",
rm_names_clean = TRUE,
checkVerbatim = FALSE,
stepSize = 1e+06,
mc.cores = 1
)
Arguments
data |
A data frame or tibble. Occurrence records as input. |
path |
A directory as character. The path to a folder in which the output can be saved. |
taxonomy |
A data frame or tibble. The bee taxonomy to use.
Default = BeeBDC::beesTaxonomy(). |
speciesColumn |
Character. The name of the column containing species names. Default = "scientificName". |
rm_names_clean |
Logical. If TRUE then the names_clean column will be removed at the end of this function to help reduce confusion about this column later. Default = TRUE |
checkVerbatim |
Logical. If TRUE then the verbatimScientificName will be checked as well
for species matches. This matching will ONLY be done after harmoniseR has failed for the other
name columns. NOTE: this column is not first run through |
stepSize |
Numeric. The number of occurrences to process in each chunk. Default = 1000000. |
mc.cores |
Numeric. If > 1, the function will run in parallel via mclapply on the number of cores specified. If = 1, it will run as a serial loop. NOTE: Windows machines must use a value of 1 (see ?parallel::mclapply). Additionally, be aware that each thread can use a large amount of memory. Default = 1. |
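The interaction between stepSize and mc.cores can be pictured with a small base R sketch. It is not the package's internal code; the toy table, the step_size and n_cores variables, and the process_chunk() helper are all hypothetical stand-ins used only to show chunked processing with an optional parallel::mclapply() call.

```r
# Toy stand-in for an occurrence table (hypothetical data)
occ <- data.frame(
  id = 1:10,
  scientificName = rep(c("Apis mellifera", "Bombus terrestris"), 5),
  stringsAsFactors = FALSE
)

step_size <- 4  # plays the role of stepSize
n_cores <- 1    # plays the role of mc.cores; must stay at 1 on Windows

# Assign each row to a consecutive chunk of at most step_size rows
chunk_id <- ceiling(seq_len(nrow(occ)) / step_size)
chunks <- split(occ, chunk_id)

# Hypothetical per-chunk worker: here it only records the name length
process_chunk <- function(chunk) {
  chunk$nameLength <- nchar(chunk$scientificName)
  chunk
}

# Run the chunks in parallel when more than one core is requested,
# otherwise fall back to a serial loop
results <- if (n_cores > 1) {
  parallel::mclapply(chunks, process_chunk, mc.cores = n_cores)
} else {
  lapply(chunks, process_chunk)
}

# Stitch the processed chunks back together in their original order
combined <- do.call(rbind, results)
rownames(combined) <- NULL
```

Smaller chunks lower the memory held by each worker at any one time, which matters because every parallel thread works on its own chunk.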
Value
The occurrences are returned with updated taxonomy columns, including: scientificName, species, family, subfamily, genus, subgenus, specificEpithet, infraspecificEpithet, and scientificNameAuthorship. A new column, .invalidName, is also added and is FALSE when the occurrence's name did not match the supplied taxonomy.
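The flag convention can be illustrated with a minimal base R sketch. This is not how harmoniseR() is implemented: the miniature taxonomy, its name and acceptedName columns, and the example records are hypothetical, and the sketch assumes that records which do match are flagged TRUE.

```r
# Hypothetical miniature taxonomy: each known name maps to an accepted name
taxonomy <- data.frame(
  name = c("Apis mellifera", "Apis mellifica", "Bombus terrestris"),
  acceptedName = c("Apis mellifera", "Apis mellifera", "Bombus terrestris"),
  stringsAsFactors = FALSE
)

# Hypothetical occurrence records; the last name is made up and will not match
occ <- data.frame(
  scientificName = c("Apis mellifica", "Bombus terrestris", "Notabee fictus"),
  stringsAsFactors = FALSE
)

# Look up each occurrence name in the taxonomy (NA means no match)
idx <- match(occ$scientificName, taxonomy$name)
matched <- !is.na(idx)

# Flag: FALSE when the name did not match, TRUE otherwise (assumed)
occ$.invalidName <- matched

# Replace matched names with their accepted name
occ$scientificName[matched] <- taxonomy$acceptedName[idx[matched]]

# Summarise the flag, then keep only the records that matched
table(occ$.invalidName, useNA = "always")
kept <- occ[occ$.invalidName, , drop = FALSE]
```

Filtering on the flag afterwards, rather than dropping rows during matching, leaves the unmatched records available for manual inspection.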
See Also
taxadbToBeeBDC() to download any taxonomy (of any taxa or of bees) and
beesTaxonomy() for the bee taxonomy download.
Examples
# load in the test dataset
system.file("extdata", "testTaxonomy.rda", package="BeeBDC") |> load()
beesRaw_out <- BeeBDC::harmoniseR(
# The path to a folder in which the output can be saved
path = tempdir(),
# The formatted taxonomy file
taxonomy = testTaxonomy,
data = BeeBDC::beesFlagged,
speciesColumn = "scientificName")
table(beesRaw_out$.invalidName, useNA = "always")