create_taxonomic_update_lookup {APCalign} | R Documentation |
Create a table with the best-possible scientific name match for Australian plant names
Description
This function takes a list of Australian plant names that need to be reconciled with current taxonomy and generates a lookup table of the best-possible scientific name match for each input name.
Usage case: This is APCalign’s core function, merging together the alignment and updating of taxonomy.
Usage
create_taxonomic_update_lookup(
taxa,
stable_or_current_data = "stable",
version = default_version(),
taxonomic_splits = "most_likely_species",
full = FALSE,
fuzzy_abs_dist = 3,
fuzzy_rel_dist = 0.2,
fuzzy_matches = TRUE,
APNI_matches = TRUE,
imprecise_fuzzy_matches = FALSE,
identifier = NA_character_,
resources = load_taxonomic_resources(),
quiet = FALSE,
output = NULL
)
Arguments
taxa |
A list of Australian plant species that needs to be reconciled with current taxonomy. |
stable_or_current_data |
either "stable" for a consistent version, or "current" for the leading edge version. |
version |
The version number of the dataset to use. |
taxonomic_splits |
How to handle one_to_many taxonomic matches. Default is "most_likely_species". The other options are "return_all" and "collapse_to_higher_taxon". most_likely_species defaults to the original_name if that name is accepted by the APC; this will be right for certain species subsets but will make errors in other cases, so use with caution. |
full |
logical for whether the full lookup table is returned or just key columns |
fuzzy_abs_dist |
The number of characters allowed to be different for a fuzzy match. |
fuzzy_rel_dist |
The proportion of characters allowed to be different for a fuzzy match. |
fuzzy_matches |
Fuzzy matches are turned on as a default. The relative
and absolute distances allowed for fuzzy matches to species and
infraspecific taxon names are defined by the parameters fuzzy_abs_dist and fuzzy_rel_dist. |
APNI_matches |
Name matches to the APNI (Australian Plant Names Index) are turned on as a default. |
imprecise_fuzzy_matches |
Imprecise fuzzy matches use the fuzzy matching function with lenient levels set (absolute distance of 5 characters; relative distance = 0.25). This offers a way to get a wider range of possible names, possibly corresponding to very distant spelling mistakes. This is FALSE as default, and all outputs should be checked because it often makes erroneous matches. |
identifier |
A dataset, location or other identifier, which defaults to NA. |
resources |
These are the taxonomic resources used for cleaning. This
will default to loading them from a local place on your computer. If this is
to be called repeatedly, it's much faster to load the resources once using
load_taxonomic_resources() and pass the result in via this argument. |
quiet |
Logical to indicate whether to display messages while aligning taxa. |
output |
file path to save the output. If this file already exists, this function will check if it's a subset of the species passed in and try to add to this file. This can be useful for large and growing projects. |
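The incremental behaviour described for output can be pictured with the base R sketch below. It is only an illustration of one plausible reading, not APCalign's implementation: the helper incremental_lookup, the stub_lookup function, the placeholder taxon names and the choice of a CSV file are all assumptions made for this example.

```r
# Hypothetical helper (not part of APCalign): reuse rows already saved in
# `output` and only run the lookup for names that are not yet in the file.
incremental_lookup <- function(taxa, output, lookup_fn) {
  previous <- NULL
  if (file.exists(output)) {
    # Assumption: the saved lookup is a CSV with an original_name column
    previous <- utils::read.csv(output, stringsAsFactors = FALSE)
  }
  # Names that have not been processed in an earlier run
  new_taxa <- setdiff(unique(taxa), previous$original_name)
  if (length(new_taxa) == 0) {
    return(previous)
  }
  combined <- rbind(previous, lookup_fn(new_taxa))
  utils::write.csv(combined, output, row.names = FALSE)
  combined
}

# Stub standing in for the real lookup; it records which names it was given.
calls <- character(0)
stub_lookup <- function(x) {
  calls <<- c(calls, x)
  data.frame(original_name = x, suggested_name = x, stringsAsFactors = FALSE)
}

out_file <- tempfile(fileext = ".csv")
first  <- incremental_lookup(c("Taxon a", "Taxon b"), out_file, stub_lookup)
second <- incremental_lookup(c("Taxon a", "Taxon b", "Taxon c"), out_file, stub_lookup)
```

In the second call only "Taxon c" reaches the stub, which is the saving that makes a persistent output file useful for large and growing projects.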
Details
It first uses the function align_taxa, then the function update_taxonomy, to achieve the output. The aligned name is the plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.
Notes:
If you will be running the function APCalign::create_taxonomic_update_lookup many times, it is best to load the taxonomic resources separately using resources <- load_taxonomic_resources(), then add the argument resources = resources.
The name Banksia cerrata does not align as the fuzzy matching algorithm does not allow the first letter of the genus and species epithet to change.
The argument taxonomic_splits allows you to choose the outcome for updating the names of taxa with ambiguous taxonomic histories; this applies to scientific names that were once attached to a more broadly circumscribed taxon concept, that was then split into several more narrowly circumscribed taxon concepts, one of which retains the original name. There are three options: most_likely_species returns the name that is retained, with alternative names documented in square brackets; return_all adds additional rows to the output, one for each possible taxon concept; collapse_to_higher_taxon returns the genus with possible names in square brackets.
The argument identifier allows you to add a fixed text string to all genus- and family-level names; for example, identifier = "Royal NP" would return Acacia sp. [Royal NP].
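The fuzzy-matching rules described above (an absolute limit, a relative limit, and unchanged first letters of the genus and species epithet) can be sketched in base R as follows. This is a minimal illustration, not the package's algorithm: the helper is_plausible_fuzzy_match is hypothetical, and it assumes that both limits must hold at once and that the relative distance is the edit distance divided by the length of the input name.

```r
# Hypothetical helper (not part of APCalign).
is_plausible_fuzzy_match <- function(input, candidate,
                                     fuzzy_abs_dist = 3,
                                     fuzzy_rel_dist = 0.2) {
  # Edit (Levenshtein) distance between the two names
  d <- as.integer(utils::adist(input, candidate))
  # Assumption: relative distance = edits / number of characters in the input
  rel <- d / nchar(input)

  # First letters of the first two words (genus and species epithet)
  initials <- function(x) {
    words <- strsplit(x, " ", fixed = TRUE)[[1]]
    substr(words[seq_len(min(2, length(words)))], 1, 1)
  }
  same_initials <- identical(initials(input), initials(candidate))

  d <= fuzzy_abs_dist && rel <= fuzzy_rel_dist && same_initials
}

inputs <- c("Banksia serrate", "Banksia cerrata",
            "Banksea serrata", "Banksia serrrrata")
result <- vapply(inputs, is_plausible_fuzzy_match, logical(1),
                 candidate = "Banksia serrata")
```

Under these assumptions, "Banksia cerrata" is rejected even though it is a single edit away from "Banksia serrata", because the first letter of the epithet differs; the other three inputs pass.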
Value
A lookup table containing the accepted and suggested names for each original name input, and additional taxonomic information such as taxon rank, taxonomic status, taxon IDs and genera.
original_name: the original plant name.
aligned_name: the input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.
accepted_name: the APC-accepted plant name, when available.
suggested_name: the suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the suggested_name is the aligned_name.
genus: the genus of the accepted (or suggested) name; only APC-accepted genus names are filled in.
family: the family of the accepted (or suggested) name; only APC-accepted family names are filled in.
taxon_rank: the taxonomic rank of the suggested (and accepted) name.
taxonomic_dataset: the source of the suggested (and accepted) names (APC or APNI).
taxonomic_status: the taxonomic status of the suggested (and accepted) name.
taxonomic_status_aligned: the taxonomic status of the aligned name, before any taxonomic updates have been applied.
aligned_reason: the explanation of a specific taxon name alignment (from an original name to an aligned name).
update_reason: the explanation of a specific taxon name update (from an aligned name to an accepted or suggested name).
subclass: the subclass of the accepted name.
taxon_distribution: the distribution of the accepted name; only filled in if an APC accepted_name is available.
scientific_name_authorship: the authorship information for the accepted (or synonymous) name; available for both APC and APNI names.
taxon_ID: the unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available.
taxon_ID_genus: an identifier for the genus; only filled in if an APC-accepted genus name is available.
scientific_name_ID: an identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names.
row_number: the row number of a specific original_name in the input.
number_of_collapsed_taxa: when taxonomic_splits == "collapse_to_higher_taxon", the number of possible taxon names that have been collapsed.
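The relationship between accepted_name, aligned_name and suggested_name described above amounts to a simple fallback rule. The base R sketch below shows that rule on a toy table; the placeholder names ("Aus bus" and so on) are invented for illustration and are not real taxa or real function output.

```r
# Toy lookup table with made-up placeholder names (not real output)
toy <- data.frame(
  original_name = c("Aus bus", "Cus dus"),
  aligned_name  = c("Aus bus", "Cus dus"),
  accepted_name = c("Aus novus", NA),  # NA: no APC-accepted name available
  stringsAsFactors = FALSE
)

# suggested_name is the accepted_name when one exists, else the aligned_name
toy$suggested_name <- ifelse(is.na(toy$accepted_name),
                             toy$aligned_name,
                             toy$accepted_name)
```

When working with the returned table, rows where accepted_name is missing are therefore the ones whose suggested_name comes from the alignment step alone.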
See Also
Other taxonomic alignment functions: align_taxa(), update_taxonomy()
Examples
resources <- load_taxonomic_resources()
# Example 1: a few names, including one that is not a plant name
create_taxonomic_update_lookup(
  c("Eucalyptus regnans",
    "Acacia melanoxylon",
    "Banksia integrifolia",
    "Not a species"),
  resources = resources
)

# Example 2: misspellings of Banksia serrata plus a genus-only name
input <- c("Banksia serrata", "Banksia serrate", "Banksia cerrata",
           "Banksea serrata", "Banksia serrrrata", "Dryandra")
create_taxonomic_update_lookup(
  taxa = input,
  identifier = "APCalign test",
  full = TRUE,
  resources = resources
)

# Example 3: names read from the test file shipped with the package
taxon_list <-
  readr::read_csv(
    system.file("extdata", "test_taxa.csv", package = "APCalign"),
    show_col_types = FALSE
  )
create_taxonomic_update_lookup(
  taxa = taxon_list$original_name,
  identifier = taxon_list$notes,
  full = TRUE,
  resources = resources
)