WFO.match.fuzzyjoin {WorldFlora}    R Documentation
Standardize plant names according to World Flora Online taxonomic backbone
Description
An alternative to WFO.match that is typically faster and that allows different methods of calculating the fuzzy distance via stringdist.
Usage
WFO.match.fuzzyjoin(spec.data = NULL, WFO.file = NULL, WFO.data = NULL,
no.dates = TRUE,
spec.name = "spec.name",
Authorship = "Authorship",
stringdist.method = "lv", fuzzydist.max = 4,
Fuzzy.min = TRUE,
acceptedNameUsageID.match = TRUE,
squish = TRUE,
spec.name.tolower = FALSE, spec.name.nonumber = TRUE, spec.name.nobrackets = TRUE,
spec.name.sub = TRUE,
sub.pattern=c(" sp[.] A", " sp[.] B", " sp[.] C", " sp[.]", " spp[.]", " pl[.]",
" indet[.]", " ind[.]", " gen[.]", " g[.]", " fam[.]", " nov[.]", " prox[.]",
" cf[.]", " aff[.]", " s[.]s[.]", " s[.]l[.]",
" p[.]p[.]", " p[.] p[.]", "[?]", " inc[.]", " stet[.]", "Ca[.]",
"nom[.] cons[.]", "nom[.] dub[.]", " nom[.] err[.]", " nom[.] illeg[.]",
" nom[.] inval[.]", " nom[.] nov[.]", " nom[.] nud[.]", " nom[.] obl[.]",
" nom[.] prot[.]", " nom[.] rej[.]", " nom[.] supp[.]", " sensu auct[.]"))
Arguments
spec.data
A data.frame containing variables with species names. If a character vector is provided, it is converted to a data.frame.
WFO.file
File name of the static copy of the Taxonomic Backbone. If not NULL, then the data will be loaded from this file.
WFO.data
Data set with the static copy of the Taxonomic Backbone. Ignored if WFO.file is not NULL.
no.dates
Speed up the loading of the WFO data by not loading the 'created' and 'modified' fields.
spec.name
Name of the column with taxonomic names.
Authorship
Name of the column with the naming authorities.
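stringdist.method
Method used to calculate the fuzzy distance, as passed to the internally called stringdist_left_join (for example "lv" for the Levenshtein distance or "dl" for the Damerau-Levenshtein distance; see stringdist for all methods).
fuzzydist.max
Maximum distance used for joining, as in the max_dist argument of stringdist_left_join.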
Fuzzy.min
Limit the results of fuzzy matching to those with the smallest distance.
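acceptedNameUsageID.match
If TRUE, then for matches with a non-blank acceptedNameUsageID, the details of the currently accepted name are returned (see columns 'New.accepted', 'Old.status', 'Old.ID' and 'Old.name' under Details).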
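squish
If TRUE, then repeated whitespace and whitespace at the start and end of the submitted names are removed before matching.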
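spec.name.tolower
If TRUE, then all characters of the submitted names are converted to lower case.
spec.name.nonumber
If TRUE, then submitted names that contain numbers are truncated to the section before the first number.
spec.name.nobrackets
If TRUE, then submitted names that contain brackets are truncated to the section before the first bracket.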
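spec.name.sub
If TRUE, then sections of the submitted names that match sub.pattern are removed.
sub.pattern
Sections of the submitted names (given as regular expressions) to be removed when spec.name.sub is TRUE, such as the open nomenclature qualifiers " cf." and " aff.".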
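The preprocessing controlled by squish, spec.name.sub and sub.pattern can be approximated in a few lines of base R. The helper clean_names() below is hypothetical and only sketches the idea; it is not the WorldFlora implementation, and it uses just three of the default patterns. Patterns are applied in the order given, which matters: in the default list " sp[.] A" precedes " sp[.]", because removing " sp." first would leave " A" behind.

```r
# Hypothetical helper approximating the squish and sub.pattern steps
# (a sketch of the idea, NOT the WorldFlora implementation).
clean_names <- function(x, sub.pattern = c(" sp[.]", " cf[.]", " aff[.]")) {
  # squish: collapse repeated whitespace and trim both ends
  x <- trimws(gsub("[[:space:]]+", " ", x))
  # remove each open-nomenclature pattern (regular expression), in order
  for (p in sub.pattern) x <- gsub(p, "", x)
  # tidy up any whitespace left behind by the removals
  trimws(gsub("[[:space:]]+", " ", x))
}

clean_names(c("  Acacia   cf. albida", "Faidherbia sp.", "Prunus aff. africana "))
```

With these inputs the qualifiers and surplus spaces are stripped, leaving plain binomials (or a genus) for matching.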
Details
This function matches plant names by using the stringdist_left_join function of the fuzzyjoin package internally. The results are provided in a format similar to that of WFO.match; therefore the WFO.one function can be used in a next step of the analysis.
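To illustrate the mechanism of a fuzzy left join, the base-R sketch below joins submitted names to a backbone on Levenshtein distance up to a maximum distance, using utils::adist instead of stringdist_left_join. The function fuzzy_left_join_lv() and the toy data are hypothetical, and the output columns are not those of WFO.match.fuzzyjoin; the point is only that every submitted record is kept, with all backbone names within the maximum distance attached (or NA when there is none).

```r
# Toy backbone and submitted names (hypothetical data, not WFO.example)
backbone <- data.frame(
  scientificName = c("Faidherbia albida", "Acacia albida", "Prunus africana"),
  stringsAsFactors = FALSE
)
submitted <- data.frame(
  spec.name = c("Faidherbia albiad", "Prunus africanum", "Quercus robur"),
  stringsAsFactors = FALSE
)

# Hypothetical base-R sketch of a Levenshtein fuzzy left join.
# Every submitted record is kept; unmatched records get NA backbone fields.
fuzzy_left_join_lv <- function(x, y, by.x, by.y, max_dist = 4) {
  # full distance matrix: rows are submitted names, columns backbone names
  d <- utils::adist(as.character(x[[by.x]]), as.character(y[[by.y]]))
  rows <- lapply(seq_len(nrow(x)), function(i) {
    hit <- which(d[i, ] <= max_dist)
    if (length(hit) == 0) {
      data.frame(x[i, , drop = FALSE], matched = NA_character_,
                 Fuzzy.dist = NA_real_, stringsAsFactors = FALSE)
    } else {
      # one output row per backbone name within max_dist
      data.frame(x[rep(i, length(hit)), , drop = FALSE],
                 matched = y[[by.y]][hit], Fuzzy.dist = d[i, hit],
                 stringsAsFactors = FALSE)
    }
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

fuzzy_left_join_lv(submitted, backbone, "spec.name", "scientificName", max_dist = 4)
```

Computing the full distance matrix is also why memory can become a constraint for large inputs: its size grows with the product of the number of submitted and backbone names.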
For large data sets the function may fail due to memory limits. A solution is to analyse different subsets of the data separately, as shown, for example, by Kindt (2023).
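The subset approach can be sketched as splitting the submitted data into chunks, matching each chunk separately and binding the results. The helper match_in_chunks() is hypothetical and shows only the general idea, not the code used by Kindt (2023); the chunk size is an arbitrary assumption to be tuned to the available memory.

```r
# Hypothetical helper: split spec.data into chunks of chunk.size rows,
# match each chunk with match_fun and bind the results in the original order.
match_in_chunks <- function(spec.data, match_fun, chunk.size = 1000, ...) {
  chunk.id <- ceiling(seq_len(nrow(spec.data)) / chunk.size)
  pieces <- split(spec.data, chunk.id)
  # extra arguments (e.g. WFO.data) are passed on to match_fun
  results <- lapply(pieces, function(chunk) match_fun(spec.data = chunk, ...))
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}
```

With WorldFlora and fuzzyjoin loaded, a call could look like `match_in_chunks(spec.test, WFO.match.fuzzyjoin, chunk.size = 2, WFO.data = WFO.example)`, reusing the objects from the Examples section.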
Column 'Unique' shows whether there was a unique match (or no match) in the WFO.
Column 'Matched' shows whether there was a match in the WFO.
Column 'Fuzzy' shows whether matching was done by the fuzzy method.
Column 'Fuzzy.dist' gives the fuzzy distance calculated between submitted and matched plant names, calculated internally with stringdist_left_join.
Column 'Auth.dist' gives the Levenshtein distance calculated between submitted and matched authorship names, if the former were provided. This distance is calculated in the same way as for the WFO.match function via adist.
Column 'Subseq' gives different numbers for different matches for the same plant name.
Column 'Hybrid' shows whether there was a hybrid character in the scientificName.
Column 'New.accepted' shows whether the species details correspond to the current accepted name.
Column 'Old.status' gives the taxonomic status of the first match with the non-blank acceptedNameUsageID.
Column 'Old.ID' gives the ID of the first match with the non-blank acceptedNameUsageID.
Column 'Old.name' gives the name of the first match with the non-blank acceptedNameUsageID.
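Several of these columns can be reproduced on toy data. The sketch below uses hypothetical candidate matches and a hypothetical helper pair_adist(); it computes Fuzzy.dist and Auth.dist as Levenshtein distances with adist, keeps only the candidates with the smallest Fuzzy.dist per submitted name in the spirit of Fuzzy.min = TRUE, and numbers the remaining candidates as described for Subseq. It is an illustration, not the package's code; in particular, the package computes Fuzzy.dist via stringdist_left_join rather than adist.

```r
# Toy candidate matches (hypothetical values, not taken from the WFO backbone)
cand <- data.frame(
  spec.name  = c("Acacia albiad", "Acacia albiad", "Acacia albiad", "Faidherbia albiad"),
  Authorship = c("Del.", "Del.", "Del.", "(Delile) A.Chev."),
  scientificName = c("Acacia albida", "Acacia albida", "Acacia alata", "Faidherbia albida"),
  scientificNameAuthorship = c("Delile", "Author X", "Author Y", "(Delile) A.Chev."),
  stringsAsFactors = FALSE
)

# Element-wise Levenshtein distance for two equal-length character vectors
pair_adist <- function(a, b) {
  mapply(function(s, t) drop(utils::adist(s, t)), a, b, USE.NAMES = FALSE)
}

# Fuzzy.dist and Auth.dist, both as plain Levenshtein distances via adist
cand$Fuzzy.dist <- pair_adist(cand$spec.name, cand$scientificName)
cand$Auth.dist  <- pair_adist(cand$Authorship, cand$scientificNameAuthorship)

# Fuzzy.min-like filter: keep the smallest Fuzzy.dist per submitted name
min.dist <- ave(cand$Fuzzy.dist, cand$spec.name, FUN = min)
best <- cand[cand$Fuzzy.dist == min.dist, ]

# Subseq-like numbering of the remaining candidates within each submitted name
best$Subseq <- ave(seq_len(nrow(best)), best$spec.name, FUN = seq_along)
best
```

Here "Acacia alata" (distance 3) is dropped, while the two "Acacia albida" candidates (distance 2) remain and receive Subseq values 1 and 2; Auth.dist then helps to tell them apart.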
Value
The function returns a data.frame with the matched species details from the WFO.
Author(s)
Roeland Kindt (World Agroforestry, CIFOR-ICRAF)
References
World Flora Online. An Online Flora of All Known Plants. https://www.worldfloraonline.org
Sigovini, M., Keppel, E. and Tagliapietra, D. 2016. Open Nomenclature in the biodiversity era. Methods in Ecology and Evolution 7: 1217-1225.
Kindt, R. 2020. WorldFlora: An R package for exact and fuzzy matching of plant names against the World Flora Online taxonomic backbone data. Applications in Plant Sciences 8(9): e11388
Kindt, R. 2023. Standardizing tree species names of GlobalTreeSearch with WorldFlora while testing the faster matching function of WFO.match.fuzzyjoin. https://rpubs.com/Roeland-KINDT/996500
See Also
Examples
## Not run:
library(WorldFlora)
library(fuzzyjoin)
data(WFO.example)
# Some submitted names are misspelled or truncated on purpose
spec.test <- data.frame(spec.name=c("Faidherbia albida", "Acacia albida",
"Faidherbia albiad",
"Omalanthus populneus", "Pygeum afric"))
# Default Levenshtein distance (stringdist.method="lv")
WFO.match.fuzzyjoin(spec.data=spec.test, WFO.data=WFO.example)
# Using the Damerau-Levenshtein distance
WFO.match.fuzzyjoin(spec.data=spec.test, WFO.data=WFO.example,
stringdist.method="dl")
## End(Not run)
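The choice of stringdist.method matters for names such as "Faidherbia albiad", which differs from "Faidherbia albida" by one swap of adjacent letters: the Levenshtein distance counts this as two substitutions, whereas Damerau-Levenshtein-type distances count it as a single transposition. The base-R sketch below uses a hypothetical helper osa_dist() that implements the restricted variant known as optimal string alignment (stringdist's "osa" method, not "dl" itself); for this pair both variants give a distance of 1. utils::adist supplies the plain Levenshtein distance for comparison.

```r
# Hypothetical helper: optimal string alignment (restricted Damerau-Levenshtein)
osa_dist <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  n <- length(s); m <- length(t)
  d <- matrix(0, n + 1, m + 1)
  d[, 1] <- 0:n   # distance from each prefix of a to the empty string
  d[1, ] <- 0:m   # distance from the empty string to each prefix of b
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (s[i] == t[j]) 0 else 1
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1,   # deletion
                             d[i + 1, j] + 1,   # insertion
                             d[i, j] + cost)    # substitution or match
      # adjacent transposition: the extra step compared with Levenshtein
      if (i > 1 && j > 1 && s[i] == t[j - 1] && s[i - 1] == t[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_dist("Faidherbia albiad", "Faidherbia albida")           # transposition: 1
drop(utils::adist("Faidherbia albiad", "Faidherbia albida"))  # Levenshtein: 2
```

Under a transposition-aware method such a misspelling therefore sits closer to the intended name, which can change which candidates fall within fuzzydist.max.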