WFO.match {WorldFlora} | R Documentation |
Standardize plant names according to World Flora Online taxonomic backbone
Description
This package checks a list of taxa (typically species) against the World Flora Online (WFO) taxonomic backbone. The user needs to first download a static copy of the Taxonomic Backbone data from https://www.worldfloraonline.org or https://zenodo.org/doi/10.5281/zenodo.7460141 (_DwC_backbone_R.zip).
Usage
WFO.match(spec.data = NULL, WFO.file = NULL, WFO.data = NULL,
no.dates = TRUE,
spec.name = "spec.name", Genus = "Genus", Species = "Species",
Infraspecific.rank = "Infraspecific.rank", Infraspecific = "Infraspecific",
Authorship = "Authorship", First.dist = FALSE,
acceptedNameUsageID.match = TRUE,
Fuzzy = 0.1, Fuzzy.force = FALSE, Fuzzy.max = 250, Fuzzy.min = TRUE,
Fuzzy.shortest = FALSE, Fuzzy.within = FALSE,
Fuzzy.two = TRUE, Fuzzy.one = TRUE,
squish = TRUE,
spec.name.tolower = FALSE, spec.name.nonumber = TRUE, spec.name.nobrackets = TRUE,
exclude.infraspecific = FALSE,
infraspecific.excluded = c("cultivar.", "f.", "sect.", "subf.", "subg.",
"subsp.", "subvar.", "var", "var.", "[infraspec.]", "fo.", "forma",
"nothosubsp.", "nothovar.", "sect."),
spec.name.sub = TRUE,
sub.pattern=c(" sp[.] A", " sp[.] B", " sp[.] C", " sp[.]", " spp[.]", " pl[.]",
" indet[.]", " ind[.]", " gen[.]", " g[.]", " fam[.]", " nov[.]", " prox[.]",
" cf[.]", " aff[.]", " s[.]s[.]", " s[.]l[.]",
" p[.]p[.]", " p[.] p[.]", "[?]", " inc[.]", " stet[.]", "Ca[.]",
"nom[.] cons[.]", "nom[.] dub[.]", " nom[.] err[.]", " nom[.] illeg[.]",
" nom[.] inval[.]", " nom[.] nov[.]", " nom[.] nud[.]", " nom[.] obl[.]",
" nom[.] prot[.]", " nom[.] rej[.]", " nom[.] supp[.]", " sensu auct[.]"),
verbose = TRUE, counter = 1000)
WFO.url(WFO.result = NULL, browse = FALSE, browse.rows = c(1:1), ...)
WFO.one(WFO.result = NULL, priority = "Accepted",
spec.name = NULL, Auth.dist = NULL, First.dist = NULL,
verbose = TRUE, counter = 1000)
WFO.browse(taxon, WFO.file = NULL, WFO.data = NULL,
accepted.only = FALSE, acceptedNameUsageID.match = TRUE, ...)
WFO.synonyms(taxon, WFO.file = NULL, WFO.data = NULL, ...)
WFO.family(taxon, WFO.file = NULL, WFO.data = NULL, ...)
Arguments
spec.data |
A data.frame containing variables with species names. If a character vector is provided, it is converted to a data.frame. |
WFO.file |
File name of the static copy of the Taxonomic Backbone. If not |
WFO.data |
Data set with the static copy of the Taxonomic Backbone. Ignored if |
no.dates |
Speeds up loading of the WFO.data by not loading the 'created' and 'modified' fields. |
spec.name |
Name of the column with taxonomic names. In case that a |
Genus |
Name of the column with the genus names. |
Species |
Name of the column with the species names. |
Infraspecific.rank |
Name of the column with the infraspecific rank (such as "subsp.", "var." or "cultivar."). |
Infraspecific |
Name of the column with the infraspecific names. |
Authorship |
Name of the column with the naming authorities. |
First.dist |
If |
acceptedNameUsageID.match |
If |
Fuzzy |
If larger than 0, then attempt fuzzy matching in case an identical taxonomic name is not found in the World Flora Online. This argument will be used as argument |
Fuzzy.force |
If |
Fuzzy.max |
Maximum number of fuzzy matches. |
Fuzzy.min |
If |
Fuzzy.shortest |
If |
Fuzzy.within |
If |
Fuzzy.two |
If |
Fuzzy.one |
If |
squish |
If |
spec.name.tolower |
If |
spec.name.nonumber |
If |
spec.name.nobrackets |
If |
exclude.infraspecific |
If |
infraspecific.excluded |
Infraspecific levels (available from column 'verbatimTaxonRank') excluded in the results. Note that levels are excluded both in direct matches and matches with the accepted name. |
spec.name.sub |
If |
sub.pattern |
Sections of the |
verbose |
Give details on the fuzzy matching process. |
counter |
Progress on the matching process is reported by multiples of this counter. |
WFO.result |
Result obtained via WFO.match. |
browse |
If |
browse.rows |
Indices of the rows with the URLs to be browsed. |
priority |
Method of selecting the 1-to-1 matches. Option |
Auth.dist |
In case that the name of the variable with the Levenshtein distance between the authorship names is provided, then the algorithm first prioritizes records with the best match between the submitted and matched author names. |
taxon |
Character string with the name of the taxon for which information will be given (for families, different genera; for genera, different species; for species, infraspecific levels). |
accepted.only |
If |
... |
Other arguments for browseURL ( |
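The default sub.pattern values listed under Usage are regular expressions such as " sp[.]" and " cf[.]". As a rough illustration of how patterns of this kind remove sections from a name, the sketch below applies a few of them with base R's gsub. The helper strip_sections is hypothetical and is not the package's internal code.

```r
# Hypothetical helper (not part of WorldFlora): remove each pattern in turn
strip_sections <- function(x, patterns) {
  for (p in patterns) {
    x <- gsub(p, "", x)
  }
  x
}

# A few of the default patterns from the Usage section
patterns <- c(" sp[.]", " indet[.]", " cf[.]")

cleaned <- strip_sections(
  c("Prunus sp.", "Acacia cf. albida", "Faidherbia albida"),
  patterns
)
cleaned
```

Names without any of the listed sections pass through unchanged.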
Details
The principal function (WFO.match) matches plant names. Columns retrieved from the World Flora Online are added to the provided input data.frame. If there are multiple matches, rows from the input data.frame are repeated.
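The repetition of input rows can be pictured with a plain base R merge on toy data. This is only an analogy under stated assumptions: the two data frames and their IDs are made up, and merge is a stand-in, not what WFO.match does internally.

```r
# Toy input with one extra user column (made-up data)
submitted <- data.frame(
  spec.name = c("Acacia albida", "Pygeum afric"),
  plot = c("A", "B"),
  stringsAsFactors = FALSE
)

# Toy backbone in which one name has two records (made-up IDs)
backbone <- data.frame(
  spec.name = c("Acacia albida", "Acacia albida"),
  taxonID = c("id-1", "id-2"),
  stringsAsFactors = FALSE
)

# Left join: the input row with two matches appears twice,
# the unmatched row is kept once with NA in the added column
out <- merge(submitted, backbone, by = "spec.name", all.x = TRUE)
out
```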
Column 'Unique' shows whether there was a unique match (or no match) in the WFO.
Column 'Matched' shows whether there was a match in the WFO.
Column 'Fuzzy' shows whether matching was done by the fuzzy method.
Column 'Fuzzy.dist' gives the Levenshtein distance between submitted and matched plant names, calculated via adist.
Column 'Auth.dist' gives the Levenshtein distance between submitted and matched authorship names, calculated via adist, if the former were provided.
Column 'Subseq' gives different numbers for different matches for the same plant name.
Column 'Hybrid' shows whether there was a hybrid character in the scientificName.
Column 'New.accepted' shows whether the species details correspond to the current accepted name.
Column 'Old.status' gives the taxonomic status of the first match with the non-blank acceptedNameUsageID.
Column 'Old.ID' gives the ID of the first match with the non-blank acceptedNameUsageID.
Column 'Old.name' gives the name of the first match with the non-blank acceptedNameUsageID.
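To make the distances in 'Fuzzy.dist' and 'Auth.dist' concrete, the following sketch computes the Levenshtein distance with utils::adist for two name pairs. The pairs are chosen for illustration and are not output from WFO.match.

```r
# Illustrative submitted/matched pairs (assumed, not returned by WFO.match)
submitted <- c("Pygeum afric", "Omalanthus populneus")
matched <- c("Pygeum africanum", "Homalanthus populneus")

# Pairwise Levenshtein distance; adist is case-sensitive by default
lev <- mapply(
  function(a, b) as.integer(utils::adist(a, b)),
  submitted, matched,
  USE.NAMES = FALSE
)
lev
# 4 2
```

The first pair differs by four inserted characters; the second by one insertion and one case change.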
The function was inspired by the Taxonstand package that matches plant names against The Plant List. Note that The Plant List has been static since 2013, but was used as the starting point for the Taxonomic Backbone of the World Flora Online.
Function WFO.one finds one unique matching name for each submitted name. Via priority = "Accepted", it first limits candidates to accepted names, with a possible second step of eliminating accepted names that are synonyms. Via priority = "Synonym", it first limits candidates to those that are not synonyms, with a possible second step of eliminating names that are not accepted. When more than one match remains after these steps, a third step picks the candidate with the smallest taxonID. When a spec.name is given to WFO.one, the original submitted name is inserted for the scientificName.
When the user specifies the column with the Auth.dist, documenting the Levenshtein distance between the submitted and matched authorities, then WFO.one first prioritizes records with the best match between authorities.
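The selection order described for WFO.one can be sketched on a toy candidate table. This is a simplified illustration, not the package's implementation: the helper pick_one, the IDs and the use of a taxonomicStatus column are assumptions, and the optional second elimination step is left out.

```r
# Toy candidates for one submitted name (made-up values, not WFO data)
cand <- data.frame(
  spec.name = rep("Acacia albida", 4),
  taxonID = c("wfo-0000000004", "wfo-0000000003",
              "wfo-0000000001", "wfo-0000000002"),
  taxonomicStatus = c("Accepted", "Synonym", "Accepted", "Accepted"),
  Auth.dist = c(0, 0, 0, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented order of steps
pick_one <- function(x, priority_status = "Accepted") {
  # 1. Keep the records with the closest authorship match
  x <- x[x$Auth.dist == min(x$Auth.dist), , drop = FALSE]
  # 2. Keep records with the prioritized status, if any exist
  keep <- x$taxonomicStatus == priority_status
  if (any(keep)) x <- x[keep, , drop = FALSE]
  # 3. Break remaining ties by the smallest taxonID
  x[order(x$taxonID, method = "radix")[1], , drop = FALSE]
}

picked <- pick_one(cand)
picked$taxonID
```

Here the record with Auth.dist 5 is dropped first, the synonym second, and the smaller of the two remaining IDs is returned.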
Function WFO.browse lists all the genera for a family, all species for a genus or all infraspecific levels for a species.
Function WFO.synonyms gives all records with the acceptedNameUsageID equal to the matched accepted species shown in the first row.
Function WFO.family provides information on the order of vascular plants, based on information available from vascular.families. Based on an internal list of bryophyte families, when the submitted plant name is a bryophyte, the function returns 'bryophyte' instead.
Value
The main function returns a data.frame with the matched species details from the WFO.
Author(s)
Roeland Kindt (World Agroforestry, CIFOR-ICRAF)
References
World Flora Online. An Online Flora of All Known Plants. https://www.worldfloraonline.org
Sigovini M, Keppel E, Tagliapietra D. 2016. Open Nomenclature in the biodiversity era. Methods in Ecology and Evolution 7: 1217-1225.
Kindt, R. 2020. WorldFlora: An R package for exact and fuzzy matching of plant names against the World Flora Online taxonomic backbone data. Applications in Plant Sciences 8(9): e11388
See Also
Examples
data(WFO.example)
spec.test <- data.frame(spec.name=c("Faidherbia albida", "Acacia albida",
"Omalanthus populneus", "Pygeum afric"))
WFO.match(spec.data=spec.test, WFO.data=WFO.example, counter=1, verbose=TRUE)
# Also calculate the Levenshtein distance for the genus
WFO.match(spec.data=spec.test, WFO.data=WFO.example, First.dist=TRUE,
counter=1, verbose=TRUE)
# Show all the fuzzy matches, which include those at infraspecific level
e1 <- WFO.match(spec.data=spec.test, WFO.data=WFO.example, counter=1,
Fuzzy.min=FALSE, Fuzzy.shortest=FALSE, verbose=TRUE)
e1
# Use function WFO.one for a 1-to-1 match between submitted and matched names
WFO.one(e1)
# Hybrid species
WFO.match("Arabis divaricarpa", WFO.data=WFO.example)
WFO.match("Arabis x divaricarpa", WFO.data=WFO.example)
# Convert capitals to lower case
WFO.match("FAIDHERBIA ALBIDA", WFO.data=WFO.example, spec.name.tolower=TRUE)
# Remove sections of plant names that are equal to ' sp.' or ' indet. '
WFO.match("Prunus sp.", WFO.data=WFO.example, spec.name.sub=TRUE)
# Get urls, but do not open any
e2 <- WFO.match(spec.data=spec.test, WFO.data=WFO.example, counter=1, verbose=TRUE)
WFO.url(e2, browse=FALSE, browse.rows=c(1:nrow(e2)))
# Include input species names where no matches were found
# This happens when the name of the column with the original species names is provided to WFO.one
x1 <- WFO.match("World agroforestry", WFO.data=WFO.example)
WFO.one(x1, spec.name="spec.name")
## Not run:
# Cross-check with Taxonstand results
library(Taxonstand)
data(bryophytes)
# Give the file with the static copy of the Taxonomic Backbone data ('classification.txt')
# that was downloaded from https://www.worldfloraonline.org/downloadData.
# Possibly first use unzip(file.choose()) for the downloaded WFO_Backbone.zip
WFO.file.RK <- file.choose()
# check species name
w1 <- WFO.match(bryophytes[1:20, ], WFO.file=WFO.file.RK, spec.name="Full.name", counter=1)
w1
# check species name from list of names
w1 <- WFO.match(bryophytes$Full.name[1:20], WFO.file=WFO.file.RK, counter=1)
# re-check species names obtained via Taxonstand
# note that Taxonstand did not match some infraspecific names ('Higher.level')
r1 <- Taxonstand::TPL(bryophytes$Full.name[1:20], corr = TRUE)
w2 <- WFO.match(r1, WFO.file=WFO.file.RK, Genus="New.Genus", Species="New.Species",
Infraspecific.rank="New.Infraspecific.rank", Infraspecific="New.Infraspecific", counter=1)
w2
# only check genus and species
# specify different names for infraspecific columns as default to Taxonstand
w3 <- WFO.match(r1, WFO.file=WFO.file.RK, Genus="New.Genus", Species="New.Species",
Infraspecific.rank="none", Infraspecific="none", counter=1)
# note that the method above also retrieved infraspecific levels
# to only retrieve at the species level, match infraspecific levels with an empty column
r1$empty <- rep("", nrow(r1))
w4 <- WFO.match(r1, WFO.file=WFO.file.RK, Genus="New.Genus", Species="New.Species",
Infraspecific.rank="empty", Infraspecific="empty", counter=1)
# as an alternative to the method above, exclude all documented infraspecific levels
# from the results
w5 <- WFO.match(r1, WFO.file=WFO.file.RK, Genus="New.Genus", Species="New.Species",
exclude.infraspecific=TRUE, counter=1)
# save results to file
# utils::write.table(w4, quote=F, sep="\t", row.names=F, append=FALSE)
# limit the fuzzy matches to those that contain a shortened version of a species name
w6 <- WFO.match("Acacia caes", WFO.file=WFO.file.RK, Fuzzy=0.01, Fuzzy.within=TRUE, verbose=TRUE)
# show all the matches for a genus
spec.test1 <- data.frame(Genus=c("Casimiroa"))
w8 <- WFO.match(spec.test1, WFO.file=WFO.file.RK, exclude.infraspecific=TRUE, verbose=TRUE)
# show all listings at a next hierarchical level
WFO.data1 <- data.table::fread(WFO.file.RK, encoding="UTF-8")
WFO.browse("Pinaceae", WFO.data=WFO.data1)
WFO.browse("Pinaceae", WFO.data=WFO.data1, accepted.only=TRUE)
WFO.browse("Tsuga", WFO.data=WFO.data1)
WFO.browse("Tsuga", WFO.data=WFO.data1, accepted.only=TRUE)
WFO.browse("Olea europaea", WFO.data=WFO.data1)
WFO.browse("Olea europaea", WFO.data=WFO.data1, accepted.only=TRUE)
# browsing only works at family, genus and species levels
# for orders, however, information is given from vascular.families
WFO.browse("Polypodiales", WFO.data=WFO.data1)
# submitting no name results in a list of all families
WFO.browse(, WFO.data=WFO.data1)
# give synonyms
WFO.synonyms("Olea europaea", WFO.data=WFO.data1)
# give order and other higher levels from family
WFO.family("Olea europaea", WFO.data=WFO.data1)
## End(Not run)