R: Match names that start with or contain a specified text string

fuzzy_filter {taxadb}

R Documentation

Match names that start with or contain a specified text string

Description

Match names that start with or contain a specified text string

Usage

fuzzy_filter(
  name,
  by = c("scientificName", "vernacularName"),
  provider = getOption("taxadb_default_provider", "itis"),
  match = c("contains", "starts_with"),
  version = latest_version(),
  db = td_connect(),
  ignore_case = TRUE,
  collect = TRUE
)

Arguments

`name`	vector of names (scientific or common, see `by`) to be matched against.
`by`	a column name in the taxa_tbl (following Darwin Core Schema terms). The filtering join is executed with this column as the joining variable.
`provider`	from which provider should the hierarchy be returned? Default is 'itis', which can also be configured using `options(taxadb_default_provider = "...")`. See `td_create` for a list of recognized providers.
`match`	should we match by names starting with the term or containing the term anywhere in the name?
`version`	Which version of the taxadb provider database should we use? Defaults to the latest version. See `tl_import` for details.
`db`	a connection to the taxadb database. See details.
`ignore_case`	should we ignore case (capitalization) in matching names? Ignoring case can be significantly slower to run.
`collect`	logical, default `TRUE`. Should we return an in-memory data.frame (default, usually the most convenient), or a reference to lazy-eval table on disk (useful for very large tables on which we may first perform subsequent filtering operations.)

Details

Note that `fuzzy_filter` will be fast with a single name or a small number of names, but will be slower if given a very large vector of names to match because, unlike other `filter_` commands, fuzzy matching requires a separate SQL call for each name. Fuzzy matches should all be confirmed manually in any event; for example, not every common name containing "monkey" belongs to a primate species.
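The per-name cost and the need for manual review can be illustrated with a small in-memory sketch. This is not how taxadb runs its queries: the `taxa` data frame is a made-up stand-in for a provider table, and `grepl()` stands in for the database match. It only shows the shape of the work, one pass per input name, with the results combined afterwards.

```r
# Toy stand-in for a taxa table (illustrative rows only, not taxadb output).
taxa <- data.frame(
  vernacularName = c("Spider Monkey", "Monkey Puzzle Tree", "Downy Woodpecker"),
  kingdom        = c("Animalia", "Plantae", "Animalia"),
  stringsAsFactors = FALSE
)

names_to_match <- c("woodpecker", "monkey")

# One pass per name, mirroring the one-query-per-name cost described above.
hits <- lapply(names_to_match, function(nm) {
  # Case-insensitive "contains" match on literal text (no regular expressions).
  keep <- grepl(tolower(nm), tolower(taxa$vernacularName), fixed = TRUE)
  data.frame(
    input = rep(nm, sum(keep)),
    taxa[keep, , drop = FALSE],
    stringsAsFactors = FALSE
  )
})

# Combine the per-name results into a single table for manual review.
result <- do.call(rbind, hits)
rownames(result) <- NULL
result
```

In this toy table, "monkey" matches a tree as well as a primate, which is the kind of false positive that manual confirmation is meant to catch.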

This method uses the database operation `%like%` to filter tables without loading them into memory. Note that regular expressions are not supported at this time.
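As a rough guide to how the two `match` modes might correspond to SQL `LIKE` patterns, the sketch below builds the pattern string for each mode. The helper `like_pattern()` is hypothetical and not part of taxadb; the package's actual query construction may differ.

```r
# Hypothetical helper, not part of taxadb: build the SQL LIKE pattern
# that would correspond to each `match` mode.
like_pattern <- function(name, match = c("contains", "starts_with")) {
  match <- match.arg(match)
  switch(match,
    contains    = paste0("%", name, "%"),  # term anywhere in the name
    starts_with = paste0(name, "%")        # term at the start of the name
  )
}

like_pattern("monkey")                    # "%monkey%"
like_pattern("Chera", "starts_with")      # "Chera%"
like_pattern(c("woodpecker", "monkey"))   # "%woodpecker%" "%monkey%"
```

In SQL `LIKE`, `%` matches any run of characters. The search term itself is not interpreted as a regular expression, so an input such as `"^Chera"` would be searched for literally.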

Examples


  

## match any common name containing:
name <- c("woodpecker", "monkey")
fuzzy_filter(name, "vernacularName")

## match scientific name
fuzzy_filter("Chera", "scientificName",
             match = "starts_with")

[Package taxadb version 0.2.1 Index]