clean_name {fossilbrush}    R Documentation

roxygen documentation

Description

Cleans a vector of names by bundling a series of cleaning routines into a single process (see Details).

Usage

clean_name(x, terms = NULL, collapse = NULL, verbose = FALSE)

Arguments

x

a vector of names to clean. This will be coerced to class character internally

terms

a character vector of terms to remove from elements of x. Terms are removed only where they occur as whole words, not where they happen to occur as substrings of longer words in x

collapse

a character vector of strings which should be collapsed (i.e. replaced by "", rather than the default " "). If one of the collapse terms is a special regex character, it will need to be escaped, e.g. "\\-" (the backslash is doubled because it is written inside an R string)

verbose

A logical of length 1 determining if function progress should be reported to the console
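
The whole-word behaviour described for terms and the escaping needed for collapse can be illustrated with base R regular expressions. This is a minimal sketch under assumptions, not the package's implementation: the sample names are hypothetical, and the use of \\b word boundaries is only one way whole-word matching could be achieved.

```r
# Hypothetical sample names, for illustration only
x <- c("Atrypa indet sp", "Indetella prima", "Spirifer-ina")

# Whole-word removal: \\b anchors the term at word boundaries, so "indet"
# is replaced by a space in the first element, while "Indetella" is left
# alone (the match is also case-sensitive)
whole_word <- gsub("\\bindet\\b", " ", x)

# Collapsing: the matched string is replaced by "" rather than " ".
# The backslash is doubled because it is written inside an R string
collapsed <- gsub("\\-", "", x)
```

Replacing with a space leaves extra white space behind in the first element, which is why a later white space clean-up step is needed.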

Details

Function which bundles a series of cleaning routines into a single process. First, any words in brackets are removed, followed by a series of user-defined terms, if given. Next, Roman and Arabic numerals are removed, then abbreviations of up to five letters (abbreviations are matched by the following dot, e.g. ABFS.). By default, characters for removal are replaced by a white space to prevent accidental collapse of strings. However, there may be specific cases where a collapse is required, so terms given in collapse are dealt with next. After collapsing, all rogue punctuation is removed, then isolated lowercase letters, then isolated groups of capitals up to 5 characters long. Finally, repeated white spaces are removed, along with trailing white space, any remaining strings longer than 2 words are subsetted to the first word, the first letter of each string is capitalised, and zero-length strings are converted to NA
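
Several of the steps above can be approximated in base R. The following is a simplified sketch only: sketch_clean is a hypothetical helper, not part of fossilbrush, and its regular expressions are assumptions rather than the patterns the package uses. Roman numerals, user-defined terms, collapse terms, isolated letters and capitals, and the subsetting to the first word are omitted.

```r
# Hypothetical helper illustrating a subset of the cleaning steps;
# this is not the fossilbrush implementation
sketch_clean <- function(x) {
  x <- as.character(x)
  x <- gsub("\\([^)]*\\)", " ", x)          # words in round brackets
  x <- gsub("[0-9]+", " ", x)               # Arabic numerals
  x <- gsub("\\b[A-Za-z]{1,5}\\.", " ", x)  # abbreviations, matched by the trailing dot
  x <- gsub("[[:punct:]]", " ", x)          # any remaining punctuation
  x <- gsub("\\s+", " ", x)                 # repeated white space
  x <- trimws(x)                            # leading and trailing white space
  # capitalise the first letter of non-empty strings; everything else becomes NA
  ok <- !is.na(x) & nzchar(x)
  x[ok] <- paste0(toupper(substr(x[ok], 1, 1)), substring(x[ok], 2))
  x[!ok] <- NA_character_
  x
}

out <- sketch_clean(c("atrypa (Atrypa) sp. 2", "cf. Spirifer", "(indet.)"))
# out is c("Atrypa", "Spirifer", NA)
```

Each removal step substitutes a space rather than an empty string, mirroring the default described above, so that neighbouring words are not accidentally joined before the white space clean-up.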

Value

a character vector the same length as x. Elements which were reduced to zero characters during cleaning are returned as NA

Examples

# load dataset
data("brachios")
# clean genus names
gen_clean <- clean_name(brachios$genus)

[Package fossilbrush version 1.0.3 Index]