which_are {textclean} | R Documentation |
Detect/Locate Potential Non-Normalized Text
Description
Detect or locate potential issues with text data. This family of functions generates a list of detection/location functions that can be accessed via the dollar sign or square bracket operators. The accessible functions are listed under Details.
Usage
which_are()
is_it()
Details
- contraction
Contains contractions
- date
Contains dates
- digit
Contains digits
- email
Contains email addresses
- emoticon
Contains emoticons
- empty
Contains just white space
- escaped
Contains escaped backslash character
- hash
Contains Twitter style hash tags
- html
Contains html mark-up
- incomplete
Contains incomplete sentences (e.g., ends with ...)
- kern
Contains kerning (e.g. "The B O M B!")
- list_column
Is a list of atomic vectors (not provided by which_are)
- misspelled
Contains potentially misspelled words
- no_endmark
Contains a sentence with no ending punctuation
- no_space_after_comma
Contains commas with no space after them
- non_ascii
Contains non-ASCII characters
- non_character
Is a non-character vector (not provided by which_are)
- non_split_sentence
Contains non split sentences
- tag
Contains a Twitter style handle used to tag others (use of the at symbol)
- time
Contains a time stamp
- url
Contains a URL
The functions above whose descriptions start with 'Is' rather than 'Contains'
are meta functions: they describe an attribute of the column/vector being passed
rather than attributes of its individual elements. The meta functions return a
logical of length one and are not available under which_are.
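As a rough illustration of what a meta function checks, the base R sketch below tests a property of the whole object and returns a single logical value. The helper names `is_list_column` and `is_non_character` are hypothetical stand-ins written for this example; they are not textclean's implementation and may differ from it in edge cases.

```r
# Hypothetical stand-ins for the two meta checks; NOT textclean's own code.

# TRUE only when x is a list and every element is an atomic vector
is_list_column <- function(x) {
    is.list(x) && all(vapply(x, is.atomic, logical(1)))
}

# TRUE when x is anything other than a character vector
is_non_character <- function(x) {
    !is.character(x)
}

chr_vec  <- c("the dog", "ate the chicken")   # plain character vector
list_col <- list(1:3, c("a", "b"))            # list of atomic vectors

res_chr_list   <- is_list_column(chr_vec)     # a character vector is not a list
res_list_list  <- is_list_column(list_col)
res_chr_nonchr <- is_non_character(chr_vec)
res_int_nonchr <- is_non_character(1:3)
```

Each result has length one regardless of how many elements the input holds, which is the sense in which these checks describe the column/vector rather than its elements.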
Value
which_are returns an environment of functions that can be used to
locate and return the integer locations of the particular non-normalized text
named by the function.
is_it returns an environment of functions that can be used to
detect and return a logical atomic vector of equal length to the input vector
(except for meta functions) of the particular non-normalized text
named by the function.
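The difference between the two return types can be sketched in base R. The detector `has_digit` below is a hypothetical stand-in built on `grepl`; it is not textclean's implementation, and the package's handling of `NA` may differ from what `grepl` does here.

```r
# Hypothetical digit detector; NOT textclean's own code.
has_digit <- function(x) grepl("[0-9]", x)

x <- c("The dog", "I like 2", NA)

# Detection: a logical vector with one entry per element of x
detected <- has_digit(x)

# Location: the integer positions of the elements that were detected
located <- which(detected)
```

In this sketch the location result is simply `which()` applied to the detection result, so the two views carry the same information in different shapes.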
Examples
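# Build the environments of locator and detector functions
wa <- which_are()
it <- is_it()
# Integer locations of the elements that contain digits
wa$digit(c('The dog', "I like 2", NA))
# Logical vector marking the elements that contain digits
it$digit(c('The dog', "I like 2", NA))
# Meta function: checks whether the input is a list of atomic vectors
is_it()$list_column(c('the dog', 'ate the chicken'))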