R: Detect/Locate Potential Non-Normalized Text

which_are {textclean}

R Documentation

Detect/Locate Potential Non-Normalized Text

Description

Detect or locate potential issues with text data. This family of functions generates a list of detection/location functions that can be accessed via the dollar sign or square bracket operators. The accessible functions are listed under Details.

Usage

which_are()

is_it()

Details

contraction: Contains contractions
date: Contains dates
digit: Contains digits
email: Contains email addresses
emoticon: Contains emoticons
empty: Contains just white space
escaped: Contains escaped backslash character
hash: Contains Twitter style hash tags
html: Contains html mark-up
incomplete: Contains incomplete sentences (e.g., ends with ...)
kern: Contains kerning (e.g., "The B O M B!")
list_column: Is a list of atomic vectors (not provided by which_are)
misspelled: Contains potentially misspelled words
no_endmark: Contains a sentence with no ending punctuation
no_space_after_comma: Contains commas with no space after them
non_ascii: Contains non-ASCII characters
non_character: Is a non-character vector (not provided by which_are)
non_split_sentence: Contains non-split sentences
tag: Contains a Twitter style handle used to tag others (use of the at symbol)
time: Contains a time stamp
url: Contains a URL

The functions above whose descriptions start with 'Is' rather than 'Contains' are meta functions: they describe an attribute of the column/vector being passed rather than attributes of its individual elements. The meta functions return a logical vector of length one and are not available under which_are.
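The difference between element-wise and meta functions can be sketched in base R. This is only an illustration under stated assumptions: has_digit, is_non_character, and is_list_column are hypothetical stand-ins written for this sketch, not textclean's own implementations.

```r
# Hypothetical stand-ins for illustration; not textclean's implementation.

# Element-wise check: one logical per element of x.
has_digit <- function(x) grepl("[0-9]", x)

# Meta checks: a single logical describing x as a whole.
is_non_character <- function(x) !is.character(x)
is_list_column <- function(x) {
  is.list(x) && all(vapply(x, is.atomic, logical(1)))
}

x <- c("the dog", "ate 2 chickens")

has_digit(x)                    # FALSE TRUE (same length as x)
is_non_character(x)             # FALSE (length one)
is_non_character(1:3)           # TRUE
is_list_column(list(1:2, "a"))  # TRUE
is_list_column(x)               # FALSE
```

Because a meta result is a single value about the whole vector, there are no element positions to report, which is consistent with these functions being absent from which_are.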

Value

which_are returns an environment of functions, each of which returns the integer locations of the elements containing the particular non-normalized text named by the function.

is_it returns an environment of functions, each of which returns a logical atomic vector of the same length as the input vector (except for the meta functions, which return a single logical) indicating which elements contain the particular non-normalized text named by the function.
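One way to picture how the two return values relate is the base R sketch below. It assumes that a locator is simply a detector wrapped in which(); that relationship is an assumption made for illustration, and detectors and locators are hypothetical names, not objects exported by textclean.

```r
# Hypothetical collection of detectors; not textclean's implementation.
detectors <- list(
  digit = function(x) grepl("[0-9]", x),
  empty = function(x) grepl("^[[:space:]]*$", x)
)

# Assumed relationship: wrap each detector so it reports integer positions.
locators <- lapply(detectors, function(f) {
  force(f)  # evaluate f now so each closure keeps its own detector
  function(x) which(f(x))
})

x <- c("The dog", "I like 2", "  ")

detectors$digit(x)      # FALSE TRUE FALSE (logical, same length as x)
locators[["digit"]](x)  # 2 (integer position of the match)
locators$empty(x)       # 3
```

Both the dollar sign and the double square bracket operators retrieve a function by name, so the two access styles are interchangeable in this sketch.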

Examples

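# Retrieve the collections of locator and detector functions
wa <- which_are()
it <- is_it()

# Locate (integer positions) versus detect (logical vector) elements containing digits
wa$digit(c('The dog', 'I like 2', NA))
it$digit(c('The dog', 'I like 2', NA))

# Meta function: describes the whole vector rather than its elements
is_it()$list_column(c('the dog', 'ate the chicken'))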

[Package textclean version 0.9.3 Index]