as.text.table |
Convert a data.table column of character vectors into a column with one row per word, grouped by a grouping column. Optionally splits a column of strings into vectors of constituents. |
flag_words |
Flag rows in a text.table that contain specific words |
label_parts_of_speech |
Add a column with the parts of speech for each word in a text.table |
l_pos |
Parts of speech for English words from the Moby Project. |
ngrams |
Create n-grams |
pos |
Parts of speech for English words from the Moby Project. |
regex_paragraph |
Regular expression that might be used to split strings of text into component paragraphs. |
regex_sentence |
Regular expression that might be used to split strings of text into component sentences. |
regex_word |
Regular expression that might be used to split strings of text into component words. |
rm_frequent_words |
Delete rows in a text.table where the number of identical records within a group is more than a certain threshold |
rm_infrequent_words |
Delete rows in a text.table where the number of identical records within a group is less than a certain threshold |
rm_long_words |
Delete rows in a text.table where the word has more than a maximum number of characters |
rm_no_overlap |
Delete rows in a text.table where the records within a group are not also found in other groups (non-overlapping records) |
rm_overlap |
Delete rows in a text.table where the records within a group are also found in other groups (overlapping records) |
rm_parts_of_speech |
Delete rows in a text.table where the word has a certain part of speech |
rm_regexp_match |
Delete rows in a text.table where the record has a certain pattern indicated by a regular expression |
rm_short_words |
Delete rows in a text.table where the word has fewer than a minimum number of characters |
rm_words |
Remove rows from a text.table with specific words |
sampleStr |
Generate (pseudo)random strings of the specified character length |
stopwords |
Vector of lowercase English stop words. |
str_any_match |
Detect if there are any words in a vector also found in another vector. |
str_counts |
Create a list containing a vector of the unique words found in x and a vector of the counts of each word in x. |
str_count_intersect |
Count the intersecting words in a vector that are found in another vector (only counts unique words). |
str_count_jaccard_similarity |
Calculate the size of the intersection divided by the size of the union of two vectors of words. |
str_count_match |
Count the words in a vector that are found in another vector. |
str_count_nomatch |
Count the words in a vector that are not found in another vector. |
str_count_positional_match |
Count words from a vector that are found in the same position in another vector. |
str_count_positional_nomatch |
Count words from a vector that are not found in the same position in another vector. |
str_count_setdiff |
Count the words in a vector that don't intersect with another vector (only counts unique words). |
str_dt_col_combine |
Combine columns of a data.table into a list in a new column; wraps list(unlist(c(...))) |
str_extract_match |
Extract words from a vector that are found in another vector. |
str_extract_nomatch |
Extract words from a vector that are not found in another vector. |
str_extract_positional_match |
Extract words from a vector that are found in the same position in another vector. |
str_extract_positional_nomatch |
Extract words from a vector that are not found in the same position in another vector. |
str_rm_blank_space |
Remove and replace excess white space from strings. |
str_rm_long_words |
Remove words from a vector that have more than a maximum number of characters. |
str_rm_non_alphanumeric |
Remove and replace non-alphanumeric characters from strings. |
str_rm_non_printable |
Remove and replace non-printable characters from strings. |
str_rm_numbers |
Remove and replace numbers from strings. |
str_rm_punctuation |
Remove and replace punctuation from strings. |
str_rm_regexp_match |
Remove words from a vector that match a regular expression. |
str_rm_short_words |
Remove words from a vector that have fewer than a minimum number of characters. |
str_rm_words |
Remove words from a vector of words found in another vector of words. |
str_rm_words_by_length |
Remove words from a vector based on the number of characters in each word. |
str_stopwords_by_part_of_speech |
Create a vector of English words associated with particular parts of speech. |
str_tolower |
Calls base::tolower(), which converts letters to lowercase. Only included to point out that base::tolower exists and should be used directly. |
str_weighted_count_match |
Weighted count of the words in a vector that are found in another vector. |
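
The set-based `str_count_*` descriptions above are easiest to tell apart on a small input. The following is a minimal base R sketch of the ideas behind them, not the package's implementation. The helper names (`count_intersect`, `count_setdiff`, `jaccard`, `count_match`, `count_positional_match`) are hypothetical, and the handling of duplicates and unequal lengths is an assumption inferred from the one-line descriptions.

```r
# Hypothetical helpers in base R only; they illustrate the descriptions in the
# index and are NOT the package's own functions or signatures.
x <- c("the", "quick", "brown", "fox", "the")
y <- c("the", "lazy", "brown", "dog")

# Unique words shared by both vectors (cf. str_count_intersect).
count_intersect <- function(x, y) length(intersect(x, y))

# Unique words in x that are absent from y (cf. str_count_setdiff).
count_setdiff <- function(x, y) length(setdiff(x, y))

# Size of the intersection over size of the union, on unique words
# (cf. str_count_jaccard_similarity). Returns NA when both inputs are empty.
jaccard <- function(x, y) {
  u <- length(union(x, y))
  if (u == 0) return(NA_real_)
  length(intersect(x, y)) / u
}

# Every element of x that occurs in y, duplicates included
# (assumed reading of str_count_match).
count_match <- function(x, y) sum(x %in% y)

# Elements that are equal at the same index; comparison stops at the shorter
# vector (assumed reading of str_count_positional_match).
count_positional_match <- function(x, y) {
  n <- min(length(x), length(y))
  sum(x[seq_len(n)] == y[seq_len(n)])
}

count_intersect(x, y)         # 2: "the", "brown"
count_setdiff(x, y)           # 2: "quick", "fox"
jaccard(x, y)                 # 2 / 6
count_match(x, y)             # 3: "the" is counted twice
count_positional_match(x, y)  # 2: positions 1 and 3
```

The difference between `count_intersect` and `count_match` on the repeated word "the" shows why the index distinguishes counts of unique words from counts of all matching elements.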