get_ngrams {discoverableresearch} | R Documentation
Extract n-grams from text
Description
This function extracts n-grams from text.
Usage
get_ngrams(
  x,
  n = 2,
  min_freq = 1,
  ngram_quantile = NULL,
  stop_words,
  rm_punctuation = FALSE,
  preserve_chars = c("-", "_"),
  language = "English"
)
Arguments
x
A character vector from which to extract n-grams.
n
Numeric: the minimum number of terms in an n-gram.
min_freq
Numeric: the minimum number of times an n-gram must occur to be returned.
ngram_quantile
Numeric: the quantile of n-gram frequencies to retain, e.g. 0.8 for the 80th percentile. The default shown in Usage is NULL.
stop_words
A character vector of stopwords to ignore.
rm_punctuation
Logical: should punctuation be removed before selecting n-grams?
preserve_chars
A character vector of punctuation marks to be retained if rm_punctuation is TRUE.
language
A string indicating the language to use for removing stopwords.
Value
A character vector of n-grams.
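To make the roles of n, min_freq and ngram_quantile more concrete, the following base R sketch counts word n-grams and filters them by frequency. It is an illustration only, not the package's implementation: sketch_ngrams is a hypothetical helper, it treats n as an exact n-gram length, it interprets ngram_quantile as a lower cut-off on frequency, and it skips stopword and punctuation handling entirely.

```r
# Hypothetical helper for illustration; NOT part of discoverableresearch.
# Counts word n-grams in a character vector and filters them by frequency.
sketch_ngrams <- function(x, n = 2, min_freq = 1, ngram_quantile = NULL) {
  # Lower-case each element and split it on whitespace.
  tokens <- strsplit(tolower(x), "\\s+")

  # Build every run of n consecutive tokens within each element.
  grams <- unlist(lapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]
    if (length(tok) < n) return(character(0))
    starts <- seq_len(length(tok) - n + 1)
    vapply(starts,
           function(i) paste(tok[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) return(character(0))

  # Tabulate the n-grams; table() orders them alphabetically.
  counts <- table(grams)
  freq <- as.vector(counts)

  # Keep n-grams that occur at least min_freq times.
  keep <- freq >= min_freq

  # Assumption: a quantile, when supplied, acts as a lower frequency cut-off.
  if (!is.null(ngram_quantile)) {
    cutoff <- unname(stats::quantile(freq, ngram_quantile))
    keep <- keep & freq >= cutoff
  }
  names(counts)[keep]
}

titles <- c("natural selection acts", "natural selection works")
result <- sketch_ngrams(titles, n = 2, min_freq = 2)
print(result)  # "natural selection"
```

Only "natural selection" occurs twice across the two inputs, so it is the only bigram that survives min_freq = 2.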
Examples
get_ngrams("On the Origin of Species By Means of Natural Selection")
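A slightly fuller call may help show how the documented arguments fit together. The sketch below assumes the discoverableresearch package is installed and that get_ngrams is exported; the second title is made-up input for illustration, and no output is shown because it depends on the package's stopword and punctuation handling.

```r
# Illustrative input; the second title is an assumption added for the example.
titles <- c(
  "On the Origin of Species By Means of Natural Selection",
  "The Descent of Man, and Selection in Relation to Sex"
)

# Run only when the package is available, so the script does not fail otherwise.
if (requireNamespace("discoverableresearch", quietly = TRUE)) {
  # Every argument name below is taken from the Usage section.
  bigrams <- discoverableresearch::get_ngrams(
    titles,
    n = 2,
    min_freq = 1,
    rm_punctuation = TRUE,
    preserve_chars = c("-", "_"),
    language = "English"
  )
  print(bigrams)
}
```

Setting rm_punctuation = TRUE here is meant to strip the comma in the second title while preserve_chars keeps hyphens and underscores.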
[Package discoverableresearch version 0.0.1 Index]