R: Extract n-grams from text

get_ngrams {discoverableresearch}

R Documentation

Extract n-grams from text

Description

This function extracts n-grams from text.

Usage

get_ngrams(
  x,
  n = 2,
  min_freq = 1,
  ngram_quantile = NULL,
  stop_words,
  rm_punctuation = FALSE,
  preserve_chars = c("-", "_"),
  language = "English"
)

Arguments

`x`	A character vector from which to extract n-grams.
`n`	Numeric: the minimum number of terms in an n-gram.
`min_freq`	Numeric: the minimum number of times an n-gram must occur to be returned.
`ngram_quantile`	Numeric: the quantile of n-gram frequencies to use as a retention threshold; e.g. 0.8 corresponds to the 80th percentile of n-gram frequencies. The default shown in Usage is NULL.
`stop_words`	A character vector of stopwords to ignore.
`rm_punctuation`	Logical: should punctuation be removed before selecting n-grams?
`preserve_chars`	A character vector of punctuation marks to be retained if rm_punctuation is TRUE.
`language`	A string indicating the language to use for removing stopwords.
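
As a rough illustration of how `n`, `min_freq` and `ngram_quantile` might interact, the base-R sketch below builds n-grams from whitespace-separated tokens and filters them by frequency. It is not the package's implementation: `sketch_ngrams()` is a hypothetical helper, stop-word and punctuation handling are omitted, it treats `n` as an exact n-gram length, and the choice to keep n-grams at or above the quantile is an assumption.

```r
# Illustrative sketch only; sketch_ngrams() is a hypothetical helper,
# not part of discoverableresearch.
sketch_ngrams <- function(x, n = 2, min_freq = 1, ngram_quantile = NULL) {
  # Lower-case each string and split it on runs of whitespace
  tokens <- strsplit(tolower(x), "\\s+")

  # Build every run of n consecutive tokens within each string
  grams <- unlist(lapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]                      # drop empty tokens
    if (length(tok) < n) return(character(0))    # string too short for an n-gram
    starts <- seq_len(length(tok) - n + 1)
    vapply(starts,
           function(i) paste(tok[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) return(character(0))

  # Count how often each distinct n-gram occurs
  counts <- table(grams)
  freq <- as.vector(counts)

  # Keep n-grams that occur at least min_freq times
  keep <- freq >= min_freq

  # Assumption: when a quantile is given, also require the frequency
  # to be at or above that quantile of all n-gram frequencies
  if (!is.null(ngram_quantile)) {
    threshold <- unname(stats::quantile(freq, ngram_quantile))
    keep <- keep & freq >= threshold
  }

  names(counts)[keep]
}

docs <- c("natural selection and natural selection", "natural selection")
sketch_ngrams(docs, n = 2, min_freq = 2)
```

With `min_freq = 1` and `ngram_quantile = NULL`, the sketch returns every distinct n-gram; raising either value narrows the result to the more frequent ones.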

Value

A character vector of n-grams.

Examples

get_ngrams("On the Origin of Species By Means of Natural Selection")

[Package discoverableresearch version 0.0.1 Index]