get_ngrams {discoverableresearch}    R Documentation

Extract n-grams from text

Description

This function extracts n-grams from text.

Usage

get_ngrams(
  x,
  n = 2,
  min_freq = 1,
  ngram_quantile = NULL,
  stop_words,
  rm_punctuation = FALSE,
  preserve_chars = c("-", "_"),
  language = "English"
)

Arguments

x

A character vector from which to extract n-grams.

n

Numeric: the minimum number of terms in an n-gram.

min_freq

Numeric: the minimum number of times an n-gram must occur to be returned.

ngram_quantile

Numeric: the quantile of n-gram frequencies used as the cut-off for retaining n-grams; for example, 0.8 corresponds to the 80th percentile of n-gram frequencies. The usage above shows NULL as the default.

stop_words

A character vector of stopwords to ignore.

rm_punctuation

Logical: should punctuation be removed before selecting n-grams?

preserve_chars

A character vector of punctuation marks to be retained if rm_punctuation is TRUE.

language

A string indicating the language to use for removing stopwords.

Value

A character vector of n-grams.

Examples

get_ngrams("On the Origin of Species By Means of Natural Selection")
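
The frequency arguments described above can be illustrated with a small base R sketch. This is not the package's implementation: sketch_ngrams is a hypothetical helper that uses a fixed n-gram length, ignores stop words and punctuation handling, and assumes that ngram_quantile keeps n-grams whose frequency is at or above the given quantile.

```r
# Hypothetical helper, not part of discoverableresearch.
# Builds word n-grams of a fixed length n and filters them by frequency.
sketch_ngrams <- function(x, n = 2, min_freq = 1, ngram_quantile = NULL) {
  # Lower-case each string and split it on runs of whitespace.
  tokens <- strsplit(tolower(x), "[[:space:]]+")

  # Slide a window of n words over each token vector.
  grams <- unlist(lapply(tokens, function(w) {
    w <- w[nzchar(w)]
    if (length(w) < n) return(character(0))
    starts <- seq_len(length(w) - n + 1)
    vapply(
      starts,
      function(i) paste(w[i:(i + n - 1)], collapse = " "),
      character(1)
    )
  }))
  if (length(grams) == 0) return(character(0))

  # Count how often each distinct n-gram occurs.
  counts <- table(grams)
  freq <- as.vector(counts)

  # Frequency threshold, in the spirit of min_freq.
  keep <- freq >= min_freq

  # Optional quantile threshold (assumed here to mean "at or above").
  if (!is.null(ngram_quantile)) {
    cutoff <- quantile(freq, ngram_quantile, names = FALSE)
    keep <- keep & freq >= cutoff
  }

  names(counts)[keep]
}

titles <- c(
  "natural selection works",
  "natural selection acts",
  "sexual selection acts"
)

# Bigrams that occur at least twice across the three strings.
sketch_ngrams(titles, n = 2, min_freq = 2)
```

Raising min_freq or supplying ngram_quantile shrinks the result to the more frequent n-grams, which is the effect the two arguments are documented to have; the exact cut-off behaviour of get_ngrams itself may differ from this sketch.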

[Package discoverableresearch version 0.0.1 Index]