txt_nextgram {udpipe} | R Documentation |
Based on a vector with a word sequence, get n-grams (looking forward)
Description
If you have annotated your text using udpipe_annotate, your text is tokenised into a sequence of words.
Given this vector of words in sequence, getting n-grams comes down to looking at the next word, the word after that, and so forth.
These words can be pasted together to form an n-gram containing the current word, the next word, the subsequent word, and so on.
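The look-ahead idea described above can be sketched in base R. This is only an illustration, not the package's implementation: nextgram_sketch is a hypothetical helper, and returning NA wherever fewer than n words remain (or a word is missing) is an assumption of this sketch.

```r
# Hypothetical helper (not part of udpipe): builds forward-looking n-grams
# with base R only, to illustrate the idea of pasting each word to the
# words that follow it.
nextgram_sketch <- function(x, n = 2, sep = " ") {
  n <- as.integer(n)
  stopifnot(n >= 1L)
  len <- length(x)
  # shifted[[k + 1]] holds the word k positions ahead of each element;
  # positions that run past the end of x are filled with NA
  shifted <- lapply(seq_len(n) - 1L, function(k) {
    if (k >= len) {
      rep(NA_character_, len)
    } else {
      c(x[(k + 1L):len], rep(NA_character_, k))
    }
  })
  # Assumption of this sketch: an n-gram with any missing word becomes NA
  incomplete <- Reduce(`|`, lapply(shifted, is.na))
  out <- do.call(paste, c(shifted, list(sep = sep)))
  out[incomplete] <- NA_character_
  out
}

words <- c("the", "quick", "brown", "fox")
nextgram_sketch(words, n = 2)
nextgram_sketch(words, n = 3, sep = "_")
```

The result always has the same length as the input, so it can be stored next to the original words in a data.frame.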
Usage
txt_nextgram(x, n = 2, sep = " ")
Arguments
x |
a character vector where each element is just 1 term or word |
n |
an integer indicating the size of the n-gram. A value of 1 keeps x as is, a value of 2 appends the next term to the current term, a value of 3 appends the next two terms to the current term |
sep |
a character element indicating how to paste the subsequent words together |
Value
a character vector of the same length as x, containing the n-grams
Examples
# A sequence of 26 terms: "A1", "B2", ..., "Z26"
x <- sprintf("%s%s", LETTERS, 1:26)
txt_nextgram(x, n = 2)
# Compare different values of n and sep side by side
data.frame(words = x,
           bigram = txt_nextgram(x, n = 2),
           trigram = txt_nextgram(x, n = 3, sep = "-"),
           quatrogram = txt_nextgram(x, n = 4, sep = ""),
           stringsAsFactors = FALSE)
# A sequence containing a missing value
x <- c("A1", "A2", "A3", NA, "A4", "A5")
data.frame(x,
           bigram = txt_nextgram(x, n = 2, sep = "_"),
           stringsAsFactors = FALSE)
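When the words come from annotated text, n-grams that run across sentence boundaries are usually unwanted. The following is a minimal sketch of computing bigrams per sentence. The data.frame anno is a hand-made stand-in for annotation output (an assumption, so that no model needs to be downloaded), and it holds only the doc_id, sentence_id and token columns.

```r
library(udpipe)

# Hand-made stand-in for the data.frame form of udpipe_annotate output;
# only the columns needed here are included (an assumption of this sketch)
anno <- data.frame(
  doc_id      = c("d1", "d1", "d1", "d1", "d1"),
  sentence_id = c(1L, 1L, 1L, 2L, 2L),
  token       = c("I", "like", "tea", "You", "too"),
  stringsAsFactors = FALSE
)

# ave() applies the function within each doc_id/sentence_id group and
# returns the results in the original row order
anno$bigram <- ave(anno$token, anno$doc_id, anno$sentence_id,
                   FUN = function(w) txt_nextgram(w, n = 2))
anno
```

Grouping with ave keeps the example in base R. Any other split-apply approach works as well, as long as txt_nextgram receives the words of one sentence at a time.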