ntoken {quanteda}	R Documentation
Count the number of tokens or types
Description
Get the count of tokens (total features) or types (unique tokens).
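As a rough illustration of the difference between tokens and types, the base R sketch below counts both for documents that have already been split into tokens. It does not use quanteda. The helpers count_tokens() and count_types() are hypothetical, and the token vectors are written out by hand to resemble the texts in the simple example in the Examples section.

```r
# Hand-written token vectors (an assumption, not quanteda output),
# stored as a named list with one character vector per document.
toks <- list(
  text1 = c("This", "is", "a", "sentence", ",", "this", "."),
  text2 = c("A", "word", ".", "Repeated", "repeated", ".")
)

# Hypothetical helper: tokens = total number of elements per document
count_tokens <- function(x) vapply(x, length, integer(1))

# Hypothetical helper: types = number of distinct elements per document
# (case-sensitive, so "This" and "this" are two different types)
count_types <- function(x) vapply(x, function(d) length(unique(d)), integer(1))

count_tokens(toks)                  # text1 = 7, text2 = 6
count_types(toks)                   # text1 = 7, text2 = 5
count_types(lapply(toks, tolower))  # text1 = 6, text2 = 4
```

Lowercasing leaves the token counts unchanged but merges case variants, which is why the type counts drop.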
Usage
ntoken(x, ...)
ntype(x, ...)
Arguments
x
...	additional arguments passed to
Value
ntoken()
returns a named integer vector of the counts of the total tokens per document.
ntype()
returns a named integer vector of the counts of the types (unique tokens) per document. For dfm objects, ntype() counts only the features that occur more than zero times, so features with a zero count are not counted as types.
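The dfm behaviour can be pictured with a plain base R matrix standing in for a document-feature matrix. This is only an analogy under stated assumptions: the matrix m and its feature names are made up, and quanteda's own implementation may differ. Summing a row gives that document's token count, while counting the cells above zero gives its type count, so a feature column that is all zeros never adds to either.

```r
# A small document-feature matrix as a plain base R matrix (NOT a quanteda dfm).
# The "unused" feature has a zero count in every document.
m <- matrix(
  c(2L, 1L, 0L, 0L,
    1L, 3L, 1L, 0L),
  nrow = 2, byrow = TRUE,
  dimnames = list(docs = c("doc1", "doc2"),
                  features = c("a", "b", "c", "unused"))
)

# Tokens per document: sum of all feature counts in each row
tokens_per_doc <- rowSums(m)

# Types per document: number of features with a count above zero
types_per_doc <- rowSums(m > 0)

tokens_per_doc  # doc1 = 3, doc2 = 5
types_per_doc   # doc1 = 2, doc2 = 3
```

Note that rowSums() returns double values named by the row names; the integer return type described above is specific to ntoken() and ntype().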
Examples
library("quanteda")
# simple example
txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")
toks <- tokens(txt)
ntoken(toks)
ntype(toks)
ntoken(tokens_tolower(toks)) # same token counts
ntype(tokens_tolower(toks)) # fewer types: case variants are merged
# with some real texts
toks <- tokens(corpus_subset(data_corpus_inaugural, Year < 1806))
ntoken(tokens(toks, remove_punct = TRUE))
ntype(tokens(toks, remove_punct = TRUE))
ntoken(dfm(toks))
ntype(dfm(toks))
[Package quanteda version 4.0.2]