textstat_summary {quanteda.textstats}R Documentation

Summarize documents as syntactic and lexical feature counts

Description

Count syntactic and lexical features of documents such as tokens, types, sentences, and character categories.

Usage

textstat_summary(x, ...)

Arguments

x

corpus to be summarized

...

additional arguments passed through to dfm()

Details

Count the total number of characters, tokens and sentences as well as special tokens such as numbers, punctuation marks, symbols, tags and emojis.

Examples

if (Sys.info()["sysname"] != "SunOS") {
library("quanteda")
corp <- data_corpus_inaugural[1:5]
textstat_summary(corp)
toks <- tokens(corp)
textstat_summary(toks)
dfmat <- dfm(toks)
textstat_summary(dfmat)
}

[Package quanteda.textstats version 0.97 Index]