R: Tokenized press conferences

ECB_press_conferences_tokens {sentopics}

R Documentation

Tokenized press conferences

Description

A pre-processed and tokenized version of the ECB_press_conferences corpus. The processing involved the following steps:

Removal of paragraphs shorter than 10 words
Removal of stop words
Part-of-speech tagging, after which only nouns, proper nouns and adjectives were retained
Detection and merging of frequent compound words
Frequency-based cleaning of rare and very common words
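
The steps above can be approximated with quanteda. The following is a minimal sketch on made-up toy paragraphs, not the script used to build this dataset. It assumes that the first step drops paragraphs shorter than 10 words, it hard-codes a single compound ("monetary policy") instead of detecting frequent compounds, and it skips part-of-speech tagging, which needs an external tagger. All object names and thresholds are illustrative.

```r
library(quanteda)

# Toy paragraphs standing in for the real corpus (illustrative text only)
txt <- c(
  p1 = "The Governing Council decided to keep the key interest rates unchanged after reviewing the latest monetary policy outlook.",
  p2 = "Thank you.",
  p3 = "Inflation expectations remain anchored and monetary policy will stay accommodative for as long as the outlook requires it."
)

toks <- tokens(txt, remove_punct = TRUE)

# 1. Drop paragraphs shorter than 10 words (assumed reading of the first step)
keep <- ntoken(toks) >= 10
toks <- toks[keep]

# 2. Remove stop words
toks <- tokens_remove(toks, stopwords("en"))

# 3. Part-of-speech filtering is skipped here: it requires an external
#    tagger and is not reproduced in this sketch

# 4. Merge a compound word (hard-coded here; the real step detects
#    frequent compounds from the data)
toks <- tokens_compound(toks, phrase("monetary policy"))

# 5. Frequency-based cleaning: keep only terms occurring at least twice
#    (illustrative threshold; a cap on very common terms could be added
#    through dfm_trim()'s max_docfreq argument)
kept <- dfm_trim(dfm(toks), min_termfreq = 2)
toks <- tokens_select(toks, featnames(kept), valuetype = "fixed")

print(toks)
```

The result is again a tokens object, which is the input type that the package example passes to LDA().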

Usage

ECB_press_conferences_tokens

Format

A quanteda::tokens object.
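
Since the dataset is a quanteda tokens object, the usual quanteda accessors should apply to it. A short sketch, assuming both sentopics and quanteda are installed; the name `toks` is only a local alias introduced for illustration:

```r
library(sentopics)
library(quanteda)

# Local alias for readability (illustrative name)
toks <- ECB_press_conferences_tokens

is.tokens(toks)     # should be TRUE for a quanteda tokens object
ndoc(toks)          # number of retained paragraphs
head(types(toks))   # a few of the distinct token types
```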

Source

https://www.ecb.europa.eu/press/key/date/html/index.en.html

See Also

ECB_press_conferences

Examples

LDA(ECB_press_conferences_tokens)
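
The call above only creates a model object. A slightly longer sketch of a possible workflow follows, assuming the `fit()` and `topWords()` functions exported by recent sentopics releases; the seed and the iteration count are arbitrary illustrative values.

```r
library(sentopics)

set.seed(123)  # arbitrary seed, for reproducibility only

# Create the model with default settings
lda <- LDA(ECB_press_conferences_tokens)

# Run a small number of Gibbs sampling iterations (illustrative value)
lda <- fit(lda, 100)

# Inspect the most probable words of each topic
topWords(lda)
```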

[Package sentopics version 0.7.3 Index]