ECB_press_conferences_tokens {sentopics}    R Documentation
Tokenized press conferences
Description
A pre-processed and tokenized version of the ECB_press_conferences corpus. The processing involved the following steps:
Subset paragraphs shorter than 10 words
Removal of stop words
Part-of-speech tagging, following which only nouns, proper nouns and adjectives were retained
Detection and merging of frequent compound words
Frequency-based cleaning of rare and very common words
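The steps above can be approximated with quanteda on a toy corpus. This is a minimal sketch and not the code used to build the dataset: the example texts, the compound list and the frequency threshold are all hypothetical, the length filter is assumed to drop paragraphs shorter than 10 words, and the part-of-speech step is left out because it needs an external tagger.

```r
library(quanteda)

# Toy paragraphs standing in for the real corpus (hypothetical data)
txt <- c(
  p1 = "The Governing Council decided to keep the key interest rates unchanged after its monetary policy meeting today.",
  p2 = "Thank you.",
  p3 = "Inflation expectations in the euro area remain firmly anchored while interest rates stay low for an extended period."
)

# Tokenize, dropping punctuation and numbers, and lowercase
toks <- tokens(txt, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_tolower(toks)

# Length filter: assumed here to keep only paragraphs of 10 words or more
toks <- toks[ntoken(toks) >= 10]

# Stop word removal
toks <- tokens_remove(toks, stopwords("en"))

# (Part-of-speech filtering would go here; omitted in this sketch)

# Merge compound words; a fixed list replaces automatic detection here
toks <- tokens_compound(toks, phrase(c("interest rates", "euro area")))

# Frequency-based cleaning: keep features occurring at least twice
# (the threshold is arbitrary; only rare words are trimmed in this sketch)
keep <- featnames(dfm_trim(dfm(toks), min_termfreq = 2))
toks <- tokens_select(toks, keep, valuetype = "fixed")

toks
```

On a real corpus, the compound list would normally come from a collocation statistic rather than being written by hand, and very common words would be trimmed with an upper frequency bound as well.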
Usage
ECB_press_conferences_tokens
Format
A quanteda::tokens object.
Source
https://www.ecb.europa.eu/press/key/date/html/index.en.html
See Also
ECB_press_conferences
Examples
library(sentopics)
# Create an LDA model object from the tokenized corpus
LDA(ECB_press_conferences_tokens)
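Creating the model object is usually followed by estimation. The sketch below assumes the sentopics 0.7.x interface, in which fit() runs the sampler and top_words() lists the most probable words per topic. The number of topics and the iteration count are arbitrary illustrative values.

```r
library(sentopics)

# Build the model object; K = 5 topics is an arbitrary illustrative choice
lda <- LDA(ECB_press_conferences_tokens, K = 5)

# Run a small number of sampling iterations (100 is illustrative only)
lda <- fit(lda, iterations = 100)

# Inspect the most probable words in each topic
top_words(lda)
```

Because estimation is stochastic, results differ between runs unless a seed is set with set.seed() beforehand.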
[Package sentopics version 0.7.3 Index]