cr_sample_corpus {conText} | R Documentation
Congressional Record sample corpus
Description
A quanteda corpus containing a sample of the United States Congressional Record (daily transcripts) covering the 111th to 114th Congresses. The raw corpus is first subset to speeches matching the regular expression "immig*". Then 100 documents are randomly sampled from each party-gender pair. For the full data and the pre-processing file, see: https://www.dropbox.com/sh/jsyrag7opfo7l7i/AAB1z7tumLuKihGu2-FDmhmKa?dl=0 For NOMINATE scores, see: https://voteview.com/data
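The two-step procedure described above (pattern filter, then a random draw within each party-gender pair) can be sketched in base R. This is only an illustration on invented toy data, not the package's actual pre-processing script (which is in the linked file); the data frame `speeches`, the group size `n_per_group`, and the seed are all hypothetical.

```r
# Toy stand-in for the raw speeches; every value here is invented for illustration.
set.seed(42)
topics <- c("immigration reform", "tax policy", "immigrants and visas", "farm bill")
speeches <- data.frame(
  text   = paste(sample(topics, 400, replace = TRUE), seq_len(400)),
  party  = rep(c("D", "R"), each = 200),
  gender = rep(c("F", "M"), times = 200),
  stringsAsFactors = FALSE
)

# Step 1: keep only speeches matching the pattern quoted in the description.
matched <- speeches[grepl("immig*", speeches$text), ]

# Step 2: draw at most n_per_group rows at random from each party-gender pair.
n_per_group <- 10  # hypothetical; the description states 100 per pair
groups <- split(matched, list(matched$party, matched$gender))
sampled <- do.call(rbind, lapply(groups, function(g) {
  g[sample(nrow(g), min(n_per_group, nrow(g))), , drop = FALSE]
}))

# Check the balance of the resulting sample across the four pairs.
table(sampled$party, sampled$gender)
```

Capping the draw with `min()` keeps the sketch from failing when a pair has fewer matching speeches than requested.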
Usage
cr_sample_corpus
Format
A quanteda corpus with 200 documents and 3 docvars:
- party
party of speaker, (D)emocrat or (R)epublican
- gender
gender of speaker, (F)emale or (M)ale
- nominate_dim1
dimension 1 of the nominate score
...
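A minimal sketch of how the corpus and its docvars might be inspected, assuming the conText and quanteda packages are installed. It uses only `ndoc()` and `docvars()` from quanteda and base R functions; the object names `n_docs` and `dv` are chosen here for illustration.

```r
library(quanteda)
library(conText)

# Number of documents in the sample corpus.
n_docs <- ndoc(cr_sample_corpus)

# Document-level variables as a data frame (party, gender, nominate_dim1).
dv <- docvars(cr_sample_corpus)
head(dv)

# Cross-tabulate party and gender to see how the sample is distributed.
table(dv$party, dv$gender)

# Summarize the first NOMINATE dimension within each party.
tapply(dv$nominate_dim1, dv$party, summary)
```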
Source
https://data.stanford.edu/congress_text
[Package conText version 1.4.3 Index]