congress109 {textir}    R Documentation

Ideology in Political Speeches

Description

Phrase counts and ideology scores by speaker for members of the 109th US Congress.

Details

These data originally appeared in Gentzkow and Shapiro (GS; 2010) and cover the text of the 2005 Congressional Record, which contains all speeches made that year by members of the United States House and Senate. In particular, GS record the number of times each of 529 legislators used terms in a list of 1000 phrases (i.e., each document is a year of transcripts for a single speaker). Two kinds of sentiment are associated with each speaker: repshare, the two-party vote share obtained by George W. Bush in the 2004 presidential election in the speaker's constituency (congressional district for representatives; state for senators); and the speaker's first and second common-score values (from http://voteview.com). Full parsing and sentiment details are in Taddy (2013; Section 2.1).

Value

congress109Counts

A dgCMatrix of phrase counts indexed by speaker-rows and phrase-columns.

congress109Ideology

A data.frame containing the associated repshare and common scores [cs1,cs2], as well as speaker characteristics: party (‘R’epublican, ‘D’emocrat, or ‘I’ndependent), state, and chamber (‘H’ouse or ‘S’enate).

Author(s)

Matt Taddy, mataddy@gmail.com

References

Gentzkow, M. and J. Shapiro (2010), What drives media slant? Evidence from U.S. daily newspapers. Econometrica 78, 35-71. The full dataset is at http://dx.doi.org/10.3886/ICPSR26242.

Taddy, M. (2013), Multinomial inverse regression for text analysis. http://arxiv.org/abs/1012.2098

See Also

srproj, pls, dmr, we8there

Examples

data(congress109)

## Bivariate sentiment factors (roll-call vote common scores)
covars <- data.frame(gop=congress109Ideology$party=="R",
					cscore=congress109Ideology$cs1)
covars$cscore <- covars$cscore - 
	tapply(covars$cscore,covars$gop,mean)[covars$gop+1]
rownames(covars) <- rownames(congress109Ideology)
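
## Illustration only, on hypothetical toy values (not part of congress109):
## a minimal sketch of the centering step above, which subtracts each party
## group's mean so that cscore is measured relative to the speaker's party.
toy <- data.frame(gop=c(FALSE,FALSE,TRUE,TRUE),
					cscore=c(-0.6,-0.2,0.3,0.7))
toy$centered <- toy$cscore -
	tapply(toy$cscore,toy$gop,mean)[toy$gop+1]
## the centered scores should now average (numerically) zero within each group
tapply(toy$centered,toy$gop,mean)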

## cl=NULL implies a serial run.
## To run on a fork cluster instead, load the parallel package
## and uncomment the relevant lines below.
## Forking is Unix only; use type="PSOCK" on Windows.
cl <- NULL
# library(parallel)
# cl <- makeCluster(detectCores(), type="FORK")
## small nlambda for a fast example
fitCS <- dmr(cl, covars, congress109Counts, gamma=1, nlambda=10)
# stopCluster(cl)

## plot the fit
par(mfrow=c(1,2))
for(j in c("estate.tax","death.tax")){
	plot(fitCS[[j]], col=c("red","green"))
	mtext(j,line=2) }
legend("topright",bty="n",fill=c("red","green"),legend=names(covars))


## plot the IR sufficient reduction space
Z <- srproj(fitCS, congress109Counts)
par(mfrow=c(1,1))
plot(Z, pch=21, bg=c(4,3,2)[congress109Ideology$party], main="SR projections")
## projections for two individual politicians (rows 68 and 388)
Z[c(68,388),]

[Package textir version 2.0-5 Index]