R: Report Words Most Associated with each Treatment

sibp_top_words {texteffect}

R Documentation

Description

sibp_top_words returns a data frame of the words most associated with each treatment.

Usage

sibp_top_words(sibp.fit, words, num.words = 10, verbose = FALSE)

Arguments

`sibp.fit`	A `sibp` object.
`words`	The words corresponding to the columns of `X`, usually obtained through `colnames(X)`.
`num.words`	The number of top words to report.
`verbose`	If `TRUE`, also reports how common each treatment is (so that the analyst can focus on the common treatments) and how closely associated each word is with each treatment.

Details

The parameter configuration should always be chosen on the basis of which treatments are substantively the most interesting. This function provides one natural way of discovering which words are most associated with each treatment. Association is measured by the mean parameter of the posterior distribution of phi, where phi is the effect of the treatment on the count of word w. The resulting word lists help to determine which treatments are most interesting.
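The ranking idea can be illustrated with a minimal base R sketch. It assumes a treatments-by-words matrix of posterior means for phi; the matrix `phi_mean`, its values, and the helper `top_words_sketch` are all hypothetical and are not part of texteffect, so this shows the general technique rather than the package's implementation.

```r
# Hypothetical posterior means of phi: one row per treatment, one column per word.
# The numbers are invented for illustration; they are not output from sibp().
phi_mean <- matrix(
  c( 0.9, 0.1, -0.3, 0.5,
    -0.2, 0.7,  0.4, 0.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("senator", "actor", "film", "elected"))
)

# Hypothetical helper: for each treatment (row), return the words with the
# largest posterior mean, in decreasing order.
top_words_sketch <- function(phi_mean, words, num.words = 10) {
  n <- min(num.words, length(words))
  top <- apply(phi_mean, 1, function(row) {
    words[order(row, decreasing = TRUE)[seq_len(n)]]
  })
  # apply() returns a plain vector when n == 1, so restore the matrix shape
  top <- matrix(top, nrow = n)
  colnames(top) <- paste0("Treatment", seq_len(ncol(top)))
  as.data.frame(top, stringsAsFactors = FALSE)
}

top_words_sketch(phi_mean, colnames(phi_mean), num.words = 2)
```

With these invented values, the first column lists "senator" then "elected", and the second lists "actor" then "film". Each column is read top to bottom, from the most associated word downward.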

Value

top.words

A data frame where each column consists of the top `num.words` words (ten by default), in order, associated with a given treatment.

Author(s)

Christian Fong

References

Fong, Christian and Justin Grimmer. 2016. “Discovery of Treatments from Text Corpora.” Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. https://aclweb.org/anthology/P/P16/P16-1151.pdf

Examples

# Load the package and the Wikipedia biography data
library(texteffect)
data(BioSample)

# Divide into training and test sets
Y <- BioSample[, 1]
X <- BioSample[, -1]
set.seed(1)
train.ind <- sample(seq_len(nrow(X)), size = 0.5 * nrow(X), replace = FALSE)

# Fit an sIBP on the training data
sibp.fit <- sibp(X, Y, K = 2, alpha = 4, sigmasq.n = 0.8,
                 train.ind = train.ind)

# Report the top words (10 by default) for each treatment
sibp_top_words(sibp.fit, colnames(X))

[Package texteffect version 0.3 Index]