data_corpus_moviereviews {quanteda.textmodels} | R Documentation
Movie reviews with polarity from Pang and Lee (2004)
Description
A corpus object containing 2,000 movie reviews classified by positive or negative sentiment.
Usage
data_corpus_moviereviews
Format
The corpus includes the following document variables:
- sentiment
Factor indicating whether a review was manually classified as positive ("pos") or negative ("neg").
- id1
Character counting the position in the corpus.
- id2
Random number for each review.
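The document variables listed above can be inspected directly. The following is a minimal sketch, assuming that both quanteda and quanteda.textmodels are installed and that the corpus is available once quanteda.textmodels is attached; the name dv is only an illustrative variable, not part of the package.

```r
# Assumption: quanteda and quanteda.textmodels are installed.
library(quanteda)
library(quanteda.textmodels)

# Pull the document variables into a data frame (dv is an illustrative name)
dv <- docvars(data_corpus_moviereviews)

# Show the type of each variable: sentiment, id1, id2
str(dv)

# List the sentiment categories used in the corpus
levels(dv$sentiment)
```

Working on the data frame returned by docvars() leaves the corpus itself unchanged, which is convenient for a quick look at the metadata before any modelling.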
Details
For more information, see cat(meta(data_corpus_moviereviews, "readme")).
Source
https://www.cs.cornell.edu/people/pabo/movie-review-data/
References
Pang, B. and Lee, L. (2004). "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts." Proceedings of the ACL.
Examples
# check polarities
table(data_corpus_moviereviews$sentiment)
# segment the reviews into sentences; each line of a review holds one sentence
data_corpus_moviereviewsents <-
    quanteda::corpus_segment(data_corpus_moviereviews, "\n", extract_pattern = FALSE)
# print only the first three segmented documents
print(data_corpus_moviereviewsents, max_ndoc = 3)
[Package quanteda.textmodels version 0.9.7 Index]