R: Movie reviews with polarity from Pang and Lee (2004)

data_corpus_moviereviews {quanteda.textmodels}

R Documentation

Movie reviews with polarity from Pang and Lee (2004)

Description

A corpus object containing 2,000 movie reviews classified by positive or negative sentiment.

Usage

data_corpus_moviereviews

Format

The corpus includes the following document variables:

sentiment: Factor indicating whether a review was manually classified as positive ("pos") or negative ("neg").
id1: Character string giving the review's position in the corpus.
id2: Random number for each review.

Details

For more information, see cat(meta(data_corpus_moviereviews, "readme")).

Source

https://www.cs.cornell.edu/people/pabo/movie-review-data/

References

Pang, B. and Lee, L. (2004). "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts." Proceedings of the ACL.

Examples

# check polarities
table(data_corpus_moviereviews$sentiment)

# segment the reviews into sentences (each line of a review is one sentence)
data_corpus_moviereviewsents <-
    quanteda::corpus_segment(data_corpus_moviereviews, "\n", extract_pattern = FALSE)
print(data_corpus_moviereviewsents, max_ndoc = 3)
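
The document variables listed under Format can be inspected and used to subset the corpus. The following is a minimal sketch, assuming the quanteda and quanteda.textmodels packages are installed; the object names dv and reviews_pos are illustrative and not part of the package.

```r
library(quanteda)
library(quanteda.textmodels)

# extract the document variables (sentiment, id1, id2) as a data frame
dv <- docvars(data_corpus_moviereviews)
str(dv)

# number of reviews in the corpus
ndoc(data_corpus_moviereviews)

# share of reviews in each sentiment class
prop.table(table(dv$sentiment))

# keep only the reviews classified as positive
# (reviews_pos is a hypothetical name used for illustration)
reviews_pos <- corpus_subset(data_corpus_moviereviews, sentiment == "pos")
print(reviews_pos, max_ndoc = 3)
```

Subsetting on sentiment in this way gives a quick route to comparing the two classes, for example before fitting a classifier.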

[Package quanteda.textmodels version 0.9.7 Index]