subset_corpus {R.temis}    R Documentation

subset_corpus

Description

Select documents containing (or not containing) one or more terms.

Usage

subset_corpus(corpus, dtm, terms, exclude = FALSE, all = FALSE)

Arguments

corpus

A Corpus object.

dtm

A DocumentTermMatrix object corresponding to corpus.

terms

One or more terms appearing in dtm.

exclude

Whether documents containing the terms should be excluded rather than retained.

all

Whether a document must contain all of the terms, rather than at least one of them, to be retained (or excluded, when exclude = TRUE). By default, documents need to contain at least one of the terms.
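The way terms, exclude and all combine can be illustrated with a small base R sketch. This is not the package's implementation: select_docs() is a hypothetical helper written only for this illustration. It works on a plain dense count matrix (documents in rows, terms in columns) rather than a tm DocumentTermMatrix, and it returns document names rather than a Corpus object.

```r
# Hypothetical helper, NOT part of R.temis: mimics the documented
# selection rules on a dense document-term count matrix.
select_docs <- function(counts, terms, exclude = FALSE, all = FALSE) {
  # base::all() is spelled out because the argument 'all' shadows its name
  stopifnot(base::all(terms %in% colnames(counts)))
  # Logical matrix: does each document contain each requested term?
  present <- counts[, terms, drop = FALSE] > 0
  # all = TRUE: every term must occur; all = FALSE: at least one term
  hits <- if (all) rowSums(present) == length(terms) else rowSums(present) > 0
  # exclude = TRUE inverts the selection
  if (exclude) hits <- !hits
  rownames(counts)[hits]
}

# Toy counts: doc1 has "barrel" only, doc2 "opec" only,
# doc3 neither, doc4 both
counts <- matrix(c(2, 0, 1,
                   0, 3, 1,
                   0, 0, 2,
                   1, 1, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3", "doc4"),
                                 c("barrel", "opec", "crude")))

select_docs(counts, c("barrel", "opec"))                 # "doc1" "doc2" "doc4"
select_docs(counts, c("barrel", "opec"), exclude = TRUE) # "doc3"
select_docs(counts, c("barrel", "opec"), all = TRUE)     # "doc4"
select_docs(counts, c("barrel", "opec"), exclude = TRUE, all = TRUE)
# "doc1" "doc2" "doc3"
```

Note that exclude = TRUE combined with all = TRUE drops only the documents containing every term, so documents containing just some of the terms are kept.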

Value

A Corpus object containing only the selected documents.

Examples


file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
dtm <- build_dtm(corpus)
# Documents containing "barrel"
subset_corpus(corpus, dtm, "barrel")
# Documents containing "barrel", "opec", or both
subset_corpus(corpus, dtm, c("barrel", "opec"))
# Documents containing neither "barrel" nor "opec"
subset_corpus(corpus, dtm, c("barrel", "opec"), exclude=TRUE)
# Documents containing both "barrel" and "opec"
subset_corpus(corpus, dtm, c("barrel", "opec"), all=TRUE)


[Package R.temis version 0.1.3 Index]