subset.tCorpus {corpustools}R Documentation

S3 subset for tCorpus class

Description

S3 subset for tCorpus class

Usage

## S3 method for class 'tCorpus'
subset(x, subset = NULL, subset_meta = NULL, window = NULL, ...)

Arguments

x

a tCorpus object

subset

logical expression indicating rows to keep in the tokens data.

subset_meta

logical expression indicating rows to keep in the document meta data.

window

If not NULL, an integer specifiying the window to be used to return the subset. For instance, if the subset contains token 10 in a document and window is 5, the subset will contain token 5 to 15. Naturally, this does not apply to subset_meta.

...

not used

Examples

## create tcorpus of 5 bush and obama docs
tc = create_tcorpus(sotu_texts[c(1:5,801:805),], doc_col='id')

## subset to keep only tokens where token_id <= 20 (i.e.first 20 tokens)
tcs1 = subset(tc, token_id < 20)
tcs1

## subset to keep only documents where president is Barack Obama
tcs2 = subset(tc, subset_meta = president == 'Barack Obama')
tcs2

[Package corpustools version 0.5.1 Index]