R: Subset tCorpus token data using a query

tCorpus$subset_query {corpustools}

R Documentation

Subset tCorpus token data using a query

Description

A convenience function that searches for contexts (documents or sentences) that match a query, and uses the results to subset the tCorpus token data.

See the documentation for search_contexts for an explanation of the query language.

Usage

## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).

subset_query(query, feature = 'token', context_level = c('document','sentence'))

Arguments

`query`	A character string that is a query. See search_contexts for query syntax.
`feature`	The name of the feature column on which the query is used.
`context_level`	Select whether the query and subset are performed at the document or sentence level.
`window`	If used, uses a word distance as the context (overrides context_level).
`as_ascii`	If TRUE, perform the search in ASCII.
`not`	If TRUE, perform a NOT search. Return the documents/sentences for which the query is not found.
`copy`	If TRUE, return a modified copy of the data instead of subsetting the input tCorpus by reference.

Examples

text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc = create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)

## subset by reference
tc$subset_query('A')
tc$meta

## using copy mechanic
class(tc$tokens$doc_id)
tc2 = tc$subset_query('A AND D', copy=TRUE)

tc2$get_meta()

tc$meta ## (unchanged)
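
The `not` and `context_level` arguments listed under Arguments can be combined with `copy = TRUE` in the same way. The following is a minimal sketch, not part of the original examples. It rebuilds the corpus under the hypothetical name `tc_full`, because `tc` was already subset by reference above, and its comments describe the intended result rather than verified output.

```r
library(corpustools)

## rebuild the example corpus; tc_full is a hypothetical name used only in this sketch
text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc_full = create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)

## NOT search: keep only the documents in which 'A' is not found
tc_not = tc_full$subset_query('A', not = TRUE, copy = TRUE)
tc_not$get_meta()

## sentence-level search: keep only the sentences in which 'D' is found
tc_sent = tc_full$subset_query('D', context_level = 'sentence', copy = TRUE)
tc_sent$tokens

tc_full$get_meta() ## (unchanged, because copy = TRUE)
```

Using `copy = TRUE` throughout keeps `tc_full` intact, so several subsets can be taken from the same corpus without recreating it.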

[Package corpustools version 0.5.1 Index]