subset_query {corpustools}    R Documentation

Subset tCorpus token data using a query

Description

A convenience function that searches for contexts (documents or sentences) that match a query and uses the results to subset the tCorpus token data.

Usage

subset_query(
  tc,
  query,
  feature = "token",
  context_level = c("document", "sentence"),
  not = F,
  as_ascii = F,
  window = NA
)

Arguments

tc

A tCorpus

query

A character string containing the query. See search_contexts for the query syntax.

feature

The name of the feature column on which the query is applied.

context_level

Select whether the query and subset are performed at the document or sentence level.

not

If TRUE, perform a NOT search: return the documents/sentences in which the query is not found.

as_ascii

If TRUE, perform the search in ASCII.

window

If given, a word distance is used as the context instead (overrides context_level).

Details

See the documentation for search_contexts for an explanation of the query language.
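
As a rough illustration of how the not and context_level arguments change the result, the sketch below reuses the small corpus from the Examples section. The object names tc_not and tc_sent are made up for this illustration, and the behavior described in the comments follows from the argument descriptions above rather than from verified output.

```r
library(corpustools)

text <- c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc <- create_tcorpus(text, doc_id = c('a', 'b', 'c', 'd'), split_sentences = TRUE)

# NOT search at the document level: keep the documents in which 'A' is not found
tc_not <- subset_query(tc, 'A', not = TRUE)
tc_not$meta

# Sentence-level search: the query and the subset both operate on sentences,
# so only tokens from sentences that match 'G' should remain
tc_sent <- subset_query(tc, 'G', context_level = 'sentence')
tc_sent$tokens
```

Inspecting tc_not$meta and tc_sent$tokens shows which documents and tokens survive each subset.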

Examples

text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc = create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)

## keep only the documents that match the query 'A'
tc2 = subset_query(tc, 'A')
tc2$meta


[Package corpustools version 0.5.1 Index]