R: Fold rsyntax annotations

fold_rsyntax {corpustools}

R Documentation

Fold rsyntax annotations

Description

If a tCorpus has rsyntax annotations (see annotate_rsyntax), it can be convenient to aggregate tokens that have a certain semantic label. For example, if you have a query for labeling "source" and "quote", you can add an aggregated value for the sources (such as a unique ID) as a column, and then remove the quote tokens.

Usage

fold_rsyntax(tc, annotation, by_label, ..., txt = F, rm_by = T)

Arguments

`tc`	A tCorpus
`annotation`	The name of an rsyntax annotation column
`by_label`	The labels in this column for which you want to aggregate the tokens
`...`	Specify the new aggregated columns in name-value pairs. The name is the name of the new column, and the value should be a function over a column in $tokens. For example: subject = paste(token, collapse = ' ') would create the column 'subject', of which the values are the concatenated tokens. See examples for more.
`txt`	If TRUE, add a _txt column with the concatenated tokens for by_label.
`rm_by`	If TRUE (default), remove the tokens that have the label(s) specified in by_label

Value

a transformed tCorpus

Examples

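library(corpustools)
tc = tc_sotu_udpipe$copy()   ## copy, so the demo corpus is not modified
tc$udpipe_clauses()          ## adds the 'clause' annotation column

## concatenate the 'subject' tokens into a new 'subject' column
fold_rsyntax(tc, 'clause', by_label = 'subject', subject = paste(token, collapse=' '))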
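
To make the idea of folding concrete, the following base R sketch mimics the aggregation on a toy token table. It is not the package's implementation. The data, the `clause_id` column name, and the assumption that the tokens carrying the folded label are dropped afterwards are all illustrative.

```r
## Toy token table (hypothetical data): one row per token, a 'clause'
## label column, and a 'clause_id' grouping tokens into one annotation.
tokens <- data.frame(
  doc_id    = rep("d1", 6),
  token     = c("The", "president", "said", "taxes", "will", "fall"),
  clause    = c("subject", "subject", "predicate",
                "subject", "predicate", "predicate"),
  clause_id = c(1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

by_label <- "subject"
is_by <- tokens$clause == by_label

## Aggregate the tokens that have the label, per annotation id
## (the analogue of subject = paste(token, collapse = ' ')).
subject_by_id <- tapply(tokens$token[is_by], tokens$clause_id[is_by],
                        paste, collapse = " ")

## Keep only the remaining tokens (assumed effect of rm_by = TRUE)
## and attach the aggregated value as a new column.
rest <- tokens[!is_by, ]
rest$subject <- as.vector(subject_by_id[as.character(rest$clause_id)])

print(rest)
```

In this sketch every remaining token keeps a reference to the subject of its clause, which is the kind of table the `...` name-value pairs are meant to produce.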

[Package corpustools version 0.5.1]