tCorpus$udpipe_quotes {corpustools}

R Documentation

Add columns indicating who said what

Description

An off-the-shelf application of rsyntax for extracting quotes. Designed for working with a tCorpus created with udpipe_tcorpus.

Arguments

`tqueries`	A list of tqueries. By default uses the off-the-shelf tqueries in `udpipe_quote_tqueries`.
`span_tqueries`	Additional tqueries for finding candidates for 'span quotes' (i.e. quotes that span multiple sentences, indicated by quotation marks). By default uses the off-the-shelf tqueries in `udpipe_spanquote_tqueries`.

Details

Default tqueries are provided for detecting source-quote relations within sentences (udpipe_quote_tqueries), and for detecting source candidates for text between quotation marks that can span across multiple sentences (udpipe_spanquote_tqueries). These have mainly been developed and tested for the english-ewt udpipe model.

There are two ways to customize this function. The first is to specify a custom character vector of verb lemmas and pass it as an argument to the two functions that generate the default tqueries. The second (more advanced) way is to provide a custom list of tqueries. Note that the udpipe_quote_tqueries and udpipe_spanquote_tqueries functions simply create lists of tqueries, so you can create new lists or add tqueries to these lists. Important: if you create custom tqueries, make sure that the labels for the source and quote tokens are 'source' and 'quote', respectively. For the span quote tqueries, the label for the source candidate should be 'source'.
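The two customization routes can be sketched as follows. This is a minimal illustration, not the package's own recipe: the verb list, the `custom_direct` name, and the choice of the `nsubj` and `ccomp` relations are assumptions made for the example, and the code only runs if the corpustools and rsyntax packages are installed.

```r
# Illustrative verb lemmas that signal speech (an assumption for this
# sketch; a real list should be longer).
custom_verb_lemma <- c("say", "state", "claim", "argue")

if (requireNamespace("corpustools", quietly = TRUE) &&
    requireNamespace("rsyntax", quietly = TRUE)) {

  # Route 1: regenerate the default query lists with the custom verbs.
  quote_tqueries <- corpustools::udpipe_quote_tqueries(custom_verb_lemma)
  span_tqueries  <- corpustools::udpipe_spanquote_tqueries(custom_verb_lemma)

  # Route 2: add a hand-written tquery to the list. The labels must be
  # 'source' and 'quote'. The relations used here are illustrative.
  quote_tqueries$custom_direct <- rsyntax::tquery(
    lemma = custom_verb_lemma,
    rsyntax::children(relation = "nsubj", label = "source"),
    rsyntax::children(relation = "ccomp", label = "quote")
  )

  # With a tCorpus `tc` created by udpipe_tcorpus(), the lists would then
  # be passed on like this (not run here, since it needs a udpipe model):
  # tc$udpipe_quotes(tqueries = quote_tqueries, span_tqueries = span_tqueries)
}
```

Because the query lists are plain R lists, individual tqueries can be added, replaced, or dropped by name before they are passed to the method.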

Value

The columns 'quote', 'quote_id', and 'quote_verbatim' are added to the tokens.
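To get a quick overview of the added columns, a small helper along these lines may be useful. `quote_summary` is a hypothetical function written for this sketch and is not part of corpustools. It assumes that the 'quote' column holds the tquery labels ('source' and 'quote') for annotated tokens and NA elsewhere.

```r
# Hypothetical helper (not part of corpustools): count how many tokens
# were annotated with each role in the 'quote' column.
quote_summary <- function(tokens) {
  tokens <- as.data.frame(tokens)  # also accepts a data.table
  needed <- c("quote", "quote_id", "quote_verbatim")
  absent <- setdiff(needed, colnames(tokens))
  if (length(absent) > 0) {
    stop("missing columns: ", paste(absent, collapse = ", "))
  }
  # Keep only tokens that received an annotation, then tabulate by role.
  annotated <- tokens[!is.na(tokens$quote), , drop = FALSE]
  table(role = annotated$quote)
}
```

Assuming `tc` is a tCorpus on which `tc$udpipe_quotes()` has been called, the helper could be used as `quote_summary(tc$tokens)`.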

Examples

## Not run: 
txt = 'Bob said that he likes Mary. John did not like that: 
       "how dare he!". "It is I, John, who likes Mary!!"'
tc = udpipe_tcorpus(txt, model = 'english-ewt')
tc$udpipe_quotes()

if (interactive()) {
  tc_plot_tree(tc, token, lemma, POS, annotation='quote')
  browse_texts(tc, rsyntax='quote', value='source')
}

## you can provide your own lists of tqueries, or use the two 
## query generating functions to customize the specific 'verb lemma'
## (i.e. the lemma for verbs that indicate speech)

custom_verb_lemma = c('say','state')   ## this should be longer
quote_tqueries =      udpipe_quote_tqueries(custom_verb_lemma)
span_quote_tqueries = udpipe_spanquote_tqueries(custom_verb_lemma)

## note that these are simply lists of tqueries, so you can also
## create your own list or customize these lists

quote_tqueries
span_quote_tqueries

if (interactive()) {
  tc$udpipe_quotes(tqueries = quote_tqueries, span_tqueries = span_quote_tqueries)
  tc_plot_tree(tc, token, lemma, POS, annotation='quote')
  browse_texts(tc, rsyntax='quote', value='source')
}

## End(Not run)

[Package corpustools version 0.5.1 Index]