tCorpus$udpipe_quotes {corpustools}    R Documentation

Add columns indicating who said what


An off-the-shelf application of rsyntax for extracting quotes. Designed for working with a tCorpus created with udpipe_tcorpus.



tqueries: A list of tqueries. By default, the off-the-shelf tqueries from udpipe_quote_tqueries are used.


span_tqueries: Additional tqueries for finding source candidates for 'span quotes' (i.e. quotes that span multiple sentences, indicated by quotation marks). By default, the off-the-shelf tqueries from udpipe_spanquote_tqueries are used.


Default tqueries are provided for detecting source-quote relations within sentences (udpipe_quote_tqueries), and for detecting source candidates for text between quotation marks that can span across multiple sentences (udpipe_spanquote_tqueries). These have mainly been developed and tested for the english-ewt udpipe model.

There are two ways to customize this function. The first is to specify a custom character vector of verb lemmas (i.e. the lemmas of verbs that indicate speech). This vector should then be passed as an argument to the two functions that generate the default tqueries. The second (more advanced) way is to provide a custom list of tqueries. Note that the udpipe_quote_tqueries and udpipe_spanquote_tqueries functions simply create lists of tqueries, so you can create new lists or add tqueries to these lists. Important: if you create custom tqueries, make sure that the labels for the source and quote tokens are 'source' and 'quote', respectively. For the span_tqueries, the label for the source candidate should be 'source'.
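As a rough sketch of the second approach, a custom tquery can be appended to the list returned by udpipe_quote_tqueries. The 'claim' pattern below is a hypothetical illustration rather than one of the package defaults. It assumes that the tokens have 'lemma' and 'relation' columns and that the model uses the Universal Dependencies relations 'nsubj' and 'ccomp'. The list name 'custom_claim' is arbitrary.

```r
library(corpustools)

## Start from the default within-sentence queries, built from a custom
## vector of verb lemmas (the result is a plain list of tqueries)
quote_tqueries = udpipe_quote_tqueries(c('say', 'state'))

## Hypothetical extra pattern, for illustration only: the subject of
## 'claim' is the source and its clausal complement is the quote.
## The labels must be exactly 'source' and 'quote'.
claim_query = rsyntax::tquery(lemma = 'claim',
  rsyntax::children(relation = 'nsubj', label = 'source'),
  rsyntax::children(relation = 'ccomp', label = 'quote'))

## Append the query to the list under a new (arbitrary) name
quote_tqueries[['custom_claim']] = claim_query

if (interactive()) {
  tc = udpipe_tcorpus('Bob claimed that he likes Mary.', model = 'english-ewt')
  tc$udpipe_quotes(tqueries = quote_tqueries)
  browse_texts(tc, rsyntax = 'quote', value = 'source')
}
```

Whether the pattern matches a given sentence depends on the parse produced by the udpipe model, so it is worth checking the dependency tree with tc_plot_tree before relying on a custom query.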


The columns 'quote', 'quote_id', and 'quote_verbatim' are added to the tokens.
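One way to inspect these columns is to keep only the annotated tokens. The helper quote_tokens below is hypothetical and not part of corpustools. It assumes that tokens outside any quote annotation have NA in the 'quote' column, and that the token data is available as tc$tokens.

```r
## Hypothetical helper (not part of corpustools): keep only the tokens
## that received a quote annotation, with the most relevant columns
quote_tokens = function(tokens) {
  wanted = c('doc_id', 'sentence', 'token', 'quote', 'quote_id', 'quote_verbatim')
  cols = intersect(wanted, colnames(tokens))
  ## assumption: tokens outside any annotation have NA in 'quote'
  as.data.frame(tokens)[!is.na(tokens$quote), cols, drop = FALSE]
}

if (interactive()) {
  library(corpustools)
  tc = udpipe_tcorpus('Bob said that he likes Mary.', model = 'english-ewt')
  tc$udpipe_quotes()
  print(quote_tokens(tc$tokens))
}
```

Grouping the result by 'quote_id' then gives the source and quote tokens that belong to the same annotation.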


## Not run: 
txt = 'Bob said that he likes Mary. John did not like that: 
       "how dare he!". "It is I, John, who likes Mary!!"'
tc = udpipe_tcorpus(txt, model = 'english-ewt')

## add the quote annotations using the default tqueries
tc$udpipe_quotes()

if (interactive()) {
  tc_plot_tree(tc, token, lemma, POS, annotation='quote')
  browse_texts(tc, rsyntax='quote', value='source')
}

## you can provide your own lists of tqueries, or use the two 
## query generating functions to customize the specific 'verb lemma'
## (i.e. the lemma for verbs that indicate speech)

custom_verb_lemma = c('say','state')   ## this should be longer
quote_tqueries =      udpipe_quote_tqueries(custom_verb_lemma)
span_quote_tqueries = udpipe_spanquote_tqueries(custom_verb_lemma)

## note that these are simply lists of tqueries, so you can also
## create your own list or customize these lists


if (interactive()) {
  tc$udpipe_quotes(tqueries = quote_tqueries, span_tqueries = span_quote_tqueries)
  tc_plot_tree(tc, token, lemma, POS, annotation='quote')
  browse_texts(tc, rsyntax='quote', value='source')
}

## End(Not run)

[Package corpustools version 0.4.10 Index]