annotate {rsyntax} | R Documentation |
Annotate a tokenlist based on rsyntax queries
Description
This function has been renamed to annotate_tqueries.
Usage
annotate(
tokens,
column,
...,
block = NULL,
fill = TRUE,
overwrite = FALSE,
block_fill = FALSE,
copy = TRUE,
verbose = FALSE
)
Arguments
tokens |
A tokenIndex data.table, or any data.frame coercible with as_tokenindex. |
column |
The name of the column in which the annotations are added. The unique ids are added as column_id (e.g., "clause_id" if column is "clause"). |
... |
One or more tqueries, or a list of queries, as created with tquery. A query can be given a name by passing it as a named argument; this name is used in the annotation_id to keep track of which query was used. |
block |
Optionally, specify ids (doc_id - sentence - token_id triples) that are blocked from querying and filling (ignoring the id and recursive searches through the id). |
fill |
Logical. If TRUE (default), also assign the fill nodes (as specified in the tquery). Otherwise these are ignored. |
overwrite |
If TRUE, the existing column will be overwritten. Otherwise (default), the existing annotations in the column will be blocked, and new annotations will be added. This is identical to using multiple queries. |
block_fill |
If TRUE (and overwrite is FALSE), the existing fill nodes will also be blocked. In other words, the new annotations will only be added if the |
copy |
If TRUE (default), the data.table is copied. Otherwise, it is changed by reference. Changing by reference is faster and more memory efficient, but it departs from the usual R convention that functions do not modify their inputs, so it is optional. |
verbose |
If TRUE, report progress (only useful if multiple queries are given). |
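The difference between copying and changing by reference, as described for the copy argument, can be illustrated with plain data.table code. This is a minimal sketch, not the rsyntax implementation: the helper add_flag and the column flag are hypothetical names used only for illustration.

```r
library(data.table)

# Hypothetical helper (not part of rsyntax) that mimics a 'copy' argument
add_flag <- function(x, copy = TRUE) {
  if (copy) x <- data.table::copy(x)  # work on a private copy
  x[, flag := TRUE]                   # := adds the column by reference
  x
}

dt <- data.table(token_id = 1:3)

a <- add_flag(dt, copy = TRUE)
"flag" %in% names(dt)   # FALSE: the input is untouched

b <- add_flag(dt, copy = FALSE)
"flag" %in% names(dt)   # TRUE: the input itself was modified
```

With copy = FALSE the caller's object changes even if the return value is discarded, which is why the safer copying behavior is the default.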
Details
Apply queries to extract syntax patterns, and add the results as two columns to a tokenlist. One column contains the ids for each hit. The other column contains the annotations. Only nodes that are given a name in the tquery (using the 'label' parameter) will be added as annotation.
Note that while queries only find one node for each labeled component of a pattern (e.g., quote queries have one node for "source" and one node for "quote"), all children of these nodes can be annotated by setting fill to TRUE. If a child has multiple ancestors, only the most direct ancestors are used (see documentation for the fill argument).
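The effect of fill described above can be checked by annotating the same tokens with and without it. This is a rough sketch that assumes the tokens_spacy example data shipped with rsyntax and assumes that unannotated tokens are NA in the annotation column; the object names no_fill and with_fill are illustrative only.

```r
library(rsyntax)

## spacy tokens for: Mary loves John, and Mary was loved by John
tokens = tokens_spacy[tokens_spacy$doc_id == 'text3',]

active = tquery(pos = "VERB*", label = "predicate",
                children(relation = c("nsubj", "nsubjpass"), label = "subject"))

## only the labeled nodes themselves are annotated
no_fill = annotate_tqueries(tokens, "clause", act = active, fill = FALSE)

## the children of the labeled nodes are annotated as well
with_fill = annotate_tqueries(tokens, "clause", act = active, fill = TRUE)

## number of annotated tokens in each case (assumes NA marks unannotated tokens)
sum(!is.na(no_fill$clause))
sum(!is.na(with_fill$clause))
```

Comparing the two counts shows how many tokens were annotated only because they are children of a labeled node.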
Value
The tokenIndex with the annotation columns
Examples
## spacy tokens for: Mary loves John, and Mary was loved by John
tokens = tokens_spacy[tokens_spacy$doc_id == 'text3',]
## two simple example tqueries
passive = tquery(pos = "VERB*", label = "predicate",
                 children(relation = c("agent"), label = "subject"))
active = tquery(pos = "VERB*", label = "predicate",
                children(relation = c("nsubj", "nsubjpass"), label = "subject"))
tokens = annotate_tqueries(tokens, "clause", pas=passive, act=active)
tokens
if (interactive()) plot_tree(tokens, annotation='clause')