udpipe_simplify {corpustools} | R Documentation |
Simplify tokenIndex created with the udpipe parser
Description
This is an off-the-shelf implementation of several rsyntax transformations for simplifying text.
Usage
udpipe_simplify(
  tokens,
  split_conj = TRUE,
  rm_punct = FALSE,
  new_sentences = FALSE,
  rm_mark = FALSE
)
Arguments
tokens |
A tokenIndex, based on output from the udpipe parser. |
split_conj |
If TRUE, split conjunctions into separate sentences |
rm_punct |
If TRUE, remove punctuation afterwards |
new_sentences |
If TRUE, assign new sentence and token_id values after splitting |
rm_mark |
If TRUE, remove children with a mark relation if this is used in the simplification. |
Value
A tokenIndex.
Examples
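if (interactive()) {
  library(corpustools)

  # Copy the bundled corpus first (tCorpus objects are modified by reference)
  tc = tc_sotu_udpipe$copy()
  tc2 = transform_rsyntax(tc, udpipe_simplify)
  browse_texts(tc2)

  # Compare the dependency tree of sentence 20 before and after simplification
  rsyntax::plot_tree(tc_sotu_udpipe$tokens, token, lemma, POS, sentence_i=20)
  rsyntax::plot_tree(tc2$tokens, token, lemma, POS, sentence_i=20)
}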
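The example above runs udpipe_simplify with its default arguments. As a minimal sketch of how a non-default argument might be set, the call can be wrapped in an anonymous function before it is handed to transform_rsyntax. The wrapper and the name tc_nopunct are illustrative only and are not part of the package.

```r
library(corpustools)

# Copy the bundled corpus so the original stays available for comparison
tc = tc_sotu_udpipe$copy()

# Illustrative wrapper: keep the default conjunction splitting,
# and also remove punctuation after the simplification
simplify_nopunct = function(tokens) {
  udpipe_simplify(tokens, split_conj = TRUE, rm_punct = TRUE)
}

tc_nopunct = transform_rsyntax(tc, simplify_nopunct)
```

Wrapping the call keeps the argument choice in one place and does not rely on how transform_rsyntax handles additional arguments.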
[Package corpustools version 0.5.1 Index]