viewers {NLP} R Documentation

Text Document Viewers

Description

Provide suitable “views” of the text contained in text documents.

Usage

words(x, ...)
sents(x, ...)
paras(x, ...)
tagged_words(x, ...)
tagged_sents(x, ...)
tagged_paras(x, ...)
chunked_sents(x, ...)
parsed_sents(x, ...)
parsed_paras(x, ...)

Arguments

x

a text document object.

...

further arguments to be passed to or from methods.

Details

Methods for extracting POS tagged word tokens (i.e., for the generics tagged_words(), tagged_sents() and tagged_paras()) can optionally provide a mechanism for mapping the POS tags via a map argument. This can be a function, a named character vector (with the names giving the tags to map from and the elements the tags to map to), or a named list of such named character vectors, with names corresponding to POS tagsets (see Universal_POS_tags_map for an example). If a list is given, the map used will be the element whose name matches the POS tagset used (this information is typically determined from the text document metadata; see the help pages for text document extension classes implementing this mechanism for details).
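The three accepted forms of map can be illustrated with a small base R sketch. This is not the code used by package NLP: apply_pos_map() is a hypothetical helper, the tag names and the tagset name "en-ptb" are example values, and leaving unmapped tags unchanged is an assumption of the sketch.

```r
# Hypothetical helper (not part of package NLP): resolve a 'map'
# specification and apply it to a character vector of POS tags.
apply_pos_map <- function(tags, map, tagset = NULL) {
    # A named list of maps: pick the element named after the tagset.
    if (is.list(map)) {
        stopifnot(is.character(tagset), length(tagset) == 1L)
        map <- map[[tagset]]
    }
    # A function: simply call it on the tags.
    if (is.function(map))
        return(map(tags))
    # A named character vector: names are the tags to map from,
    # elements are the tags to map to.
    mapped <- unname(map[tags])
    # Assumption of this sketch: tags without a mapping stay unchanged.
    ifelse(is.na(mapped), tags, mapped)
}

tags <- c("DT", "NN", "VBZ", "JJ")
to_universal <- c(DT = "DET", NN = "NOUN", VBZ = "VERB", JJ = "ADJ")

# Named character vector.
res_vector <- apply_pos_map(tags, to_universal)
# Function (here: keep only the first letter of each tag).
res_function <- apply_pos_map(tags, function(t) substr(t, 1L, 1L))
# Named list of maps, selected by tagset name.
res_list <- apply_pos_map(tags, list("en-ptb" = to_universal),
                          tagset = "en-ptb")
```

In actual use the tagset would normally come from the document metadata rather than from an explicit argument, as described above.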

In addition to methods for the text document classes provided by package NLP itself (see TextDocument), package NLP also provides methods for extracting word tokens and POS tagged word tokens from the results of udpipe_annotate() from package udpipe, spacy_parse() from package spacyr, and cnlp_annotate() from package cleanNLP.

Value

For words(), a character vector with the word tokens in the document.

For sents(), a list of character vectors with the word tokens in the sentences.

For paras(), a list of lists of character vectors with the word tokens in the sentences, grouped according to the paragraphs.

For tagged_words(), a character vector with the POS tagged word tokens in the document (i.e., the word tokens and their POS tags, separated by ‘/’).

For tagged_sents(), a list of character vectors with the POS tagged word tokens in the sentences.

For tagged_paras(), a list of lists of character vectors with the POS tagged word tokens in the sentences, grouped according to the paragraphs.

For chunked_sents(), a list of (flat) Tree objects giving the chunk trees for the sentences in the document.

For parsed_sents(), a list of Tree objects giving the parse trees for the sentences in the document.

For parsed_paras(), a list of lists of Tree objects giving the parse trees for the sentences in the document, grouped according to the paragraphs in the document.
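The nesting of the token-based return values can be pictured with plain R data structures. The following sketch uses only base R and made-up toy data; it does not call the viewers themselves, and the object names and POS tags are illustrative assumptions.

```r
# Toy data shaped like a paras() result: a list of paragraphs, each a
# list of sentences, each a character vector of word tokens.
paras_view <- list(
    list(c("A", "short", "sentence", "."),
         c("Another", "one", ".")),
    list(c("A", "second", "paragraph", "."))
)

# Shaped like a sents() result: drop the paragraph level.
sents_view <- unlist(paras_view, recursive = FALSE)

# Shaped like a words() result: a single character vector.
words_view <- unlist(sents_view)

# Shaped like a tagged_words() result: word tokens and POS tags
# joined by "/" (the tags here are invented for illustration).
toy_tags <- c("DT", "JJ", "NN", ".", "DT", "NN", ".",
              "DT", "JJ", "NN", ".")
tagged_words_view <- paste(words_view, toy_tags, sep = "/")
```

Here sents_view is a list of three character vectors and words_view has eleven elements, so each view is a flattening of the one above it.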

See Also

TextDocument for basic information on the text document infrastructure employed by package NLP.


[Package NLP version 0.2-1 Index]