| String {NLP} | R Documentation |
String objects
Description
Creation and manipulation of string objects.
Usage
String(x)
as.String(x)
is.String(x)
Arguments
x | a character vector with the appropriate encoding information for String(); an arbitrary R object otherwise. |
Details
String objects provide character strings encoded in UTF-8 with class
"String", which currently has a useful [ subscript method.
With indices i and j of length one, this gives a
string object with the substring starting at the position given by
i and ending at the position given by j. Subscripting
with a single index which is an object inheriting from class
"Span", or a list of such objects, returns a character
vector of substrings with the respective spans, or a list thereof.
Additional methods may be added in the future.
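The subscripting forms described above can be sketched as follows. This is a minimal illustration, assuming the NLP package is installed; the sample text and the variable names are hypothetical and not part of the package.

```r
library(NLP)

## Hypothetical sample text; character positions are 1-based.
txt <- String("First sentence. Second sentence.")

## Two length-one indices: the substring from position 1 to position 5.
first_word <- txt[1, 5]

## A single Span index: the substring covered by that span.
second_word <- txt[Span(7, 15)]

## A list of Span objects: a list of substrings, one per span.
both_words <- txt[list(Span(1, 5), Span(17, 22))]
```

Wrapping the results in as.character() strips any class attribute, which is convenient when comparing them against plain character strings.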
String() creates a string object from a given character vector,
taking the first element of the vector and converting it to UTF-8
encoding.
as.String() is a generic function to coerce to a string object.
The default method calls String() on the result of converting
to character and concatenating into a single string with the elements
separated by newlines.
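The difference between String() and the default as.String() method can be illustrated with a short sketch, assuming the NLP package is installed; the input vector and variable names are hypothetical.

```r
library(NLP)

## Hypothetical input: a character vector with two elements.
input <- c("First line", "Second line")

## String() keeps only the first element of the vector.
s1 <- String(input)

## as.String() joins all elements with newlines before calling String().
s2 <- as.String(input)
```

Here s1 holds only "First line", whereas s2 holds both elements joined by a newline.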
is.String() tests whether an object inherits from class
"String".
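As a small sketch of the class test, assuming the NLP package is installed (the sample values and variable names are hypothetical):

```r
library(NLP)

## A string object and a plain character vector with the same content.
wrapped <- String("some text")
plain <- "some text"

## is.String() checks class inheritance, not the content.
res_wrapped <- is.String(wrapped)
res_plain <- is.String(plain)
```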
Value
For String() and as.String(), a string object (of class
"String").
For is.String(), a logical.
Examples
## A simple text.
s <- String("  First sentence.  Second sentence.  ")
##           ****5****0****5****0****5****0****5**
## Basic sentence and word token annotation for the text.
a <- c(Annotation(1 : 2,
                  rep.int("sentence", 2L),
                  c( 3L, 20L),
                  c(17L, 35L)),
       Annotation(3 : 6,
                  rep.int("word", 4L),
                  c( 3L,  9L, 20L, 27L),
                  c( 7L, 16L, 25L, 34L)))
## All word tokens (by subscripting with an annotation object):
s[a[a$type == "word"]]
## Word tokens according to sentence (by subscripting with a list of
## annotation objects):
s[annotations_in_spans(a[a$type == "word"], a[a$type == "sentence"])]