concat {quanteda} | R Documentation |
Return the concatenator character from an object
Description
Get the concatenator character from a tokens object.
Usage
concat(x)
concatenator(x)
Arguments
x    a tokens object
Details
The concatenator character is a special delimiter used to link
separate tokens in multi-token phrases. It is embedded in the metadata of
tokens objects and used in downstream operations, such as tokens_compound()
or tokens_lookup(). It can be extracted using concat() and set using
tokens(x, concatenator = ...) when x is a tokens object.
The default, _, is recommended since it will not be removed during normal
cleaning and tokenization, while nearly all other punctuation characters (at
least those in the Unicode punctuation class [P]) will be removed.
Value
a character vector of length 1
Examples
toks <- tokens(data_corpus_inaugural[1:5])
concat(toks)
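
A further sketch of the workflow described under Details: extract the default concatenator, set a different one on an existing tokens object, and reuse it when compounding a phrase. The sample sentence, the "+" character, and the object names toks_demo, toks_plus, and comp are illustrative assumptions and not part of the package examples. The stored concatenator is passed to tokens_compound() explicitly, so the sketch does not rely on any particular default for that argument.

```r
library(quanteda)

# Illustrative text (an assumption, not package data)
toks_demo <- tokens("The United States of America")

# Extract the concatenator stored in the tokens object
concat(toks_demo)

# Set a different concatenator on an existing tokens object
toks_plus <- tokens(toks_demo, concatenator = "+")
concat(toks_plus)

# Join the phrase into a single token using the stored concatenator
comp <- tokens_compound(toks_plus, phrase("United States"),
                        concatenator = concat(toks_plus))
as.character(comp)
```

Reading the concatenator back with concat() instead of hard-coding it keeps downstream calls consistent with whatever was set on the object.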
[Package quanteda version 4.0.2 Index]