index {quanteda} | R Documentation |
Locate a pattern in a tokens object
Description
Locates a pattern within a tokens object, returning the index positions of the beginning and ending tokens in the pattern.
Usage
index(
x,
pattern,
valuetype = c("glob", "regex", "fixed"),
case_insensitive = TRUE
)
is.index(x)
Arguments
x |
an input tokens object |
pattern |
a character vector, list of character vectors, dictionary, or collocations object. See pattern for details. |
valuetype |
the type of pattern matching: |
case_insensitive |
logical; if |
Value
a data.frame consisting of one row per pattern match, with columns
for the document name, index positions from
and to
, and the pattern
matched.
is.index
returns TRUE
if the object was created by
index()
; FALSE
otherwise.
Examples
toks <- tokens(data_corpus_inaugural[1:8])
index(toks, pattern = "secure*")
index(toks, pattern = c("secure*", phrase("united states"))) |> head()
[Package quanteda version 4.0.2 Index]