tokens_context {conText} | R Documentation |
Get the tokens of contexts surrounding user-defined patterns
Description
This function uses quanteda's kwic()
function to find the contexts
around user-defined patterns (i.e. target words/phrases) and returns a tokens object
with the tokenized contexts and corresponding document variables.
Usage
tokens_context(
x,
pattern,
window = 6L,
valuetype = c("glob", "regex", "fixed"),
case_insensitive = TRUE,
hard_cut = FALSE,
rm_keyword = TRUE,
verbose = TRUE
)
Arguments
x |
a (quanteda) tokens-class object |
pattern |
a character vector, list of character vectors, dictionary, or collocations object. See quanteda's pattern documentation for details. |
window |
the number of context words to be displayed around the keyword |
valuetype |
the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching |
case_insensitive |
logical; if TRUE, ignore case when matching |
hard_cut |
(logical) if TRUE, a context must have exactly window * 2 tokens; if FALSE, it can have window * 2 or fewer (e.g. if the target word occurs near the beginning or end of a document) |
rm_keyword |
(logical) if FALSE, the keyword matching the pattern is included in the tokenized contexts |
verbose |
(logical) if TRUE, report the total number of instances per pattern found |
Value
a (quanteda) tokens-class
. Each document in the output tokens object
inherits the document variables (docvars
) of the document from which it came,
along with a column registering the corresponding pattern used.
This information can be retrieved using docvars()
.
Examples
library(quanteda)
# tokenize corpus
toks <- tokens(cr_sample_corpus)
# build a tokenized corpus of contexts surrounding a target term
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)
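A short sketch of how the output described under Value might be inspected, assuming the conText package and its cr_sample_corpus data are available; the name of the docvars column recording the matched pattern is an assumption here:

```r
library(conText)
library(quanteda)

# tokenize the sample corpus shipped with conText
toks <- tokens(cr_sample_corpus)

# contexts of 6 tokens on either side of words matching "immigr*"
immig_toks <- tokens_context(x = toks, pattern = "immigr*", window = 6L)

# each context inherits the docvars of its source document,
# plus a column (assumed here to be named "pattern") recording the match
head(docvars(immig_toks))

# keep the keyword itself inside each tokenized context
immig_toks_kw <- tokens_context(
  x = toks, pattern = "immigr*",
  window = 6L, rm_keyword = FALSE
)
```

Setting rm_keyword = FALSE is useful when downstream steps (e.g. building a document-feature matrix of contexts) should retain the target term itself.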