get_context {conText} | R Documentation |
Get context words (words within a symmetric window around the target word/phrase) surrounding a user-defined target.
Description
A wrapper function for quanteda's kwic() function. To speed up processing, it subsets the documents to those in which target is present before tokenizing, and it concatenates kwic's pre/post variables into a single context column.
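The steps described above can be sketched directly with quanteda. This is a minimal illustration under stated assumptions, not the package's actual implementation: the toy documents, the grepl() pre-filter, and the object names (docs, keep, kw_df, out) are all hypothetical.

```r
library(quanteda)

# Toy documents (hypothetical data, for illustration only)
docs <- c(d1 = "Immigration reform was debated at length in the Senate today.",
          d2 = "The budget bill passed without amendment.",
          d3 = "Senators disagreed about immigration and border policy.")
target <- "immigration"

# Step 1: keep only documents that mention the target, so that
# tokenization runs on a smaller set (assumed pre-filter via grepl)
keep <- grepl(target, docs, ignore.case = TRUE)

# Step 2: tokenize the remaining documents and run kwic()
toks <- tokens(docs[keep], what = "word")
kw_df <- as.data.frame(
  kwic(toks, pattern = target, window = 6,
       valuetype = "fixed", case_insensitive = TRUE)
)

# Step 3: concatenate kwic's pre/post variables into one context column
out <- data.frame(docname = kw_df$docname,
                  target  = kw_df$keyword,
                  context = trimws(paste(kw_df$pre, kw_df$post)),
                  stringsAsFactors = FALSE)
```

In this sketch, d2 is dropped before tokenizing because it never mentions the target, which is the source of the speed-up on large corpora.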
Usage
get_context(
x,
target,
window = 6L,
valuetype = "fixed",
case_insensitive = TRUE,
hard_cut = FALSE,
what = "word",
verbose = TRUE
)
Arguments
x |
(character) vector - this is the set of documents (corpus) of interest. |
target |
(character) vector - these are the target words whose contexts we want to evaluate. This vector may include a single token, a phrase, or multiple tokens and/or phrases. |
window |
(numeric) - defines the size of a context (words around the target). |
valuetype |
the type of pattern matching: |
case_insensitive |
logical; if |
hard_cut |
(logical) - if TRUE then a context must have |
what |
(character) defines which quanteda tokenizer to use. You will rarely want to change this.
For Chinese text you may want to set |
verbose |
(logical) - if TRUE, report the total number of target instances found. |
Value
A data.frame with the following columns:
docname
(character) name of the document to which each instance belongs.
target
(character) targets.
context
(character) the pre/post variables of the kwic() output, concatenated.
Note
target in the returned data.frame is equivalent to kwic()'s keyword output variable, so it may not match the user-defined target exactly if valuetype is not fixed.
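The point of this note can be seen by calling kwic() directly with a non-fixed valuetype. The sketch below uses a hypothetical sentence and a hypothetical glob pattern; it shows that the keyword column holds the tokens actually matched, not the pattern supplied.

```r
library(quanteda)

# Hypothetical one-sentence document and glob pattern
toks <- tokens("Immigrants and immigration officials met.")
kw_df <- as.data.frame(
  kwic(toks, pattern = "immigra*", window = 2, valuetype = "glob")
)

# The keyword column contains the matched tokens, which differ
# from the user-supplied pattern "immigra*"
matched <- kw_df$keyword
print(matched)
```

Since get_context() reports this keyword as target, grouping results by target under a glob or regex valuetype groups them by matched token rather than by the pattern passed in.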
Examples
# load conText, which provides get_context()
library(conText)

# get context words surrounding the term immigration
context_immigration <- get_context(x = cr_sample_corpus, target = 'immigration',
                                   window = 6, valuetype = "fixed", case_insensitive = FALSE,
                                   hard_cut = FALSE, verbose = FALSE)