kwic {quanteda} | R Documentation |
Locate keywords-in-context
Description
For a text or a collection of texts (in a quanteda corpus object), return a list of the occurrences of a keyword supplied by the user, each shown in its immediate context, identifying the source text and the word index number within the source text (not the line number, since the text may or may not be segmented using end-of-line delimiters).
Usage
kwic(
  x,
  pattern,
  window = 5,
  valuetype = c("glob", "regex", "fixed"),
  separator = " ",
  case_insensitive = TRUE,
  index = NULL,
  ...
)
is.kwic(x)
## S3 method for class 'kwic'
as.data.frame(x, ...)
Arguments
x |
the text to search: a character vector, corpus, or tokens object |
pattern |
a character vector, list of character vectors, dictionary, or collocations object. See pattern for details. |
window |
the number of context words to be displayed around the keyword |
valuetype |
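the type of pattern matching: "glob" for glob-style wildcard expressions, "regex" for regular expressions, or "fixed" for exact matching |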
separator |
a character to separate words in the output |
case_insensitive |
logical; if TRUE, ignore case when matching a pattern |
index |
an index object to specify keywords |
... |
unused |
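As a rough illustration of how case_insensitive interacts with a fixed pattern, the sketch below runs the same search twice on a made-up sentence. The toy text and the object names (kw_any, kw_exact, n_any, n_exact) are assumptions introduced for the example only.

```r
library(quanteda)

# Toy text (illustrative only) containing the keyword in two letter cases
toks <- tokens("Secure the border, secure the peace")

# Default: case_insensitive = TRUE, so "Secure" and "secure" both match
kw_any <- kwic(toks, pattern = "secure", valuetype = "fixed", window = 1)

# Case-sensitive: only the lower-case token matches
kw_exact <- kwic(toks, pattern = "secure", valuetype = "fixed",
                 window = 1, case_insensitive = FALSE)

# Count the matches found by each search
n_any <- nrow(as.data.frame(kw_any))
n_exact <- nrow(as.data.frame(kw_exact))
```

Here window = 1 keeps a single context token on each side of the keyword; a larger value widens the displayed context without changing which tokens match.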
Value
A kwic classed data.frame, with the document name (docname) and the token index positions (from and to, which will be the same for single-word patterns, or a sequence equal in length to the number of elements for multi-word phrases).
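To make the from and to positions concrete, here is a minimal sketch on two made-up documents. The document names d1 and d2 and their contents are assumptions for illustration; the columns are read through as.data.frame() so that the result is an ordinary data.frame.

```r
library(quanteda)

# Two toy documents (illustrative only); names become the docname values
toks <- tokens(c(d1 = "a b c d e", d2 = "x c d y"))

# A two-token phrase: "from" marks the first matched token, "to" the last
kw <- kwic(toks, pattern = phrase("c d"), window = 1)

# Convert to a plain data.frame and keep the positional columns
df <- as.data.frame(kw)
positions <- df[, c("docname", "from", "to")]
print(positions)
```

For a single-token pattern the from and to values coincide, so the difference to - from is one less than the number of tokens in the matched phrase.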
Note
pattern will be a keyword pattern or phrase, possibly multiple patterns, that may include punctuation. If a pattern contains whitespace, it is best to wrap it in phrase() to make this explicit. However, if pattern is a collocations (see quanteda.textstats) or dictionary object, then the collocations or multi-word dictionary keys will automatically be considered phrases where each whitespace-separated element matches a token in sequence.
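The two routes to multi-word matching described in this note can be sketched side by side. The toy documents, the dictionary key conflict, and the object names below are assumptions made for the example; only phrase(), dictionary(), and kwic() come from quanteda.

```r
library(quanteda)

# Toy documents (illustrative only), each containing "war against" once
toks <- tokens(c(d1 = "war against want", d2 = "a war against fear"))

# Explicit phrase: each whitespace-separated element matches one token
kw_phrase <- kwic(toks, pattern = phrase("war against"), window = 1)

# Dictionary with a multi-word value: treated as a phrase without phrase()
dict <- dictionary(list(conflict = "war against"))
kw_dict <- kwic(toks, pattern = dict, window = 1)

# Both searches should locate the same two occurrences
n_phrase <- nrow(as.data.frame(kw_phrase))
n_dict <- nrow(as.data.frame(kw_dict))
```

Wrapping whitespace-containing patterns in phrase() keeps the intent visible in the call itself, which is why the note recommends it even where a dictionary would work implicitly.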
Examples
# single token matching
toks <- tokens(data_corpus_inaugural[1:8])
kwic(toks, pattern = "secure*", valuetype = "glob", window = 3)
kwic(toks, pattern = "secur", valuetype = "regex", window = 3)
kwic(toks, pattern = "security", valuetype = "fixed", window = 3)
# phrase matching
kwic(toks, pattern = phrase("secur* against"), window = 2)
kwic(toks, pattern = phrase("war against"), valuetype = "regex", window = 2)
# use index
idx <- index(toks, phrase("secur* against"))
kwic(toks, index = idx, window = 2)
# test whether an object is a kwic
kw <- kwic(tokens(data_corpus_inaugural[1:20]), "provident*")
is.kwic(kw)
is.kwic("Not a kwic")
is.kwic(kw[, c("pre", "post")])
# coerce a kwic to a plain data.frame
toks <- tokens(data_corpus_inaugural[1:8])
kw <- kwic(toks, pattern = "secure*", valuetype = "glob", window = 3)
as.data.frame(kw)