get_kwic {corpustools}    R Documentation

Get keyword-in-context (KWIC) strings

Description

Create a data.frame with keyword-in-context strings for given indices (i), search results (hits) or search strings (query).

Usage

get_kwic(
  tc,
  hits = NULL,
  i = NULL,
  query = NULL,
  code = "",
  ntokens = 10,
  n = NA,
  nsample = NA,
  output_feature = "token",
  query_feature = "token",
  context_level = c("document", "sentence"),
  kw_tag = c("<", ">"),
  ...
)

Arguments

tc

a tCorpus

hits

the results of a feature search. See search_features.

i

instead of the hits argument, you can give the indices of features directly.

query

instead of using the hits or i arguments, a search string can be given directly. Note that this is simply a convenient shorthand for first creating a hits object with search_features. If a query is given, then the ... argument is used to pass other arguments to search_features.

code

if 'i' or 'query' is used, the code argument can be used to add a code label. Should be a vector of the same length that gives the code for each i or query, or a vector of length 1 for a single label.

ntokens

an integer specifying the size of the context, i.e. the number of tokens left and right of the keyword.

n

a number, specifying the total number of hits to include.

nsample

like n, but with a random sample of hits. If multiple codes are used, the sample is drawn for each code individually.

output_feature

the feature column that is used to make the KWIC.

query_feature

if query is used, the feature column that is used to perform the query.

context_level

Select the maximum context (document or sentence).

kw_tag

a character vector of length 2, that gives the symbols before (first value) and after (second value) the keyword in the KWIC string. Can for instance be used to prepare KWIC with format tags for highlighting.

...

See search_features for the query parameters

Details

This is mainly for viewing results in the R console. If you want to create a subset corpus based on the context of query results, you can use subset_query with the window argument. Also, the browse_hits function is a good alternative for viewing query hits in full text.

Examples

tc = tokens_to_tcorpus(corenlp_tokens, sentence_col = 'sentence', token_id_col = 'id')

## look directly for a term (or complex query)
get_kwic(tc, query = 'love*')
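
## a sketch of the display options, using only arguments listed under Usage:
## a narrower context window (ntokens) and custom keyword tags (kw_tag).
## The '<b>' and '</b>' tags are an arbitrary choice for HTML-style highlighting.
kw = get_kwic(tc, query = 'love*', ntokens = 5, kw_tag = c('<b>', '</b>'))
head(kw)

## attach a code label to the query; 'love' is a hypothetical label for this example
kw_coded = get_kwic(tc, query = 'love*', code = 'love')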

## or, first perform a feature search, and then get the KWIC for the results
hits = search_features(tc, '(john OR mark) AND mary AND love*', context_level = 'sentence')
get_kwic(tc, hits=hits, context_level = 'sentence')
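
## as noted under Details, get_kwic is mainly meant for viewing results in the console.
## A sketch of the two alternatives mentioned there. The window size of 5 is an
## arbitrary choice, and the argument names are assumed to match the installed
## package version (see the help pages of subset_query and browse_hits).
tc_love = subset_query(tc, query = 'love*', window = 5)

## view the hits in full text; only run in an interactive session
if (interactive()) browse_hits(tc, hits)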

[Package corpustools version 0.4.10 Index]