R: Get common nearby features given a query or query hits

feature_associations {corpustools}

R Documentation

Get common nearby features given a query or query hits

Description

Finds the features that occur most often within a word window around the matches of a query, or around hits obtained earlier with search_features.

Usage

feature_associations(
  tc,
  feature,
  query = NULL,
  hits = NULL,
  query_feature = "token",
  window = 15,
  n = 25,
  min_freq = 1,
  sort_by = c("chi2", "ratio", "freq"),
  subset = NULL,
  subset_meta = NULL,
  include_self = F
)

Arguments

`tc`	a tCorpus
`feature`	The name of the feature column in $tokens
`query`	A query, given as a character string. See search_features for documentation of the query language.
`hits`	Alternatively, instead of giving a query, the results of search_features can be used.
`query_feature`	If query is used, the column in $tokens on which the query is performed. Defaults to 'token'.
`window`	The size of the word window (i.e. the number of words next to the feature).
`n`	The number of top associated features to return.
`min_freq`	Optionally, ignore features that occur fewer than min_freq times.
`sort_by`	The value by which to sort the features: "chi2", "ratio" or "freq".
`subset`	A call (or character string of a call) as one would normally pass to subset.tCorpus. If given, the keyword has to occur within the subset. This is useful, for instance, to look only at tokens with named-entity POS tags when searching for people or organizations. Note that the condition does not have to occur within the subset.
`subset_meta`	A call (or character string of a call) as one would normally pass to the subset_meta parameter of subset.tCorpus. If given, the keyword has to occur within the subset documents. This is useful, for instance, to make queries date dependent. For example, in a longitudinal analysis of politicians, it is often necessary to take changing functions and/or party affiliations into account. This can be accomplished by using subset_meta = "date > xxx & date < xxx" (given that the appropriate date column exists in the meta data).
`include_self`	If TRUE, include the feature itself in the output.
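To make the roles of window and include_self concrete, here is a toy base-R sketch of counting tokens near a keyword. It is only an illustration under stated assumptions, not the corpustools implementation: the function name window_counts and the example tokens are hypothetical, and the sketch assumes the window extends the same number of positions to both sides of each hit.

```r
# Toy illustration only; NOT how corpustools computes associations.
# `window_counts` and the example tokens are hypothetical.
window_counts <- function(tokens, keyword, window = 2, include_self = FALSE) {
  hit_pos <- which(tokens == keyword)
  # all positions within `window` tokens of any hit, clipped to the vector bounds
  near <- unique(unlist(lapply(hit_pos, function(p) {
    seq(max(1, p - window), min(length(tokens), p + window))
  })))
  # by default, leave out the keyword positions themselves
  if (!include_self) near <- setdiff(near, hit_pos)
  # count the nearby tokens, most frequent first
  sort(table(tokens[near]), decreasing = TRUE)
}

tokens <- c("the", "war", "on", "terror", "and", "the", "war", "at", "home")
counts <- window_counts(tokens, "war", window = 1)
print(counts)
```

With window = 1 only the direct neighbours of "war" are counted; a larger window pulls in "terror", "and" and "home" as well. The real function additionally scores features (see sort_by) rather than returning raw counts only.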

Value

A data.frame with the top n associated features, sorted by sort_by.

Examples


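library(corpustools)

## create a tCorpus from the sotu_texts demo data and preprocess it,
## which by default adds the 'feature' column used below
tc = create_tcorpus(sotu_texts, doc_column = 'id')
tc$preprocess()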

## directly from query
topf = feature_associations(tc, 'feature', 'war')
head(topf, 20) ## frequent words close to "war"

## adjust window size
topf = feature_associations(tc, 'feature', 'war', window = 5)
head(topf, 20) ## frequent words very close (five tokens) to "war"

## you can also run search_features first, to get hits for (complex) queries
hits = search_features(tc, '"war terror"~10')
topf = feature_associations(tc, 'feature', hits = hits)
head(topf, 20) ## features close to places where "war" and "terror" occur within 10 words of each other
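
The remaining arguments from the Usage section can be combined in the same way. The following is a minimal sketch that uses only parameters documented above; the specific values (window = 10, n = 10, min_freq = 5) are arbitrary choices for illustration, and it assumes the corpustools package and its sotu_texts demo data are available.

```r
library(corpustools)

tc = create_tcorpus(sotu_texts, doc_column = 'id')
tc$preprocess()

## sort by raw frequency instead of chi2, ignore features that occur
## fewer than 5 times, and return at most the top 10 features
topf = feature_associations(tc, 'feature', 'war',
                            window = 10, n = 10, min_freq = 5, sort_by = 'freq')
topf
```

Sorting by 'freq' tends to favour common words, so raising min_freq or switching back to 'chi2' or 'ratio' may give more distinctive features.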

[Package corpustools version 0.5.1 Index]