top_features {corpustools}    R Documentation

Show top features

Description

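Show the top n features of a tCorpus, ranked by frequency or by Chi2 score, optionally grouped by a column in the token data or the meta data.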

Usage

top_features(
  tc,
  feature,
  n = 10,
  group_by = NULL,
  group_by_meta = NULL,
  rank_by = c("freq", "chi2"),
  dropNA = TRUE,
  return_long = FALSE
)

Arguments

tc

A tCorpus object.

feature

The name of the feature column in the token data (e.g., 'lemma').

n

The number of top features to return (per group, if a grouping is used).

group_by

A column in the token data to group the top features by. For example, if the token data contains part-of-speech tags (pos), then grouping by pos will show the top n features per part-of-speech tag.

group_by_meta

A column in the meta data to group the top features by.

rank_by

The method for ranking the terms. Currently supports frequency ("freq", the default) and the Chi2 value ("chi2") for the relative frequency of a term in a group compared to the overall corpus. If return_long is used, the Chi2 score is also returned. Note that Chi2 scores can be negative: a negative score indicates that the relative frequency of a feature in a group was lower than its relative frequency in the corpus (i.e., the feature is under-represented in that group).

dropNA

If TRUE, drop NA features.

return_long

If TRUE, results are returned in a long format that contains more information.
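
The signed Chi2 score described under rank_by can be illustrated with a small base R sketch. This is not the package's implementation: signed_chi2 is a hypothetical helper, and it assumes a plain Pearson Chi2 on a 2x2 table without continuity correction or smoothing, which may differ from what corpustools computes internally. It only shows how a sign can mark under-representation.

```r
# Hypothetical helper (not part of corpustools): signed Chi2 for one feature in one group.
# a: frequency of the feature inside the group
# b: frequency of the feature outside the group
# c: frequency of all other features inside the group
# d: frequency of all other features outside the group
signed_chi2 <- function(a, b, c, d) {
  n <- a + b + c + d
  # Pearson Chi2 for a 2x2 table, without continuity correction (an assumption)
  chi2 <- n * (a * d - b * c)^2 / ((a + b) * (c + d) * (a + c) * (b + d))
  rel_group  <- a / (a + c)   # relative frequency of the feature in the group
  rel_corpus <- (a + b) / n   # relative frequency of the feature in the whole corpus
  # Negative sign marks a feature that is under-represented in the group
  ifelse(rel_group < rel_corpus, -chi2, chi2)
}

# Made-up counts: the first feature is over-represented, the second under-represented
over  <- signed_chi2(a = 30, b = 10, c = 70, d = 190)
under <- signed_chi2(a = 2,  b = 38, c = 98, d = 162)
```

With these made-up counts, over is positive and under is negative, so sorting by the score in descending order puts the features most characteristic of a group first.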

Value

a data.frame

Examples

tc = tokens_to_tcorpus(corenlp_tokens, token_id_col = 'id')

top_features(tc, 'lemma')
top_features(tc, 'lemma', group_by = 'NER', group_by_meta = 'doc_id')
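
The rank_by and return_long arguments are not exercised above. A possible call, using only the arguments listed under Usage and assuming the corenlp_tokens data from the example above, might look like this; the columns of the long result are not described here, so the sketch only inspects it.

```r
library(corpustools)

# Same tCorpus as in the example above
tc <- tokens_to_tcorpus(corenlp_tokens, token_id_col = 'id')

# Rank lemmas per NER tag by Chi2 instead of frequency, and ask for the long format
long <- top_features(tc, 'lemma', n = 5, group_by = 'NER',
                     rank_by = 'chi2', return_long = TRUE)

# Inspect the result; per the rank_by description, negative Chi2 scores
# would indicate features that are under-represented in a group
head(long)
```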

[Package corpustools version 0.5.1 Index]