top_features {corpustools} R Documentation

## Show top features

### Description

Show top features

### Usage

top_features(
tc,
feature,
n = 10,
group_by = NULL,
group_by_meta = NULL,
rank_by = c("freq", "chi2"),
dropNA = T,
return_long = F
)


### Arguments

 tc a tCorpus feature The name of the feature n Return the top n features group_by A column in the token data to group the top features by. For example, if token data contains part-of-speech tags (pos), then grouping by pos will show the top n feature per part-of-speech tag. group_by_meta A column in the meta data to group the top features by. rank_by The method for ranking the terms. Currently supports frequency (default) and the 'Chi2' value for the relative frequency of a term in a topic compared to the overall corpus. If return_long is used, the Chi2 score is also returned, but note that there are negative Chi2 scores. This is used to indicate that the relative frequency of a feature in a group was lower than the relative frequency in the corpus (i.e. under-represented). dropNA if TRUE, drop NA features return_long if TRUE, results will be returned in a long format that contains more information.

a data.frame

### Examples

tc = tokens_to_tcorpus(corenlp_tokens, token_id_col = 'id')

top_features(tc, 'lemma')
top_features(tc, 'lemma', group_by = 'NER', group_by_meta='doc_id')


[Package corpustools version 0.4.10 Index]