add_multitoken_label {corpustools}    R Documentation

Choose and add multitoken strings based on multitoken categories

Description

Given a multitoken category (e.g., named entity ids), this function finds the most frequently occurring string in each category and adds it as a label for that category.

Usage

add_multitoken_label(
  tc,
  colloc_id,
  feature = "token",
  new_feature = sprintf("%s_l", colloc_id),
  pref_subset = NULL
)

Arguments

tc

a tcorpus object

colloc_id

the name of the data column containing the unique ids of the multitoken categories

feature

the name of the feature column containing the token strings (default "token")

new_feature

the name of the new feature column that holds the labels. By default, this is colloc_id with the suffix "_l"

pref_subset

Optionally, a subset call that specifies a subset of tokens that has priority when finding the most frequently occurring string
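
The labelling idea described above can be illustrated with a minimal base R sketch. It does not call corpustools and is not the package's implementation. The toy data frame, the column name entity_id, and the helper most_frequent are hypothetical. The sketch also assumes that consecutive tokens sharing an id form one multitoken string, and that the chosen label is copied to every token in the category.

```r
# Toy token table (hypothetical data). NA marks tokens outside any multitoken unit.
tokens <- data.frame(
  token     = c("Barack", "Obama", "said", "Obama", "met",
                "Mr.", "Obama", "and", "Barack", "Obama"),
  entity_id = c(1, 1, NA, 1, NA, 1, 1, NA, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: a run of consecutive tokens with the same id is one mention.
in_unit <- !is.na(tokens$entity_id)
id0     <- ifelse(in_unit, tokens$entity_id, 0)          # 0 = sentinel for "no id"
run_id  <- cumsum(c(TRUE, id0[-1] != id0[-length(id0)])) # new run whenever the id changes

# Paste the tokens of each mention into one string and record its category id.
mention_text <- vapply(split(tokens$token[in_unit], run_id[in_unit]),
                       paste, character(1), collapse = " ")
mention_cat  <- vapply(split(tokens$entity_id[in_unit], run_id[in_unit]),
                       function(x) x[1], numeric(1))

# Hypothetical helper: the most frequent string (ties go to the first in sorted order).
most_frequent <- function(x) names(which.max(table(x)))

# One label per category, named by the category id.
labels <- vapply(split(mention_text, mention_cat), most_frequent, character(1))

# Mirror the default new_feature name, sprintf("%s_l", colloc_id).
new_feature <- sprintf("%s_l", "entity_id")
tokens[[new_feature]] <- unname(labels[as.character(tokens$entity_id)])
```

Here the mentions are "Barack Obama" (twice), "Obama" and "Mr. Obama", so every token with entity_id 1 receives the label "Barack Obama" in the new column entity_id_l, and tokens without an id receive NA.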


[Package corpustools version 0.4.10 Index]