cl_lexicon_size {RcppCWB}

R Documentation

Get Lexicon Size.

Description

Get the number of unique tokens in the lexicon of a positional attribute, which is also the number of token ids. Token ids are zero-based: when iterating through the lexicon, start at 0; the highest valid id is cl_lexicon_size() minus 1.

Usage

cl_lexicon_size(corpus, p_attribute, registry = Sys.getenv("CORPUS_REGISTRY"))

Arguments

`corpus`	name of a CWB corpus (upper case)
`p_attribute`	name of positional attribute
`registry`	path to the registry directory, defaults to the value of the environment variable CORPUS_REGISTRY

Examples

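# Number of unique tokens of the p-attribute "word"
lexicon_size <- cl_lexicon_size(
  "REUTERS",
  p_attribute = "word",
  registry = get_tmp_registry()
)

# Token ids are zero-based: valid ids run from 0 to lexicon_size - 1
token_ids <- seq.int(from = 0, to = lexicon_size - 1)

# Decode every id to obtain the complete lexicon
cl_id2str(
  "REUTERS",
  p_attribute = "word",
  id = token_ids,
  registry = get_tmp_registry()
)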
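
The zero-based numbering of token ids is easy to get wrong in R, where vectors are one-based. The following is a minimal sketch in base R that needs no corpus. The vector `lexicon` and the helper `id2str()` are hypothetical stand-ins for illustration only and are not part of RcppCWB; with a real corpus the size would come from cl_lexicon_size() and the strings from cl_id2str().

```r
# Hypothetical stand-in for a lexicon; not taken from any real corpus.
lexicon <- c("oil", "price", "barrel")
lexicon_size <- length(lexicon)

# seq_len() yields integer(0) for an empty lexicon, whereas
# seq.int(from = 0, to = lexicon_size - 1) would count down from 0 to -1.
token_ids <- seq_len(lexicon_size) - 1L

# CWB ids are zero-based, R vectors are one-based: shift by one to index.
# Hypothetical helper, only mimicking an id-to-string lookup.
id2str <- function(id) lexicon[id + 1L]

first_token <- id2str(0L)                # lowest valid id
last_token <- id2str(lexicon_size - 1L)  # highest valid id
empty_ids <- seq_len(0L) - 1L            # id vector for an empty lexicon
```

Building the id vector with seq_len() is a defensive choice: it gives the same ids as the seq.int() call for any non-empty lexicon, and an empty vector when the size is 0.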

[Package RcppCWB version 0.6.4 Index]