cl_lexicon_size {RcppCWB} | R Documentation |
Get Lexicon Size.
Description
Get the total number of unique tokens (ids) of a positional attribute. Note
that token ids are zero-based: when iterating through tokens, start at 0;
the maximum id is cl_lexicon_size() minus 1.
Usage
cl_lexicon_size(corpus, p_attribute, registry = Sys.getenv("CORPUS_REGISTRY"))
Arguments
corpus | name of a CWB corpus (upper case)
p_attribute | name of positional attribute
registry | path to the registry directory, defaults to the value of the environment variable CORPUS_REGISTRY
Examples
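library(RcppCWB)

# Number of unique tokens of the p-attribute "word"
lexicon_size <- cl_lexicon_size(
  "REUTERS",
  p_attribute = "word",
  registry = get_tmp_registry()
)

# Token ids are zero-based, so they run from 0 to lexicon_size - 1
token_ids <- seq.int(from = 0, to = lexicon_size - 1)

# Decode all ids to the corresponding token strings
cl_id2str(
  "REUTERS",
  p_attribute = "word",
  id = token_ids,
  registry = get_tmp_registry()
)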
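
The zero-based id range described above can also be built with seq_len(), which may be more robust in the edge case of an empty lexicon. The following is a minimal sketch in base R only: zero_based_ids() is a hypothetical helper (not part of RcppCWB), and lexicon_size and lexicon are stand-in values rather than data read from a corpus.

```r
# Hypothetical helper (not part of RcppCWB): all valid zero-based ids
# for a lexicon of the given size.
zero_based_ids <- function(lexicon_size) {
  # seq_len() yields integer(0) for size 0, whereas
  # seq.int(from = 0, to = lexicon_size - 1) would count down (0, -1).
  seq_len(lexicon_size) - 1L
}

# Stand-in value; in practice this would come from cl_lexicon_size().
lexicon_size <- 5L
ids <- zero_based_ids(lexicon_size)
print(ids)

# Hypothetical stand-in for a lexicon, to show the offset between
# zero-based token ids and R's one-based vector indexing.
lexicon <- c("a", "b", "c", "d", "e")
print(lexicon[ids + 1L])
```

The ids returned by such a helper could be passed as the id argument of cl_id2str() in the same way as token_ids in the example above.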
[Package RcppCWB version 0.6.4 Index]