get_count_vector {RcppCWB} | R Documentation
Get Vector with Counts for Positional Attribute.
Description
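The return value is an integer vector. Its length is the number of unique tokens (types) of the positional attribute in the corpus, i.e. the number of unique ids. The counts are ordered by id: because CWB ids are zero-based, the first element holds the count for id 0, the second the count for id 1, and so on.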
Usage
get_count_vector(corpus, p_attribute, registry = Sys.getenv("CORPUS_REGISTRY"))
Arguments
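corpus | name of a CWB corpus (e.g. "REUTERS")
p_attribute | name of a positional attribute (e.g. "word")
registry | registry directory; defaults to the value of the CORPUS_REGISTRY environment variable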
Value
an integer vector
Examples
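library(RcppCWB)

# Counts for every token id of the p-attribute "word"
y <- get_count_vector(
  corpus = "REUTERS", p_attribute = "word",
  registry = get_tmp_registry()
)

# Token ids are zero-based, so they run from 0 to length(y) - 1
df <- data.frame(token_id = 0:(length(y) - 1), count = y)

# Decode the ids to their string values
df[["token"]] <- cl_id2str(
  "REUTERS", p_attribute = "word",
  id = df[["token_id"]], registry = get_tmp_registry()
)
df <- df[, c("token", "token_id", "count")] # reorder columns
df <- df[order(df[["count"]], decreasing = TRUE), ] # most frequent first
head(df)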
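
The mapping between zero-based ids and one-based R positions is easy to get wrong, so the following base R sketch illustrates it without requiring a corpus. The vector `counts` is made-up data standing in for the return value of `get_count_vector()`, and `count_for_id()` is a hypothetical helper written for this illustration; it is not part of RcppCWB.

```r
# Made-up counts standing in for the result of get_count_vector()
counts <- c(12L, 7L, 3L, 1L)

# CWB ids start at 0, R indices start at 1:
# the count for id i is stored at position i + 1
ids <- seq_along(counts) - 1L

# Hypothetical helper (not part of RcppCWB): look up counts by id
count_for_id <- function(counts, id) counts[id + 1L]

# Counts for the first and the last id
first_and_last <- count_for_id(counts, id = c(0L, 3L))
print(first_and_last)

# Every token contributes to exactly one id, so the counts add up
# to the number of tokens covered by the vector
total <- sum(counts)
print(total)
```

With a real corpus, the same indexing applies to the vector returned by `get_count_vector()`, and the sum of the counts should match the size of the corpus in tokens.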
[Package RcppCWB version 0.6.4 Index]