s_attribute_decode {RcppCWB} | R Documentation |
Decode Structural Attribute.
Description
Get a data.frame with left and right corpus positions (cpos) for
structural attributes and values.
Usage
s_attribute_decode(
corpus,
data_dir,
s_attribute,
encoding = NULL,
registry = Sys.getenv("CORPUS_REGISTRY"),
method = c("R", "Rcpp")
)
Arguments
corpus |
A CWB corpus (ID in upper case). |
data_dir |
The data directory where the binary files of the corpus are stored. |
s_attribute |
A structural attribute (length 1 |
encoding |
Encoding of the values ("latin-1" or "utf-8"). |
registry |
The CWB registry directory. |
method |
A length-one |
Details
Two approaches are implemented: A pure R solution will decode the files
directly in the directory specified by data_dir. An implementation using
Rcpp will use the registry file for corpus to find the data directory.
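To illustrate what decoding files directly in a data directory can involve, here is a minimal base R sketch. It assumes that region boundaries are stored in a file named after the s-attribute with the extension .rng, as consecutive pairs of big-endian 32-bit integers. That layout, the helper name read_rng_sketch, and the temporary demo file are assumptions made for illustration and are not taken from this page.

```r
# Hypothetical helper (not part of RcppCWB): read region pairs from an
# .rng-style file. Assumption: the file holds big-endian 32-bit integers,
# alternating left and right corpus positions.
read_rng_sketch <- function(data_dir, s_attribute) {
  rng_file <- file.path(data_dir, paste(s_attribute, "rng", sep = "."))
  n_ints <- file.info(rng_file)[["size"]] / 4L
  rng <- readBin(rng_file, what = integer(), size = 4L, n = n_ints, endian = "big")
  # One row per region: column 1 = left cpos, column 2 = right cpos
  regions <- matrix(rng, ncol = 2L, byrow = TRUE)
  data.frame(cpos_left = regions[, 1], cpos_right = regions[, 2])
}

# Made-up demo data: two regions, written to a temporary directory
demo_dir <- tempfile("rng_demo")
dir.create(demo_dir)
writeBin(c(0L, 9L, 10L, 24L), file.path(demo_dir, "places.rng"),
         size = 4L, endian = "big")

regions <- read_rng_sketch(demo_dir, "places")
```

The sketch covers only the corpus positions. Decoding the annotation values would require reading further files, which is not shown here.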
Value
A data.frame with three columns if the s-attribute has values, or two
columns if not. Column cpos_left holds the start corpus positions of the
structural annotations, column cpos_right the end corpus positions. Column
value holds the value of each annotation.
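As a sketch of how such a result might be used, the following base R snippet computes region sizes from a hand-made data.frame with the three columns described above. The data are invented for illustration, and the calculation assumes that cpos_right is inclusive, that is, that the end position belongs to the region.

```r
# Made-up stand-in for the return value (not real corpus data)
regions <- data.frame(
  cpos_left = c(0L, 10L, 25L),
  cpos_right = c(9L, 24L, 30L),
  value = c("usa", "japan", "usa"),
  stringsAsFactors = FALSE
)

# Number of tokens per region, assuming inclusive end positions
regions$n_tokens <- regions$cpos_right - regions$cpos_left + 1L

# Total number of tokens covered by each annotation value
tokens_by_value <- tapply(regions$n_tokens, regions$value, sum)
```

For an s-attribute without values, the value column would be absent and only the region sizes could be computed.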
Examples
# pure R implementation (Rcpp implementation fails on Windows in vanilla mode)
b <- s_attribute_decode(
corpus = "REUTERS",
data_dir = system.file(package = "RcppCWB", "extdata", "cwb", "indexed_corpora", "reuters"),
registry = get_tmp_registry(),
s_attribute = "places", method = "R"
)
# Using Rcpp wrappers for CWB C code
b <- s_attribute_decode(
corpus = "REUTERS",
data_dir = system.file(package = "RcppCWB", "extdata", "cwb", "indexed_corpora", "reuters"),
s_attribute = "places",
method = "Rcpp",
registry = get_tmp_registry()
)