cl_charset_name {RcppCWB} | R Documentation |
Get charset of a corpus.
Description
The encoding of a corpus is declared in the registry file (corpus property
"charset"). Once a corpus is loaded, this information is available without
parsing the registry file again. The function cl_charset_name offers quick
access to this information.
Usage
cl_charset_name(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))
Arguments
corpus |
Name of a CWB corpus (upper case). |
registry |
Path to the registry directory. Defaults to the value of the environment variable CORPUS_REGISTRY. |
Examples
cl_charset_name(
  corpus = "REUTERS",
  registry = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)
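A typical reason to look up the charset is to convert strings decoded from the corpus to UTF-8. The following is a minimal sketch of that idea, not part of the package documentation. The helper to_utf8 is hypothetical, and the sketch assumes that the charset name returned for the corpus is one that base R's iconv() recognizes on your platform.

```r
# Hypothetical helper (not part of RcppCWB): convert strings from a given
# charset to UTF-8 using base R's iconv().
to_utf8 <- function(x, charset) {
  iconv(x, from = charset, to = "UTF-8")
}

# Only run the corpus part if RcppCWB is installed.
if (requireNamespace("RcppCWB", quietly = TRUE)) {
  registry <- system.file(package = "RcppCWB", "extdata", "cwb", "registry")

  # Look up the declared encoding of the corpus.
  charset <- RcppCWB::cl_charset_name(corpus = "REUTERS", registry = registry)

  # Decode the first few token ids (ids are zero-based) and convert the
  # resulting strings from the corpus charset to UTF-8.
  tokens <- RcppCWB::cl_id2str(
    corpus = "REUTERS", p_attribute = "word", id = 0:4, registry = registry
  )
  print(to_utf8(tokens, charset))
}
```

Passing the charset explicitly keeps the helper independent of any particular corpus. If iconv() does not accept the returned name on a given platform, it would need to be mapped to a name iconv() knows first.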
[Package RcppCWB version 0.6.4 Index]