pkg_utils {cwbtools} | R Documentation |
Create and manage packages with corpus data.
Description
Putting CWB indexed corpora into R data packages is a convenient way to ship and share corpora, and to keep documentation and supplementary functionality with the data.
Usage
pkg_create_cwb_dirs(pkg = ".", verbose = TRUE)
pkg_add_corpus(
  pkg = ".",
  corpus,
  registry = Sys.getenv("CORPUS_REGISTRY"),
  verbose = TRUE
)
pkg_add_configure_scripts(pkg = ".")
pkg_add_description(
  pkg = ".",
  package = NULL,
  version = "0.0.1",
  date = Sys.Date(),
  author,
  maintainer = NULL,
  description = "",
  license = "",
  verbose = TRUE
)
pkg_add_creativecommons_license(
  pkg = ".",
  license = "CC-BY-NC-SA",
  file = system.file(package = "cwbtools", "txt", "licenses", "CC_BY-NC-SA_3.0.txt")
)
pkg_add_gitattributes_file(pkg = ".")
Arguments
pkg |
Path to directory of data package or package name. |
verbose |
A logical value, whether to show messages. |
corpus |
Name of the CWB corpus to insert into the package. |
registry |
Registry directory. |
package |
The package name ( |
version |
The version number of the corpus (defaults to "0.0.1"). |
date |
The date of creation, defaults to Sys.Date(). |
author |
The author of the package, either character vector or object of class person. |
maintainer |
Maintainer, R package style, either |
description |
Description of the data package. |
license |
The license. |
file |
Path to file with fulltext of Creative Commons license. |
Details
pkg_create_cwb_dirs will create the standard directory structure for storing registry files and indexed corpora within a package (./inst/extdata/cwb/registry and ./inst/extdata/cwb/indexed_corpora, respectively).
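To make the layout concrete, the following base R sketch creates the same two directories by hand in a temporary location. It only illustrates the resulting structure and is not the implementation of pkg_create_cwb_dirs; the directory name demo_pkg and all variable names are assumptions made for this example.

```r
# Illustration only: build the directory layout by hand with base R.
pkg <- file.path(tempdir(), "demo_pkg")  # hypothetical package directory
cwb_dir <- file.path(pkg, "inst", "extdata", "cwb")
registry_dir <- file.path(cwb_dir, "registry")
data_dir <- file.path(cwb_dir, "indexed_corpora")

# Create both directories, including any missing parent directories.
for (d in c(registry_dir, data_dir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# List the directories below the package root.
list.dirs(pkg, full.names = FALSE, recursive = TRUE)
```

Registry files go into the first directory; each corpus typically gets its own subdirectory of binary index files in the second.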
pkg_add_corpus will add the corpus named by corpus, as described in the registry directory registry, to the package defined by pkg.
pkg_add_configure_scripts will add standardized and tested configure scripts to the top-level directory of the data package (configure for Linux and macOS, configure.win for Windows), and the file setpaths.R to the tools subdirectory. The configuration mechanism ensures that the data directory is specified correctly in the registry files during the installation of the data package.
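Why the registry files need adjusting at install time can be sketched as follows. A CWB registry file records the location of the indexed corpus in a HOME line (and of the info file in an INFO line), and that absolute path changes once the package is installed elsewhere. The snippet below is a minimal base R illustration of rewriting those lines; it is not the content of setpaths.R, and the registry content, paths and variable names are hypothetical.

```r
# Hypothetical registry file; real registry files contain more declarations.
registry_file <- tempfile()
writeLines(
  c(
    "NAME \"Demo corpus\"",
    "ID   demo",
    "HOME /path/at/build/time/indexed_corpora/demo",
    "INFO /path/at/build/time/indexed_corpora/demo/.info"
  ),
  registry_file
)

# Data directory after installation (an assumed location for this sketch).
new_home <- file.path(tempdir(), "indexed_corpora", "demo")

# Replace the HOME and INFO lines so they point to the new data directory.
lines <- readLines(registry_file)
lines[grepl("^HOME[[:space:]]", lines)] <- paste("HOME", new_home)
lines[grepl("^INFO[[:space:]]", lines)] <- paste("INFO", file.path(new_home, ".info"))
writeLines(lines, registry_file)

readLines(registry_file)
```

The lines are assigned directly rather than substituted with sub(), which avoids problems with backslashes in Windows paths being interpreted in a replacement string.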
pkg_add_description will add a DESCRIPTION file to the package.
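A DESCRIPTION file uses the Debian control file (DCF) format, which base R can write and read directly. The sketch below shows what such a file might look like for the values used in the Examples section; the selection of fields is an assumption and may differ from what pkg_add_description actually writes.

```r
# Assumed set of fields; pkg_add_description may write others as well.
desc <- data.frame(
  Package = "reuters",
  Version = "0.0.1",
  Date = as.character(Sys.Date()),
  Author = "cwbtools",
  Description = "Reuters data package",
  stringsAsFactors = FALSE
)

# Write the fields in DCF format and read them back for inspection.
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = desc_file)
read.dcf(desc_file)
```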
pkg_add_creativecommons_license will add license information to the DESCRIPTION file, and move the file LICENSE to the top-level directory of the package.
pkg_add_gitattributes_file will add a file '.gitattributes' to the package. The file defines the types of files that will be tracked by Git LFS, i.e. they will not be under conventional version control. This is suitable for large binary files, which is the case for indexed corpus data.
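For orientation, a Git LFS entry in '.gitattributes' pairs a file pattern with the attributes filter=lfs diff=lfs merge=lfs -text. The base R sketch below writes a few such entries to a temporary file. The patterns are hypothetical examples of binary CWB index files and are not necessarily the ones written by pkg_add_gitattributes_file.

```r
# Hypothetical patterns for binary index files; the actual list may differ.
patterns <- c("*.corpus", "*.rng", "*.avs", "*.avx")

# One line per pattern, each with the standard Git LFS attributes.
lfs_lines <- paste(patterns, "filter=lfs diff=lfs merge=lfs -text")

gitattributes <- file.path(tempdir(), ".gitattributes")
writeLines(lfs_lines, gitattributes)
readLines(gitattributes)
```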
References
Blätte, Andreas (2018). "Using Data Packages to Ship Annotated Corpora of Parliamentary Protocols: The GermaParl R Package". ParlaCLARIN 2018 Workshop Proceedings.
Examples
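library(cwbtools)

# Build the data package in the session's temporary directory
pkgdir <- fs::path_temp()
# Create inst/extdata/cwb/registry and inst/extdata/cwb/indexed_corpora
pkg_create_cwb_dirs(pkg = pkgdir)
# Write the DESCRIPTION file
pkg_add_description(
  pkg = pkgdir,
  package = "reuters",
  author = "cwbtools",
  description = "Reuters data package"
)
# Add the REUTERS sample corpus shipped with the RcppCWB package
pkg_add_corpus(
  pkg = pkgdir, corpus = "REUTERS",
  registry = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)
# Add Git LFS settings, configure scripts and license information
pkg_add_gitattributes_file(pkg = pkgdir)
pkg_add_configure_scripts(pkg = pkgdir)
pkg_add_creativecommons_license(pkg = pkgdir)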