zenodo_get_tarball {cwbtools} | R Documentation |
Download corpus tarball from Zenodo
Description
Download corpus tarball from Zenodo. Downloading both freely available data and data with restricted access is supported.
Usage
zenodo_get_tarball(
url,
destfile = tempfile(fileext = ".tar.gz"),
checksum = TRUE,
verbose = TRUE,
progress = TRUE
)
gparlsample_url_restricted
Arguments
url |
Landing page at Zenodo for resource. Can also be the URL for restricted access (?token= appended with a long key), or a DOI referencing objects deposited with Zenodo. |
destfile |
A |
checksum |
A |
verbose |
A |
progress |
A |
Format
An object of class character of length 1.
Details
A sample subset of the GermaParl corpus is deposited at Zenodo for testing purposes. There are identical open access and restricted versions of GermaParlSample to test different flavours of downloading a resource from Zenodo. The URL for restricted access includes a very lengthy access token. This URL is included as a dataset in the package to avoid excessively long lines in sample code. Note that URLs that give access to restricted data should usually not be shared.
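Because a restricted-access URL carries its token in the query string, it can be useful to mask the token before a URL ends up in a log or a message. The following is a minimal base R sketch, assuming the token is passed as a token= query parameter as described under Arguments. The helper name redact_token and the example.org URL are hypothetical and not part of cwbtools.

```r
# Hypothetical helper (not part of cwbtools): replace the value of a
# `token=` query parameter with a placeholder, leaving the rest of the URL intact.
redact_token <- function(url) {
  sub("([?&]token=)[^&#]*", "\\1<redacted>", url)
}

# Placeholder URL for illustration only
redact_token("https://example.org/record/1?token=abc123")
# URLs without a token are returned unchanged
redact_token("https://doi.org/10.5281/zenodo.3823245")
```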
Value
The filename of the downloaded corpus tarball, designed to serve as input for corpus_install (as argument tarball). If the resource is not available, NULL is returned.
The path of the downloaded resource, or NULL if the operation has not been successful.
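Since failure is signalled by a NULL return value rather than an error, callers may prefer to fail early instead of testing for NULL at every step. The wrapper below is one possible sketch, not part of cwbtools. The name fetch_tarball and its downloader parameter are assumptions introduced here; the parameter exists only so that the wrapper can be tried with a stub instead of a real download.

```r
# Hypothetical wrapper (not part of cwbtools): turn the NULL return value
# into an error so that later steps never receive a missing tarball.
fetch_tarball <- function(url,
                          destfile = tempfile(fileext = ".tar.gz"),
                          downloader = cwbtools::zenodo_get_tarball) {
  # The default is only evaluated if no other downloader is supplied.
  tarball <- downloader(url = url, destfile = destfile)
  if (is.null(tarball)) {
    # The URL is deliberately left out of the message: it may carry a token.
    stop("Zenodo resource could not be downloaded.", call. = FALSE)
  }
  tarball
}
```

With cwbtools installed and network access available, fetch_tarball("https://doi.org/10.5281/zenodo.3823245") would either return the path of the tarball, ready to be passed to corpus_install as argument tarball, or stop with an error.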
Examples
# Temporary directory structure as a preparatory step
Sys.setenv(CORPUS_REGISTRY = "")
cwb_dirs <- create_cwb_directories(
prefix = tempdir(),
ask = FALSE,
verbose = FALSE
)
Sys.setenv(CORPUS_REGISTRY = cwb_dirs[["registry_dir"]])
# Download and install open access resource
gparl_url_pub <- "https://doi.org/10.5281/zenodo.3823245"
tarball_tmp <- zenodo_get_tarball(url = gparl_url_pub)
if (!is.null(tarball_tmp)) corpus_install(tarball = tarball_tmp)
# Download and install resource with restricted access
tarball_tmp <- zenodo_get_tarball(url = gparlsample_url_restricted)
if (!is.null(tarball_tmp)) corpus_install(tarball = tarball_tmp)