prepInputs {reproducible} | R Documentation |
Download and optionally post-process files
Description
Usage
prepInputs(
targetFile = NULL,
url = NULL,
archive = NULL,
alsoExtract = NULL,
destinationPath = getOption("reproducible.destinationPath", "."),
fun = NULL,
quick = getOption("reproducible.quick"),
overwrite = getOption("reproducible.overwrite", FALSE),
purge = FALSE,
useCache = getOption("reproducible.useCache", 2),
.tempPath,
verbose = getOption("reproducible.verbose", 1),
...
)
Arguments
targetFile |
Character string giving the filename (without relative or
absolute path) to the eventual file
(raster, shapefile, csv, etc.) after downloading and extracting from a zip
or tar archive. This is the file before it is passed to
|
url |
Optional character string indicating the URL to download from.
If not specified, then no download will be attempted. If no entry
exists in the |
archive |
Optional character string giving the path of an archive
containing |
alsoExtract |
Optional character string naming files other than
|
destinationPath |
Character string of a directory in which to download
and save the file that comes from |
fun |
Optional. If specified, this will attempt to load whatever
file was downloaded during |
quick |
Logical. This is passed internally to |
overwrite |
Logical. Should downloading and all the other actions occur even if they pass the checksums or the files are all there? |
purge |
Logical or Integer. |
useCache |
Passed to |
.tempPath |
Optional temporary path for internal file intermediate steps. Will be cleared on.exit from this function. |
verbose |
Numeric, -1 silent (where possible), 0 being very quiet,
1 showing more messaging, 2 being more messaging, etc.
Default is 1. Above 3 will output much more information about the internals of
Caching, which may help diagnose Caching challenges. Can set globally with an
option, e.g., |
... |
Additional arguments passed to
|
Details
This function can be used to prepare R objects from remote or local data sources.
The object of this function is to provide a reproducible version of
a series of commonly used steps for getting, loading, and processing data.
This function has two stages: getting data (download, extracting from archives,
loading into R) and post-processing (for Spatial* and Raster* objects, this is
crop, reproject, mask/intersect).
To trigger the first stage, provide url or archive.
To trigger the second stage, provide studyArea or rasterToMatch.
See examples.
Value
This is an omnibus function that will return an R object that will have resulted from
the running of preProcess() and postProcess() or postProcessTo(). Thus,
if it is a GIS object, it may have been cropped, reprojected, "fixed", masked, and
written to disk.
Stage 1 - Getting data
See preProcess() for combinations of arguments.
Download from the web via either googledrive::drive_download() or utils::download.file();
Load into R using terra::rast, sf::st_read, or any other function passed in with fun;
Checksumming of all files during this process. This is put into a ‘CHECKSUMS.txt’ file in the destinationPath, appending if it is already there, overwriting the entries for same files if entries already exist.
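The append-or-overwrite behaviour described for the checksum file can be pictured with a small base R sketch. This is only an illustration of the idea, not the package's implementation: the helper names recordChecksums and verifyChecksums, the two-column layout, and the file name CHECKSUMS_sketch.txt are all hypothetical, and md5 is used simply because it ships with base R (the algorithm and file format used by reproducible may differ).

```r
# Illustrative sketch only: helper names and the two-column layout are
# hypothetical, and md5 is used because it is available in base R.
recordChecksums <- function(files, checksumFile) {
  new <- data.frame(
    file = basename(files),
    checksum = unname(tools::md5sum(files)),
    stringsAsFactors = FALSE
  )
  if (file.exists(checksumFile)) {
    old <- read.table(checksumFile, header = TRUE, colClasses = "character")
    # overwrite entries for files that are already recorded, keep the rest
    new <- rbind(old[!old$file %in% new$file, , drop = FALSE], new)
  }
  write.table(new, checksumFile, row.names = FALSE, quote = FALSE)
  invisible(new)
}

verifyChecksums <- function(files, checksumFile) {
  recorded <- read.table(checksumFile, header = TRUE, colClasses = "character")
  expected <- recorded$checksum[match(basename(files), recorded$file)]
  current <- unname(tools::md5sum(files))
  # FALSE when there is no recorded entry or the hashes differ
  !is.na(expected) & !is.na(current) & expected == current
}

dPath <- file.path(tempdir(), "checksumSketch")
dir.create(dPath, showWarnings = FALSE, recursive = TRUE)
dataFile <- file.path(dPath, "data.csv")
checksumFile <- file.path(dPath, "CHECKSUMS_sketch.txt")

write.csv(data.frame(a = 1:3), dataFile, row.names = FALSE)
recordChecksums(dataFile, checksumFile)
okBefore <- verifyChecksums(dataFile, checksumFile) # file unchanged since recording

write.csv(data.frame(a = 4:6), dataFile, row.names = FALSE)
okAfter <- verifyChecksums(dataFile, checksumFile) # content changed after recording

recorded <- recordChecksums(dataFile, checksumFile) # entry replaced, not duplicated
unlink(dPath, recursive = TRUE)
```

In this sketch a changed file fails verification until it is recorded again, which is roughly the situation in which a re-download or a purge would be considered.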
Stage 2 - Post processing
This will be triggered if either rasterToMatch or studyArea is supplied.
Fix errors. Currently the only errors fixed are for SpatialPolygons, using buffer(..., width = 0);
Crop using cropTo();
Project using projectTo();
Mask using maskTo();
Determine file name with determineFilename() via filename2;
Optionally, write that file name to disk via writeTo().
NOTE: checksumming does not occur during the post-processing stage, as
there are no file downloads. To achieve fast results, wrap prepInputs with Cache.
NOTE: sf objects are still very experimental.
postProcessing of Spat*, sf, Raster* and Spatial* objects:
The following has been DEPRECATED because there are a sufficient number of
ambiguities that this has been changed in favour of from and the *to family.
See postProcessTo().
DEPRECATED: If rasterToMatch or studyArea are used, then this will
trigger several subsequent functions, specifically the sequence
crop, reproject, mask, which appears to be a common sequence while
preparing spatial data from diverse sources.
See the postProcess() documentation section on
"Backwards compatibility with rasterToMatch and/or studyArea arguments"
to understand various combinations of rasterToMatch and/or studyArea.
fun
fun offers the ability to pass any custom function with which to load
the file obtained by preProcess into the session. There are two cases that are
dealt with: when preProcess downloads a file (including via dlFun),
fun must deal with a file; and, when preProcess creates an R object
(e.g., raster::getData returns an object), fun must deal with an object.
fun can be supplied in three ways: a function, a character string
(i.e., a function name as a string), or an expression.
If a character string or function, it should include the package name, e.g.,
"terra::rast", or be an actual function, e.g., base::readRDS.
In these cases, it will evaluate this function call while passing targetFile
as the first argument. These will only work in the simplest of cases.
When more precision is required, the full call can be written, and the
filename can be referred to as targetFile if the function
is loading a file. If preProcess returns an object, fun should be set to
fun = NA.
If a custom function is not in a package, prepInputs may not find it. In such
cases, simply pass the function as a named argument (with the same name as the function) to prepInputs.
See examples.
NOTE: passing fun = NA will skip loading the object into R. Note this will essentially
replicate the functionality of simply calling preProcess directly.
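The three ways of supplying fun can be compared with a short base R sketch. It only mimics the idea that a function, a function name given as a string, or an expression referring to targetFile can each be turned into a loaded object; it does not call prepInputs and is not a description of how the package resolves fun internally. The variable names funAsFunction, funAsString and funAsExpr are hypothetical.

```r
# A small file standing in for whatever preProcess would have obtained
targetFile <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = 1:3), targetFile)

# 1. fun as a function: called with targetFile as the first argument
funAsFunction <- base::readRDS
obj1 <- funAsFunction(targetFile)

# 2. fun as a character string that includes the package name
funAsString <- "base::readRDS"
obj2 <- eval(parse(text = funAsString))(targetFile)

# 3. fun as an expression: the full call is written out and refers to
#    the file by the name targetFile
funAsExpr <- quote({
  out <- readRDS(targetFile)
  out[out$x > 1, , drop = FALSE]
})
obj3 <- eval(funAsExpr, envir = list(targetFile = targetFile))

unlink(targetFile)
```

The first two forms can only pass the file name as the first argument, whereas the expression form allows extra steps after loading, which is the "more precision" case described above.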
purge
Options for control of purging the CHECKSUMS.txt file are:
0   keep file
1   delete file in destinationPath; all records of downloads need to be rebuilt
2   delete entry with same targetFile
3   delete entry with same archive
4   delete entry with same alsoExtract
5   delete entry with same targetFile & alsoExtract
6   delete entry with same targetFile, alsoExtract & archive
7   delete entry with same targetFile, alsoExtract & archive & url
Options 2 to 7 will only remove entries in the CHECKSUMS.txt that are associated with
targetFile, alsoExtract or archive. When prepInputs is called,
it will write to a CHECKSUMS.txt file, or append to it if it already exists.
If the CHECKSUMS.txt is not correct, use this argument to remove it.
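As a rough picture of what the entry-level options mean, the sketch below applies codes 0 to 6 to an in-memory table of checksum entries. Everything here is an assumption made for illustration: the function purgeSketch, the column name file, and the example file names are hypothetical, option 7 is left out because it also involves the url, and the real function operates on the CHECKSUMS.txt file itself.

```r
# Hypothetical sketch: which rows of a checksum table each purge code would drop.
purgeSketch <- function(checksums, purge, targetFile = NULL,
                        alsoExtract = NULL, archive = NULL) {
  code <- as.character(purge)
  if (code == "1") {
    # option 1 removes the whole file, i.e., no entries remain
    return(checksums[0, , drop = FALSE])
  }
  toDrop <- switch(code,
    "0" = character(0),
    "2" = targetFile,
    "3" = archive,
    "4" = alsoExtract,
    "5" = c(targetFile, alsoExtract),
    "6" = c(targetFile, alsoExtract, archive),
    stop("purge value not covered by this sketch")
  )
  checksums[!checksums$file %in% basename(as.character(toDrop)), , drop = FALSE]
}

checksums <- data.frame(
  file = c("a.shp", "a.dbf", "a.zip"),
  checksum = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

kept0 <- purgeSketch(checksums, 0)
kept2 <- purgeSketch(checksums, 2, targetFile = "a.shp")
kept5 <- purgeSketch(checksums, 5, targetFile = "a.shp", alsoExtract = "a.dbf")
kept6 <- purgeSketch(checksums, 6,
  targetFile = "a.shp", alsoExtract = "a.dbf", archive = "a.zip"
)
kept1 <- purgeSketch(checksums, 1)
```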
Note
This function is still experimental: use with caution.
Author(s)
Eliot McIntire, Jean Marchal, and Tati Micheletti
See Also
postProcessTo(), downloadFile(), extractFromArchive(), postProcess().
Examples
if (requireNamespace("terra", quietly = TRUE) &&
  requireNamespace("sf", quietly = TRUE)) {
  library(reproducible)
  # Make a dummy study area map -- user would supply this normally
  coords <- structure(c(-122.9, -116.1, -99.2, -106, -122.9, 59.9, 65.7, 63.6, 54.8, 59.9),
    .Dim = c(5L, 2L)
  )
  studyArea <- terra::vect(coords, "polygons")
  terra::crs(studyArea) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
  # Make dummy "large" map that must be cropped to the study area
  outerSA <- terra::buffer(studyArea, 50000)
  terra::crs(outerSA) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
  tf <- normPath(file.path(tempdir2("prepInputsEx"), "prepInputs2.shp"))
  terra::writeVector(outerSA, tf, overwrite = TRUE)
  # run prepInputs -- load file, postProcess it to the studyArea
  studyArea2 <- prepInputs(
    targetFile = tf, to = studyArea,
    fun = "terra::vect",
    destinationPath = tempdir2()
  ) |>
    suppressWarnings() # not relevant warning here
  # clean up
  unlink("CHECKSUMS.txt")
  ##########################################
  # Remote file using `url`
  ##########################################
  if (internetExists()) {
    data.table::setDTthreads(2)
    origDir <- getwd()
    # download a zip file from internet, unzip all files, load as shapefile, Cache the call
    # First time: don't know all files - prepInputs will guess, if download file is an archive,
    # then extract all files, then if there is a .shp, it will load with sf::st_read
    dPath <- file.path(tempdir(), "ecozones")
    shpUrl <- "http://sis.agr.gc.ca/cansis/nsdb/ecostrat/zone/ecozone_shp.zip"
    # Wrapped in a try because this particular url can be flaky
    shpEcozone <- try(prepInputs(
      destinationPath = dPath,
      url = shpUrl
    ))
    if (!is(shpEcozone, "try-error")) {
      # Robust to partial file deletions:
      unlink(dir(dPath, full.names = TRUE)[1:3])
      shpEcozone <- prepInputs(
        destinationPath = dPath,
        url = shpUrl
      )
      unlink(dPath, recursive = TRUE)
      # Once this is done, can be more precise in operational code:
      # specify targetFile, alsoExtract, and fun, wrap with Cache
      ecozoneFilename <- file.path(dPath, "ecozones.shp")
      ecozoneFiles <- c(
        "ecozones.dbf", "ecozones.prj",
        "ecozones.sbn", "ecozones.sbx", "ecozones.shp", "ecozones.shx"
      )
      shpEcozone <- prepInputs(
        targetFile = ecozoneFilename,
        url = shpUrl,
        fun = "terra::vect",
        alsoExtract = ecozoneFiles,
        destinationPath = dPath
      )
      unlink(dPath, recursive = TRUE)
      # Add a study area to Crop and Mask to
      # Create a "study area"
      coords <- structure(c(-122.98, -116.1, -99.2, -106, -122.98, 59.9, 65.73, 63.58, 54.79, 59.9),
        .Dim = c(5L, 2L)
      )
      studyArea <- terra::vect(coords, "polygons")
      terra::crs(studyArea) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
      # specify targetFile, alsoExtract, and fun, wrap with Cache
      ecozoneFilename <- file.path(dPath, "ecozones.shp")
      # Note, you don't need to "alsoExtract" the archive... if the archive is not there, but the
      # targetFile is there, it will not redownload the archive.
      ecozoneFiles <- c(
        "ecozones.dbf", "ecozones.prj",
        "ecozones.sbn", "ecozones.sbx", "ecozones.shp", "ecozones.shx"
      )
      shpEcozoneSm <- Cache(prepInputs,
        url = shpUrl,
        targetFile = reproducible::asPath(ecozoneFilename),
        alsoExtract = reproducible::asPath(ecozoneFiles),
        studyArea = studyArea,
        fun = "terra::vect",
        destinationPath = dPath,
        filename2 = "EcozoneFile.shp"
      ) # passed to determineFilename
      terra::plot(shpEcozone[, 1])
      terra::plot(shpEcozoneSm[, 1], add = TRUE, col = "red")
      # dPath is a directory, so recursive = TRUE is needed to remove it
      unlink(dPath, recursive = TRUE)
    }
  }
}
## Using quoted dlFun and fun -- this is not intended to be run but used as a template
## prepInputs(..., fun = customFun(x = targetFile), customFun = customFun)
## # or more complex
## test5 <- prepInputs(
## targetFile = targetFileLuxRDS,
## dlFun =
## getDataFn(name = "GADM", country = "LUX", level = 0) # preProcess keeps file from this!
## ,
## fun = {
## out <- readRDS(targetFile)
## sf::st_as_sf(out)}
## )