reproducibleOptions {reproducible} | R Documentation
reproducible options
Description
These provide top-level, powerful settings for a comprehensive reproducible workflow. To see defaults, run reproducibleOptions(). See Details below.
Usage
reproducibleOptions()
Details
Below are options that can be set with options("reproducible.xxx" = newValue), where xxx is one of the values below, and newValue is a new value to give the option. Sometimes these options can be placed in the user's .Rprofile file so they persist between sessions.
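As a minimal sketch of the pattern described above, the following sets one option for the current session, reads it back, and then restores the previous state. It uses only base R (options() and getOption()); the choice of reproducible.useCache is purely illustrative, and the variable names old and useCacheNow are hypothetical.

```r
# Set an option for the current session. options() invisibly returns the
# previous values, which makes it easy to restore them afterwards.
old <- options(reproducible.useCache = FALSE)

# Read the value back; getOption() returns NULL if an option is unset.
useCacheNow <- getOption("reproducible.useCache")
print(useCacheNow)

# Restore whatever was in place before the change.
options(old)
```

To make a setting persist between sessions, the same options(...) call could be placed in the user's .Rprofile file.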
The following options are likely of interest to most users:
ask
- Default: TRUE. Used in clearCache() and keepCache().
cachePath
- Default: .reproducibleTempCacheDir. Used in Cache() and many others. The default path for repositories if not passed as an argument.
cacheSaveFormat
- Default: "rds". What save format to use; currently, "qs" or "rds".
cacheSpeed
- Default: "slow". One of "slow" or "fast" (1 or 2). "slow" uses digest::digest internally, which is transferable across operating systems, but much slower than digest::digest(algo = "spooky"). So, if all caching is happening on a single machine, "fast" would be a good setting.
conn
- Default: NULL. Sets a specific connection to a database, e.g., dbConnect(drv = RSQLite::SQLite()) or dbConnect(drv = RPostgres::Postgres()). For remote database servers, setting one connection may be far faster than using drv, which must make a new connection every time.
destinationPath
- Default: NULL. Used in prepInputs() and preProcess(). Can be set globally here.
drv
- Default: RSQLite::SQLite(). Sets the default driver for the backend database system. Only tested with RSQLite::SQLite() and RPostgres::Postgres().
futurePlan
- Default: FALSE. On Linux OSes, Cache and cloudCache have some functionality that uses the future package. Default is to not use these, as they are experimental. They may, however, be very effective in speeding up some things, specifically, uploading cached elements via googledrive in cloudCache.
gdalwarp
- Default: FALSE. Experimental. During postProcessTo, the standard approach is to use terra functions directly, with several strategic uses of sf. However, in the special case when from is a SpatRaster or Raster, maskTo is a SpatVector or SFC_POLYGON, and projectTo is a SpatRaster or Raster, setting this option to TRUE will use sf::gdal_utils("warp"). In many test cases, this is much faster than the terra sequence. The resulting SpatRaster is not identical, but it is very similar.
gdalwarpThreads
- Default: 2. This will set -wo NUM_THREADS= to this number. Default is now 2, meaning gdalwarp will use 2 threads with gdalProject. To turn off threading, set to 0, 1 or NA.
inputPaths
- Default: NULL. Used in prepInputs() and preProcess(). If set to a path, this will cause these functions to save their downloaded and preprocessed file to this location, with a hardlink (via file.link) to the file created in the destinationPath. This can be used so that individual projects that use common data sets can maintain modularity (by placing downloaded objects in their destinationPath), but also minimize re-downloading the same (perhaps large) file over and over for each project. Because the files are hardlinks, there is no extra space taken up by the apparently duplicated files.
inputPathsRecursive
- Default: FALSE. Used in prepInputs() and preProcess(). Should the reproducible.inputPaths be searched recursively for existence of a file?
memoisePersist
- Default: FALSE. Used in Cache(). Should the memoised copy of the Cache objects persist even if reproducible reloads, e.g., via devtools::load_all? This is mostly useful for developers of reproducible. If TRUE, an object named paste0(".reproducibleMemoise_", cachePath) will be placed in the .GlobalEnv, i.e., one for each cachePath.
nThreads
- Default: 1. The number of threads to use for reading/writing cache files.
objSize
- Default: TRUE. Logical. If TRUE, then object sizes will be included in the cache database. Simply calculating the object size of large objects can be time consuming, so setting this to FALSE will make caching up to 10% faster, depending on the objects.
overwrite
- Default: FALSE. Used in prepInputs(), preProcess(), downloadFile(), and postProcess().
quick
- Default: FALSE. Used in Cache(). This will cause Cache to use file.size(file) instead of digest::digest(file). Less robust to changes, but faster. NOTE: this will only affect objects on disk.
rasterRead
- Default: terra::rast. Used during prepInputs when reading .tif, .grd, and .asc files. Can be raster::raster for backwards compatibility. Can be set using environment variable R_REPRODUCIBLE_RASTER_READ.
shapefileRead
- Default: NULL. Used during prepInputs when reading a .shp file. If NULL, it will use sf::st_read if the sf package is available; otherwise, it will use raster::shapefile.
showSimilar
- Default: FALSE. Passed to Cache.
timeout
- Default: 1200. Used in preProcess when downloading occurs. If a user has the R.utils package installed, R.utils::withTimeout( , timeout = getOption("reproducible.timeout")) will be wrapped around the download so that it will time out (and error) after this many seconds.
useCache
- Default: TRUE. Used in Cache(). If FALSE, then the entire Cache machinery is skipped and the functions are run as if there was no Cache occurring. Can also take 2 other values: 'overwrite' and 'devMode'. 'overwrite' will cause no recovery of objects from the cache repository; only new ones will be created. If the hash is identical to a previous one, then this will overwrite the previous one. 'devMode' will function as Cache normally does, except it will use the userTags to determine if a previous function has been run. If the userTags are identical, but the digest value is different, the old value will be deleted from the cache repository and this new value will be added. This addresses a common situation during the development stage: functions are changing frequently, so any entry in the cache repository will be stale following changes to functions, i.e., they will likely never be relevant again. This will therefore keep the cache repository clean of stale objects. If there is ambiguity in the userTags, i.e., they do not uniquely identify a single entry in the cachePath, then this option will default back to the non-dev-mode behaviour to avoid deleting objects. This, therefore, is most useful if the user is using unique values for userTags.
useCloud
- Default: FALSE. Passed to Cache.
useDBI
- Default: TRUE if DBI is available. Default value can be overridden by setting environment variable R_REPRODUCIBLE_USE_DBI. As of version 0.3, the backend is now DBI instead of archivist.
useGdown
- Default: FALSE. If a user provides a Google Drive url to preProcess/prepInputs, reproducible will use the googledrive package. This works reliably in most cases. However, for large files on unstable internet connections, it will stall and stop the download with no error. If a user is finding this behaviour, they can install the gdown package, making sure it is available on the PATH. This call to gdown will only work for files that do not need authentication. If authentication is needed, dlGoogle will fall back to googledrive::drive_download, even if this option is TRUE, with a message.
useMemoise
- Default: FALSE. Used in Cache(). If TRUE, recovery of cached elements from the cachePath will use memoise::memoise. This means that the 2nd time running a function will be much faster than the first in a session (which either will create a new cache entry to disk or read a cached entry from disk). NOTE: memoised values are removed when the R session is restarted. This option will use more RAM and so may need to be turned off if RAM is limiting. clearCache of any sort will cause all memoising to be 'forgotten' (memoise::forget).
useNewDigestAlgorithm
- Default: 1. Option 1 is the version that has existed for some time. There is now an option 2 which is substantially faster. It will, however, create Caches that are not compatible with previous ones. Options 1 and 2 are not compatible with the earlier 0. Options 1 and 2 will make Cache less sensitive to minor but irrelevant changes (like changing the order of arguments) and will work successfully across operating systems (especially relevant for the new cloudCache function).
useTerra
- Default: FALSE. The GIS operations in postProcess, by default, primarily use the Raster package. The newer terra package does similar operations, but usually faster. A user can now set this option to TRUE, and prepInputs and several components of postProcess will use terra internally.
verbose
- Default: FALSE. If set to TRUE, then every Cache call will show a summary of the objects being cached, their object.size, the time it took to digest them, and the time it took to run the call and either save it to the cache repository or load the cached copy from the repository. This may help in diagnosing some problems that may occur.
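Several of the options above are typically set together at the top of a project script. The sketch below is one hedged way to do that using only base R; the directory names (myProject, sharedInputs) and the variable names are hypothetical, the paths live under tempdir() so the example is safe to run, and the particular values chosen are illustrative rather than recommended.

```r
# Hypothetical directories, created under the session temporary directory.
projectDir   <- file.path(tempdir(), "myProject")
sharedInputs <- file.path(tempdir(), "sharedInputs")
dirs <- c(file.path(projectDir, "cache"),
          file.path(projectDir, "inputs"),
          sharedInputs)
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Set several reproducible.* options at once, keeping the previous values.
old <- options(
  reproducible.cachePath       = file.path(projectDir, "cache"),
  reproducible.destinationPath = file.path(projectDir, "inputs"),
  reproducible.inputPaths      = sharedInputs, # shared store for downloads
  reproducible.useCache        = "devMode"     # one of the documented values
)

# Collect every reproducible.* option currently active in this session.
allOpts <- options()
projectOptions <- allOpts[grep("^reproducible\\.", names(allOpts))]
str(projectOptions)

# Restore the previous values when finished.
options(old)
```

Keeping the list returned by options() makes it straightforward to undo the changes, for example at the end of a script or inside on.exit() in a function.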
Value
This function returns a list of all the options that the reproducible package sets and uses. See below for details of each.
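A short sketch of inspecting the returned list follows. It is guarded with requireNamespace() so that it does nothing if the reproducible package is not installed. The variable name defaults is hypothetical, and the assumption that the list names match the full option names (and can therefore be passed to getOption()) is not stated on this page.

```r
defaults <- NULL
if (requireNamespace("reproducible", quietly = TRUE)) {
  # The list of options that the package sets and uses, with default values.
  defaults <- reproducible::reproducibleOptions()

  # Show which options are covered.
  print(names(defaults))

  # Assumption: names are full option names, so each can be looked up
  # in the current session with getOption().
  firstName <- names(defaults)[1]
  print(getOption(firstName))
}
```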
Advanced
The following options are likely not needed by a user.
cloudChecksumsFilename
- Default: file.path(dirname(.reproducibleTempCacheDir()), "checksums.rds"). Used as an experimental argument in Cache().
length
- Default: Inf. Used in Cache(), specifically in the internal calls to CacheDigest(). This is passed to digest::digest. Mostly this would be changed from the default Inf if the digesting is taking too long. Use this with caution, as some objects will have many NA values in their first many elements.
useragent
- Default: "https://github.com/PredictiveEcology/reproducible". User agent for downloads using this package.