init_pool {AzureRMR} | R Documentation |
Manage parallel Azure connections
Description
Manage parallel Azure connections
Usage
init_pool(size = 10, restart = FALSE, ...)
delete_pool()
pool_exists()
pool_size()
pool_export(...)
pool_lapply(...)
pool_sapply(...)
pool_map(...)
pool_call(...)
pool_evalq(...)
Arguments
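size | For init_pool, the number of background R processes in the pool. |
restart | For init_pool, whether to restart an existing pool. |
... | Other arguments passed on to functions in the parallel package. See below. |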
Details
AzureRMR can parallelise communication with Azure by using a pool of background R processes. This often leads to major speedups in scenarios such as downloading large numbers of small files, or working with a cluster of virtual machines. This functionality is intended for use by packages that extend AzureRMR (it was originally implemented as part of the AzureStor package), but end users can also call it directly.
A small API consisting of the following functions is currently provided for managing the pool. They pass their arguments down to the corresponding functions in the parallel package.
- init_pool initialises the pool, creating it if necessary. The pool is created by calling parallel::makeCluster with the pool size and any additional arguments. If init_pool is called and the current pool is smaller than size, it is resized.
- delete_pool shuts down the background processes and deletes the pool.
- pool_exists checks for the existence of the pool, returning a TRUE/FALSE value.
- pool_size returns the size of the pool, or zero if the pool does not exist.
- pool_export exports variables to the pool nodes. It calls parallel::clusterExport with the given arguments.
- pool_lapply, pool_sapply and pool_map carry out work on the pool. They call parallel::parLapply, parallel::parSapply and parallel::clusterMap with the given arguments.
- pool_call and pool_evalq execute code on the pool nodes. They call parallel::clusterCall and parallel::clusterEvalQ with the given arguments.
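As a rough illustration of the mapping described above, the following sketch performs the same steps directly with the parallel package. It is not AzureRMR's own code: the cluster size of 2 and the names cl, x and res are assumptions chosen for the example.

```r
library(parallel)

# Roughly what init_pool(size = 2) sets up: a cluster of background R processes.
cl <- makeCluster(2)

# Roughly what pool_export("x") does: copy x to every node.
x <- 42
clusterExport(cl, "x", envir = environment())

# Roughly what pool_sapply(1:5, ...) does: spread the work over the nodes.
res <- parSapply(cl, 1:5, function(i) i + x)
print(res)

# Roughly what delete_pool() does: shut down the background processes.
stopCluster(cl)
```

The pool functions save you from creating, passing around and stopping the cluster object yourself, which is the main convenience over calling parallel directly.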
The pool is persistent for the session or until terminated by delete_pool. You should initialise the pool by calling init_pool before running any code on it. This restores the original state of the pool nodes by removing any objects that may be in memory, and resetting the working directory to the master working directory.
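The reset behaviour described above can be approximated with the parallel package alone. This is a minimal sketch of the idea, not the package's actual implementation: reset_node is a hypothetical helper, and the cluster size of 2 is an assumption.

```r
library(parallel)

cl <- makeCluster(2)

# Put an object on each node, as pool_export would.
x <- 42
clusterExport(cl, "x", envir = environment())
had_x <- unlist(clusterEvalQ(cl, exists("x", envir = .GlobalEnv, inherits = FALSE)))

# Hypothetical reset step: clear each node's global environment and
# set its working directory to the master's working directory.
reset_node <- function(wd) {
    rm(list = ls(envir = .GlobalEnv, all.names = TRUE), envir = .GlobalEnv)
    setwd(wd)
    invisible(NULL)
}
clusterCall(cl, reset_node, wd = getwd())

# After the reset, x is gone from every node.
has_x <- unlist(clusterEvalQ(cl, exists("x", envir = .GlobalEnv, inherits = FALSE)))
node_wd <- unlist(clusterCall(cl, getwd))

stopCluster(cl)
```

Because a reset of this kind discards everything on the nodes, any variables or packages the workers need must be exported or loaded again afterwards.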
See Also
parallel::makeCluster, parallel::clusterCall, parallel::parLapply
Examples
## Not run:
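# create the pool (10 nodes by default)
init_pool()
pool_size()
# export x to the nodes so the worker function can see it
x <- 42
pool_export("x")
pool_sapply(1:5, function(i) i + x)
# re-initialising the pool resets the nodes
init_pool()
# error: x no longer exists on nodes
try(pool_sapply(1:5, function(i) i + x))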
delete_pool()
## End(Not run)