parDosa {dclone}

R Documentation

Parallel wrapper function to call from within a function

Description

parDosa is a wrapper function around many functionalities of the parallel package. It is designed to work closely with MCMC fitting functions and can easily be called from inside another function.

Usage

parDosa(cl, seq, fun, cldata,
    lib = NULL, dir = NULL, evalq=NULL,
    size = 1, balancing = c("none", "load", "size", "both"),
    rng.type = c("none", "RNGstream"),
    cleanup = TRUE, unload = FALSE, iseed=NULL, ...)

Arguments

`cl`	A cluster object created by `makeCluster`, or an integer. It can also be `NULL`, see Details.
`seq`	A vector to split.
`fun`	A function or character string naming a function.
`cldata`	A list containing data. This list is then exported to the cluster by `clusterExport`. It is stored in a hidden environment. Data in `cldata` can be used by `fun`.
`lib`	Character, name of package(s). Optionally, packages can be loaded onto the cluster. More than one package can be specified as a character vector. Packages already loaded are skipped.
`dir`	Working directory to use; if `NULL`, the working directory is not set on workers (default). Can be a vector to set different directories on workers.
`evalq`	Character, expressions to evaluate, e.g. for changing global options (passed to `clusterEvalQ`). More than one expression can be specified as a character vector.
`balancing`	Character, type of balancing to perform (see Details).
`size`	Vector of problem sizes (or relative performance information) corresponding to elements of `seq` (recycled if needed). The default `1` indicates equality of problem sizes.
`rng.type`	Character, `"none"` will not set any seeds on the workers, `"RNGstream"` selects the `"L'Ecuyer-CMRG"` RNG and then distributes streams to the members of a cluster, optionally setting the seed of the streams by `set.seed(iseed)` (otherwise they are set from the current seed of the master process: after selecting the L'Ecuyer generator). See `clusterSetRNGStream`. The logical value `!(rng.type == "none")` is used for forking (e.g. when `cl` is integer).
`cleanup`	logical, if `cldata` should be removed from the workers after applying `fun`. If `TRUE`, the effects of the `dir` argument are also cleaned up.
`unload`	logical, if the packages in `lib` should be unloaded after applying `fun`.
`iseed`	integer or `NULL`, passed to `clusterSetRNGStream` to be supplied to `set.seed` on the workers, or `NULL` not to set reproducible seeds.
`...`	Other arguments of `fun`, that are simple values and not objects. (Arguments passed as objects should be specified in `cldata`, otherwise those are not exported to the cluster by this function.)
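
The effect of `rng.type = "RNGstream"` together with `iseed` can be illustrated with the parallel package alone. The sketch below is not parDosa's code; `draw_once` is a hypothetical helper that seeds the worker streams with `clusterSetRNGStream` and draws one value on each worker.

```r
library(parallel)

# Hypothetical helper: seed the worker streams, then draw one value per worker.
draw_once <- function(iseed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = iseed)  # selects L'Ecuyer-CMRG on the workers
  unlist(clusterCall(cl, function() rnorm(1)))
}

a <- draw_once(1234)
b <- draw_once(1234)
identical(a, b)  # the same iseed gives the same draws on the same number of workers
```

Each worker receives its own stream, so the two values within `a` differ from each other while `a` and `b` agree.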

Details

The function uses 'snow' type clusters when cl is a cluster object. The function uses 'multicore' type forking (shared memory) when cl is an integer. The value from getOption("mc.cores") is used if cl is NULL.
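
The dispatch on the type of cl can be sketched with functions from the parallel package. `dispatch_sketch` is a hypothetical helper, not part of dclone, and the sequential fallback when "mc.cores" is unset is an assumption of this sketch.

```r
library(parallel)

# Hypothetical helper, for illustration only: choose a backend from `cl`.
dispatch_sketch <- function(cl, seq, fun, ...) {
  if (is.null(cl)) cl <- getOption("mc.cores")        # NULL: consult the option
  if (is.null(cl)) return(lapply(seq, fun, ...))      # assumption: run sequentially
  if (inherits(cl, "cluster"))
    return(parLapply(cl, seq, fun, ...))              # 'snow' type cluster
  mclapply(seq, fun, ..., mc.cores = as.integer(cl))  # 'multicore' type forking
}

res <- dispatch_sketch(1L, 1:4, function(i) i^2)
unlist(res)
```

Forking is not available on Windows, where `mclapply` only accepts `mc.cores = 1`.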

The function sets the random seeds, loads packages lib onto the cluster, sets the working directory as dir, exports cldata and evaluates fun on seq.
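
The sequence of steps above can be approximated with plain calls to the parallel package. This is a minimal sketch under stated assumptions, not the implementation of parDosa: `workflow_sketch` and `fit_one` are hypothetical names, and the data are placed in the global environment of each worker rather than in a hidden environment.

```r
library(parallel)

# Hypothetical wrapper, for illustration only; not dclone's code.
workflow_sketch <- function(cl, seq, fun, cldata, lib = NULL, dir = NULL,
                            iseed = NULL) {
  clusterSetRNGStream(cl, iseed)                       # 1. set random seeds
  for (pkg in lib)                                     # 2. load packages
    clusterCall(cl, library, pkg, character.only = TRUE)
  if (!is.null(dir)) clusterCall(cl, setwd, dir)       # 3. set working directory
  clusterExport(cl, "cldata", envir = environment())   # 4. export the data
  on.exit(clusterEvalQ(cl, rm(cldata, envir = globalenv())))  # remove data afterwards
  parLapply(cl, seq, fun)                              # 5. evaluate fun on seq
}

cl <- makeCluster(2)
fit_one <- function(i) mean(cldata$y) + i  # `cldata` is looked up on the worker
res <- workflow_sketch(cl, 1:4, fit_one, cldata = list(y = c(1, 2, 3)),
                       lib = "stats", iseed = 1)
stopCluster(cl)
unlist(res)
```

Because `fit_one` reads `cldata` on the worker, the data do not have to be passed through `...`.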

No balancing (balancing = "none") means that the problem is split into roughly equal subsets, without regard to size (see clusterSplit). This splitting is deterministic (reproducible).
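
The kind of split that clusterSplit performs can be inspected without a cluster through `splitIndices` from the parallel package, which divides a number of tasks into consecutive chunks of nearly equal length. The task and worker counts below are arbitrary illustration values.

```r
library(parallel)

# Split 10 tasks over 3 workers into consecutive, nearly equal chunks.
chunks <- splitIndices(10, 3)
lengths(chunks)

# Repeating the call gives the same chunks, so the split is reproducible.
identical(chunks, splitIndices(10, 3))
```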

Load balancing (balancing = "load") means that the problem is not split into subsets a priori; instead, each subsequent item is sent to the next worker that becomes idle (see clusterApplyLB for load balancing). This splitting is non-deterministic (might not be reproducible).
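
A small illustration of load balancing with `clusterApplyLB` from the parallel package follows. The sleep times are made-up values standing in for uneven run times, and each task returns the process ID of the worker that handled it.

```r
library(parallel)

cl <- makeCluster(2)
# Tasks with uneven run times; each goes to whichever worker is free next.
res <- clusterApplyLB(cl, c(0.2, 0.05, 0.05, 0.05), function(s) {
  Sys.sleep(s)
  Sys.getpid()
})
stopCluster(cl)

# How many tasks each worker handled; this can vary between runs.
table(unlist(res))
```

The results are still returned in the order of the input, only the assignment of tasks to workers varies.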

Size balancing (balancing = "size") means that the problem is split into subsets with respect to size (see clusterSplitSB and parLapplySB). In size balancing, the problem is re-ordered from largest to smallest, and then subsets are determined by minimizing the total approximate processing time. This splitting is deterministic (reproducible).
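
One plausible way to build such subsets is a greedy heuristic: take the items from largest to smallest and always give the next item to the worker with the smallest accumulated size. The sketch below assumes this heuristic; `split_by_size` is a hypothetical helper and is not necessarily the exact algorithm used by clusterSplitSB.

```r
# Hypothetical helper: greedy size-balanced split of `seq` over `n` workers.
split_by_size <- function(seq, size, n) {
  size <- rep_len(size, length(seq))      # recycle size if needed
  ord <- order(size, decreasing = TRUE)   # largest problems first
  load <- numeric(n)                      # accumulated size per worker
  parts <- vector("list", n)
  for (k in ord) {
    w <- which.min(load)                  # least loaded worker so far
    parts[[w]] <- c(parts[[w]], seq[k])
    load[w] <- load[w] + size[k]
  }
  parts
}

sizes <- c(1, 2, 3, 4, 5)
parts <- split_by_size(1:5, size = sizes, n = 2)
sapply(parts, function(p) sum(sizes[p]))  # total size per worker: 8 and 7
```

The same input always yields the same subsets, which is why this kind of split is reproducible.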

Size and load balancing (balancing = "both") means that the problem is re-ordered from largest to smallest, and then non-deterministic load balancing is used (see parLapplySLB). If size accurately reflects the problem sizes, the result is identical to that of size balancing. This splitting is non-deterministic (might not be reproducible).
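
The combination can be sketched as ordering by size, dispatching with `clusterApplyLB`, and then putting the results back in the original order. `apply_size_load` is a hypothetical helper written for illustration; it is an assumption about the general approach, not the code of parLapplySLB.

```r
library(parallel)

# Hypothetical helper, not dclone's code: size ordering plus load balancing.
apply_size_load <- function(cl, seq, size, fun) {
  size <- rep_len(size, length(seq))
  ord <- order(size, decreasing = TRUE)     # largest problems first
  res <- clusterApplyLB(cl, seq[ord], fun)  # hand items to idle workers
  res[order(ord)]                           # restore the original order
}

cl <- makeCluster(2)
out <- apply_size_load(cl, seq = 1:4, size = c(1, 4, 2, 3),
                       fun = function(i) i * 10)
stopCluster(cl)
unlist(out)
```

Starting the largest problems first reduces the chance that a long-running item is left for the end while other workers sit idle.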

Value

Usually a list with results returned by the cluster.

Author(s)

Peter Solymos, solymos@ualberta.ca