chunk {batchtools}    R Documentation
Chunk Jobs for Sequential Execution
Description
Jobs can be partitioned into “chunks” to be executed sequentially on the computational nodes.
Chunks are defined by providing a data frame with columns “job.id” and “chunk” (integer)
to submitJobs.
All jobs with the same chunk number will be grouped together on one node to form a single
computational job.
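For example, a minimal sketch of such a table (hypothetical job ids; assumes an existing registry reg):

# jobs 1 and 2 run together in one chunk, job 3 in a second chunk
ch = data.frame(job.id = 1:3, chunk = c(1L, 1L, 2L))
submitJobs(ids = ch, reg = reg)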
The function chunk simply splits x either into a fixed number of groups, or
into a variable number of groups, each containing at most chunk.size elements.
The function lpt also groups x into a fixed number of chunks,
but uses the actual values of x in a greedy “Longest Processing Time” algorithm.
As a result, the maximum sum of elements per chunk is minimized.
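To illustrate the greedy rule, a minimal sketch (an illustration of the idea, not the package's internal code): visit the weights in decreasing order and assign each one to the chunk with the smallest running sum.

lpt_sketch = function(x, n.chunks = 1L) {
  sums = numeric(n.chunks)       # running sum per chunk
  chunks = integer(length(x))    # chunk assignment per element
  for (i in order(x, decreasing = TRUE)) {
    j = which.min(sums)          # least loaded chunk so far
    sums[j] = sums[j] + x[i]
    chunks[i] = j
  }
  chunks
}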
binpack splits x into a variable number of groups whose sums of elements do
not exceed the upper limit given by chunk.size.
See the examples of estimateRuntimes for an application of binpack and lpt.
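A comparable sketch of binpack using the classic first-fit decreasing heuristic (again an illustration; the package's actual algorithm may differ): place each weight, largest first, into the first chunk that still has room, opening a new chunk if none does.

binpack_sketch = function(x, chunk.size = max(x)) {
  sums = numeric(0)              # running sum per open chunk
  chunks = integer(length(x))
  for (i in order(x, decreasing = TRUE)) {
    j = which(sums + x[i] <= chunk.size)[1]
    if (is.na(j)) {              # no open chunk has room: open a new one
      sums = c(sums, 0)
      j = length(sums)
    }
    sums[j] = sums[j] + x[i]
    chunks[i] = j
  }
  chunks
}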
Usage
chunk(x, n.chunks = NULL, chunk.size = NULL, shuffle = TRUE)
lpt(x, n.chunks = 1L)
binpack(x, chunk.size = max(x))
Arguments
x
[numeric] For chunk, an atomic vector to split (usually the job ids). For lpt and binpack, the weights to group, e.g. estimated runtimes.
n.chunks
[integer(1)] Requested number of chunks. Mutually exclusive with chunk.size.
chunk.size
[integer(1)] Maximum number of elements per chunk (for chunk), or upper limit for the sum of elements per chunk (for binpack). Mutually exclusive with n.chunks.
shuffle
[logical(1)] Shuffle the elements of x before chunking? Default is TRUE.
Value
[integer] giving the chunk number for each element of x.
See Also
estimateRuntimes, submitJobs
Examples
ch = chunk(1:10, n.chunks = 2)
table(ch)
ch = chunk(rep(1, 10), chunk.size = 2)
table(ch)
set.seed(1)
x = runif(10)
ch = lpt(x, n.chunks = 2)
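# lpt evens out the sums: the two chunks end up roughly balanced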
sapply(split(x, ch), sum)
set.seed(1)
x = runif(10)
ch = binpack(x, 1)
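# no chunk's sum exceeds the chunk.size of 1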
sapply(split(x, ch), sum)
# Job chunking
tmp = makeRegistry(file.dir = NA, make.default = FALSE)
ids = batchMap(identity, 1:25, reg = tmp)
### Group into chunks with 10 jobs each
library(data.table)
ids[, chunk := chunk(job.id, chunk.size = 10)]
print(ids[, .N, by = chunk])
### Group into 4 chunks
ids[, chunk := chunk(job.id, n.chunks = 4)]
print(ids[, .N, by = chunk])
### Submit to batch system
submitJobs(ids = ids, reg = tmp)
# Grouped chunking
tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
prob = addProblem(reg = tmp, "prob1", data = iris, fun = function(job, data) nrow(data))
prob = addProblem(reg = tmp, "prob2", data = Titanic, fun = function(job, data) nrow(data))
algo = addAlgorithm(reg = tmp, "algo", fun = function(job, data, instance, i, ...) instance)
prob.designs = list(prob1 = data.table(), prob2 = data.table(x = 1:2))
algo.designs = list(algo = data.table(i = 1:3))
addExperiments(prob.designs, algo.designs, repls = 3, reg = tmp)
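# yields 27 experiments: (1 design for prob1 + 2 for prob2) x 3 algorithm settings x 3 replications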
### Group into chunks of 5 jobs, but do not put multiple problems into the same chunk
# -> only one problem has to be loaded per chunk, and only once because it is cached
ids = getJobTable(reg = tmp)[, .(job.id, problem, algorithm)]
ids[, chunk := chunk(job.id, chunk.size = 5), by = "problem"]
ids[, chunk := .GRP, by = c("problem", "chunk")]
dcast(ids, chunk ~ problem)
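# the cross tabulation confirms that each chunk holds jobs of one problem only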