big_parallelize {bigstatsr} | R Documentation |
Split-parApply-Combine
Description
A Split-Apply-Combine strategy to parallelize the evaluation of a function.
Usage
big_parallelize(
X,
p.FUN,
p.combine = NULL,
ind = cols_along(X),
ncores = nb_cores(),
...
)
Arguments
X |
An object of class FBM. |
p.FUN |
The function to be applied to each subset matrix.
It must take a Filebacked Big Matrix as first argument and ind, a vector of indices, as second argument (see Examples). |
p.combine |
Function to combine the results with do.call. Default is NULL, in which case the results are returned as a list. |
ind |
Initial vector of subsetting indices. Default is the vector of all column indices. |
ncores |
Number of cores used. Default is nb_cores(). |
... |
Extra arguments to be passed to p.FUN. |
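The forwarding of extra arguments through ... can be pictured with a small base-R sketch. It assumes a plain matrix in place of an FBM and a sequential lapply in place of parallel workers; apply_to_parts and scaled_col_sums are hypothetical names used only for illustration and are not part of bigstatsr.

```r
# Hypothetical wrapper: calls p_fun on each block of indices and
# forwards any extra arguments (...) unchanged.
apply_to_parts <- function(X, p_fun, parts, ...) {
  lapply(parts, function(ind) p_fun(X, ind, ...))
}

# Hypothetical p.FUN-like function taking one extra argument, `scale`.
scaled_col_sums <- function(X, ind, scale) {
  colSums(X[, ind, drop = FALSE]) * scale
}

X <- matrix(1:20, nrow = 4)          # plain matrix standing in for an FBM
parts <- list(1:2, 3:5)              # two blocks of column indices
res <- apply_to_parts(X, scaled_col_sums, parts, scale = 2)
```

Here scale plays the same role as rows and cols in the Examples section: it is not consumed by the wrapper and reaches the applied function as-is.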
Details
This function splits the indices into parts, then applies a given function to each part, and finally combines the results.
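The three steps can be sketched in base R. This is only an illustration under stated assumptions: a plain matrix stands in for an FBM, the blocks are processed sequentially rather than on separate cores, and split_indices and col_sums_sub are hypothetical helpers, not functions of bigstatsr. It does not claim to reproduce how big_parallelize is implemented.

```r
# Plain matrix standing in for an FBM (assumption: illustration only).
X <- matrix(seq_len(40), nrow = 4, ncol = 10)

# Hypothetical helper: split `ind` into `nparts` contiguous blocks.
split_indices <- function(ind, nparts) {
  block_id <- sort(rep_len(seq_len(nparts), length(ind)))
  unname(split(ind, block_id))
}

# Hypothetical p.FUN-like function: column sums over the columns in `ind`.
col_sums_sub <- function(X, ind) {
  colSums(X[, ind, drop = FALSE])
}

# 1. split the column indices into blocks
parts <- split_indices(seq_len(ncol(X)), nparts = 3)
# 2. apply the function to each block (sequentially in this sketch)
results <- lapply(parts, function(ind) col_sums_sub(X, ind))
# 3. combine the per-block results
combined <- do.call(c, results)
```

Because the blocks are contiguous and kept in order, combining them gives the same result as running the function once on all columns.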
Value
Returns a list of ncores elements, each element being the result of one of the cores, computed on a block. The elements of this list are then combined with do.call(p.combine, .) if p.combine is given.
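To make the combining step concrete, the following base-R sketch shows what do.call does to a list of per-block results. The data frames in block_results are made-up values, standing in for what two cores might return; they are not actual output of bigstatsr.

```r
# Hypothetical per-block results, one element per core.
block_results <- list(
  data.frame(sum = c(1, 2), var = c(0.5, 0.7)),
  data.frame(sum = c(3, 4), var = c(0.1, 0.9))
)

# The function may be given by name, as with p.combine = 'rbind'.
combined <- do.call("rbind", block_results)
```

Without a combining function, the caller would receive something shaped like block_results; with 'rbind', the blocks are stacked into a single data frame in block order.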
See Also
big_apply, bigparallelr::split_parapply
Examples
## Not run: # CRAN is super slow when using parallelism.
X <- big_attachExtdata()
### Computation on all the matrix
true <- big_colstats(X)
big_colstats_sub <- function(X, ind) {
  big_colstats(X, ind.col = ind)
}
# 1. the computation is split along all the columns
# 2. for each part the computation is done, using `big_colstats`
# 3. the results (data.frames) are combined via `rbind`.
test <- big_parallelize(X, p.FUN = big_colstats_sub,
                        p.combine = 'rbind', ncores = 2)
all.equal(test, true)
### Computation on a part of the matrix
n <- nrow(X)
m <- ncol(X)
rows <- sort(sample(n, n/2)) # sort to provide some locality in accesses
cols <- sort(sample(m, m/2)) # idem
true2 <- big_colstats(X, ind.row = rows, ind.col = cols)
big_colstats_sub2 <- function(X, ind, rows, cols) {
  big_colstats(X, ind.row = rows, ind.col = cols[ind])
}
# This doesn't work because, by default, the computation is spread
# along all columns. We must explicitly specify the `ind` parameter.
tryCatch(big_parallelize(X, p.FUN = big_colstats_sub2,
                         p.combine = 'rbind', ncores = 2,
                         rows = rows, cols = cols),
         error = function(e) message(e))
# This now works, using `ind = seq_along(cols)`.
test2 <- big_parallelize(X, p.FUN = big_colstats_sub2,
                         p.combine = 'rbind', ncores = 2,
                         ind = seq_along(cols),
                         rows = rows, cols = cols)
all.equal(test2, true2)
## End(Not run)
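One plausible way to see why the second example needs ind = seq_along(cols) is to look at the indexing alone, in base R. The values of m and cols below are hypothetical; the sketch only shows what cols[ind] evaluates to when ind runs over all columns rather than over the positions of cols.

```r
m <- 10                        # hypothetical number of columns
cols <- c(2, 5, 7, 9)          # hypothetical subset of column indices

# With the default, `ind` ranges over all m columns, so positions
# beyond length(cols) are requested and come back as NA.
ind_default <- seq_len(m)
bad <- cols[ind_default]

# With ind = seq_along(cols), every position exists in `cols`.
ind_ok <- seq_along(cols)
good <- cols[ind_ok]
```

In other words, when the applied function indexes a subset such as cols, ind should enumerate the positions within that subset, not the columns of the full matrix.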