makeSlurmCluster {slurmR}    R Documentation
Create a Parallel Socket Cluster using Slurm
Description
This function is essentially a wrapper of parallel::makePSOCKcluster. The main feature of makeSlurmCluster is that it supplies the addresses of the nodes allocated by Slurm.
Usage
makeSlurmCluster(
n,
job_name = random_job_name(),
tmp_path = opts_slurmR$get_tmp_path(),
cluster_opt = list(),
max_wait = 300L,
verb = TRUE,
...
)
## S3 method for class 'slurm_cluster'
stopCluster(cl)
Arguments
n | Integer scalar. Size of the cluster object (see Details).
job_name | Character. Name of the job to be passed to
tmp_path | Character. Path to the directory where all the data (including scripts) will be stored. Note that this path must be accessible by all the nodes in the network (see opts_slurmR).
cluster_opt | A list of arguments passed to parallel::makePSOCKcluster.
max_wait | Integer scalar. Wait time before exiting with an error while trying to read the nodes' information.
verb | Logical scalar. If
... | Further arguments passed to Slurm_EvalQ via
cl | An object of class
Details
By default, if the time option is not specified via ..., it is set to 01:00:00, that is, 1 hour.
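The defaulting rule above can be sketched as follows. The helper with_default_time() is hypothetical and only mirrors the documented behaviour for a list of Slurm options; it is not slurmR's internal code.

```r
# Hypothetical helper (not part of slurmR): add a default Slurm time limit
# to a list of options unless the caller already supplied one.
with_default_time <- function(..., default_time = "01:00:00") {
  opts <- list(...)
  # [[ ]] matches names exactly, so an entry such as "timeout"
  # is never mistaken for "time".
  if (is.null(opts[["time"]])) {
    opts[["time"]] <- default_time
  }
  opts
}

str(with_default_time(partition = "thomas"))  # time is filled in
str(with_default_time(time = "02:00:00"))     # user value is kept
```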
Once a job is submitted via Slurm, the user gets access to the nodes associated with it, which allows users to start new processes within those nodes. By means of this, we can create socket clusters, also known as "PSOCK" clusters, across nodes in a Slurm environment. The names of the hosts are retrieved and later passed to parallel::makePSOCKcluster.
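The mechanism above can be illustrated with plain parallel. This sketch is not slurmR's implementation: get_worker_hosts() is a hypothetical helper that returns "localhost" for every worker so the code runs on a single machine. Inside a real allocation, the node names could instead be listed with the Slurm command scontrol show hostnames "$SLURM_JOB_NODELIST".

```r
library(parallel)

# Hypothetical stand-in for the host lookup: inside a Slurm allocation the
# names would come from the nodes assigned to the job; here every worker is
# placed on "localhost" so the sketch runs on one machine.
get_worker_hosts <- function(n) {
  rep("localhost", n)
}

hosts <- get_worker_hosts(2)

# makePSOCKcluster() starts one R worker per element of `hosts`
# ("localhost" workers are launched directly, without ssh).
cl <- makePSOCKcluster(hosts)

# Ask every worker which machine it is running on.
worker_nodes <- unlist(clusterEvalQ(cl, Sys.info()[["nodename"]]))
print(worker_nodes)

stopCluster(cl)
```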
In some cases, R fails to create the cluster, and the following message appears in the Slurm log file:
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive
In such cases, setting a Slurm option such as the memory upfront can solve the problem. For example:
cl <- makeSlurmCluster(20, mem = 20)
If the problem persists, i.e., the cluster cannot be created, make sure that your Slurm cluster allows Socket connections between nodes.
The method stopCluster for slurm_cluster stops the cluster by doing the following:
- Closes the connections by calling the stopCluster method for PSOCK objects.
- Cancels the Slurm job using scancel.
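The two steps above can be sketched in plain R. This is a hypothetical illustration, not slurmR's source: stop_slurm_cluster_sketch() is an invented name, and it assumes the job id is stored in the "SLURM_JOBID" attribute described under Value. scancel is only called when a job id exists and the command is on the PATH.

```r
library(parallel)

# Hypothetical sketch (not slurmR's source) of the two shutdown steps.
# Assumes the job id is stored in the "SLURM_JOBID" attribute.
stop_slurm_cluster_sketch <- function(cl) {
  jobid <- attr(cl, "SLURM_JOBID", exact = TRUE)

  # Step 1: close the connections the way any PSOCK cluster is closed.
  # Dropping "slurm_cluster" first keeps dispatch off a slurm_cluster method.
  class(cl) <- setdiff(class(cl), "slurm_cluster")
  stopCluster(cl)

  # Step 2: cancel the Slurm job, if there is one and scancel is available.
  if (!is.null(jobid) && nzchar(Sys.which("scancel"))) {
    system2("scancel", as.character(jobid))
  }
  invisible(jobid)
}

# Local usage: no SLURM_JOBID attribute is set, so only step 1 runs.
cl <- makePSOCKcluster(2)
class(cl) <- c("slurm_cluster", class(cl))
res <- stop_slurm_cluster_sketch(cl)
```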
Value
An object of class c("slurm_cluster", "SOCKcluster", "cluster"). It is the same as what is returned by parallel::makePSOCKcluster, with the main difference that it has two extra attributes:
- SLURM_JOBID: the id of the job that initialized the cluster.
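Reading the job id back from the returned object is a plain attribute lookup. The sketch below builds a local stand-in with the same class vector and a made-up job id ("123456" is hypothetical) so it runs outside Slurm; with a real object, attr(cl, "SLURM_JOBID") would return the actual job id, which could then be used with Slurm tools such as squeue -j.

```r
library(parallel)

# Stand-in for the object described above: a local PSOCK cluster given the
# same class vector and a made-up job id, so the sketch runs outside Slurm.
cl <- makePSOCKcluster(1)
class(cl) <- c("slurm_cluster", class(cl))
attr(cl, "SLURM_JOBID") <- "123456"  # hypothetical job id

jobid <- attr(cl, "SLURM_JOBID")         # the Slurm job behind the cluster
is_psock <- inherits(cl, "SOCKcluster")  # usable with parLapply() etc.

# No real job exists, so drop the slurm_cluster class before stopping
# (otherwise a loaded slurm_cluster method would try to cancel the job).
class(cl) <- setdiff(class(cl), "slurm_cluster")
stopCluster(cl)
```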
Maximum number of connections
By default, R limits the number of simultaneous connections (see this thread in R-sig-hpc https://stat.ethz.ch/pipermail/r-sig-hpc/2012-May/001373.html). The current maximum is 128 (R version 3.6.1). To modify that limit, you would need to reinstall R after updating the macro NCONNECTIONS in the file src/main/connections.c.
For now, if the user sets n above 128, they will get an immediate warning pointing to this issue, specifically stating that the cluster object may not be created.
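A pre-flight check in the spirit of that warning can be sketched as follows. Both helpers are hypothetical (not slurmR's code) and assume the default limit of 128 connections. As a slightly stricter variant of the documented threshold, the sketch counts connections already open in the session, including stdin, stdout and stderr, which also count towards the limit.

```r
# Hypothetical pre-flight check (not slurmR's code). Assumes the default
# limit of 128 connections per R session.
connections_left <- function(limit = 128L) {
  # showConnections(all = TRUE) lists every connection, including
  # stdin, stdout and stderr, which count towards the limit.
  limit - nrow(showConnections(all = TRUE))
}

check_cluster_size <- function(n, limit = 128L) {
  left <- connections_left(limit)
  if (n > left) {
    warning(
      sprintf("Requested %d workers but only about %d connections are free; ",
              n, left),
      "the cluster object may not be created.",
      call. = FALSE
    )
    return(FALSE)
  }
  TRUE
}

check_cluster_size(100)
check_cluster_size(200)  # warns and returns FALSE
```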
Examples
## Not run:
library(slurmR)
library(parallel)
# Creating a cluster with 100 workers (offspring/child R sessions)
cl <- makeSlurmCluster(100)
# Computing the mean of 100 random uniform draws, 200 times, across the
# workers; for this we can use any of the functions in the parallel package.
ans <- parSapply(cl, 1:200, function(x) mean(runif(100)))
# We simply call stopCluster as we would do with any other cluster
# object
stopCluster(cl)
# We can also specify sbatch options directly via ...
cl <- makeSlurmCluster(200, partition = "thomas", time = "02:00:00")
stopCluster(cl)
## End(Not run)