pbdMPI-package {pbdMPI}    R Documentation
R Interface to MPI (Programming with Big Data in R Project)
Description
A simplified, efficient interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package that embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, it includes a collection of functions for global operations on distributed data. It is based on S4 classes and methods.
Details
This package requires an MPI library (OpenMPI, MPICH2, or LAM/MPI). Standard installation in an R session with
> install.packages("pbdMPI")
should work in most cases.
On HPC clusters, it is strongly recommended that you check your cluster's documentation for specific requirements, such as module software environments. Some module examples relevant to R and MPI are
$ module load openmpi
$ module load openblas
$ module load flexiblas
$ module load r
Actual module names may include specific versions and may differ in capitalization.
Although module software environments are widely used, the specific module names and their dependency structure are not standard across cluster installations. The command
$ module avail
usually lists the available software modules on your cluster.
To install from the Unix command line after downloading the source file, run R CMD INSTALL on the downloaded source tarball.
If the MPI library is not found, first check that you are loading the correct module environments; then the following configure arguments can be used to specify its non-standard location on your system:
Argument | Default |
--with-mpi-type | OPENMPI |
--with-mpi-include | ${MPI_ROOT}/include |
--with-mpi-libpath | ${MPI_ROOT}/lib |
--with-mpi | ${MPI_ROOT} |
where ${MPI_ROOT} is the path to the MPI root directory. See the package source file pbdMPI/configure for details.
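The same configure arguments can also be supplied from within R. The sketch below assumes a hypothetical OpenMPI installation under /opt/openmpi (substitute the MPI root reported by your module environment) and uses the configure.args argument of install.packages(), which R forwards to R CMD INSTALL as --configure-args. The CRAN mirror URL is only one possible choice.

```r
## Sketch: source install of pbdMPI with an explicit MPI location.
## ASSUMPTION: "/opt/openmpi" is a hypothetical MPI root; replace it with yours.
mpi.root <- "/opt/openmpi"

conf.args <- c(
  "--with-mpi-type=OPENMPI",                                   # MPI flavor (default)
  paste0("--with-mpi-include=", file.path(mpi.root, "include")), # MPI headers
  paste0("--with-mpi-libpath=", file.path(mpi.root, "lib"))      # MPI libraries
)

## configure.args is passed on to R CMD INSTALL as --configure-args
install.packages("pbdMPI", type = "source",
                 repos = "https://cloud.r-project.org",
                 configure.args = conf.args)
```

From the shell, the equivalent is to pass the same strings to R CMD INSTALL via --configure-args="..." together with the downloaded tarball.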
Loading library(pbdMPI) initializes MPI and sets a few global variables, including the environment .pbd_env, where many defaults are set. In most cases, these defaults should not be modified; instead, change the corresponding arguments of the functions that use them. Every script must end with finalize() to exit MPI cleanly.
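A minimal sketch of this lifecycle follows. It uses only the pbdMPI functions comm.rank(), comm.size(), comm.cat(), allreduce(), and finalize(); the variable names are illustrative. The reduction operator is chosen through the op argument of allreduce() for that single call, rather than by editing .pbd_env.

```r
## Minimal pbdMPI lifecycle sketch; every rank executes this same file.
library(pbdMPI)            # loads the package, sets up .pbd_env, initializes MPI

my.rank <- comm.rank()     # rank of this process: 0, 1, ..., n.ranks - 1
n.ranks <- comm.size()     # number of processes that were launched

## comm.cat() prints from rank 0 only unless all.rank = TRUE is given
comm.cat("rank", my.rank, "of", n.ranks, "\n", all.rank = TRUE)

## Override a default for one call: take the maximum instead of the default sum
max.rank <- allreduce(my.rank, op = "max")

finalize()                 # required last call: shuts down MPI cleanly
```

Run under mpiexec with several ranks, max.rank equals n.ranks - 1 on every rank, while the defaults stored in .pbd_env stay untouched.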
Most functions are assumed to run as Single Program, Multiple Data (SPMD), i.e., in batch mode. SPMD is based on cooperation between parallel copies of a single program, which is more scalable than the manager-workers approach that is natural in interactive programming. Interactive use of an HPC cluster is handled more efficiently by a client-server approach, such as the one enabled by the remoter package.
On most clusters, codes are run with mpirun or mpiexec together with Rscript, for example
$ mpiexec -np 2 Rscript some_code.r
where some_code.r contains the entire SPMD program. The MPI Standard 4.0 recommends mpiexec over mpirun. Some MPI implementations may have minor differences between the two, but under OpenMPI 5.0 they are synonyms that produce the same behavior.
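As an illustration of the SPMD style, a hypothetical some_code.r might split the sum 1 + 2 + ... + n across ranks and combine the partial results with allreduce(). This is a sketch, not a program shipped with pbdMPI; the round-robin split is just one simple way to divide the work.

```r
## some_code.r (hypothetical): run with  mpiexec -np 2 Rscript some_code.r
library(pbdMPI)

n <- 100
my.rank <- comm.rank()
n.ranks <- comm.size()

## Round-robin split: rank r owns r + 1, r + 1 + n.ranks, ... (empty if r >= n)
my.idx <- if (my.rank < n) seq(my.rank + 1, n, by = n.ranks) else numeric(0)

## Force double so every rank passes the same type to the reduction
partial <- as.double(sum(my.idx))
total <- allreduce(partial)      # default op is "sum"; every rank receives it

comm.print(total)                # printed by rank 0 only
finalize()
```

Because each rank runs the same file and only the data it owns differs, total is n * (n + 1) / 2 = 5050 regardless of how many ranks are started.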
The package source files provide several examples based on pbdMPI, such as
Directory | Examples |
pbdMPI/inst/examples/test_spmd/ | main SPMD functions |
pbdMPI/inst/examples/test_rmpi/ | analogues to Rmpi |
pbdMPI/inst/examples/test_parallel/ | analogues to parallel |
pbdMPI/inst/examples/test_performance/ | performance tests |
pbdMPI/inst/examples/test_s4/ | S4 extension |
pbdMPI/inst/examples/test_cs/ | client/server examples |
pbdMPI/inst/examples/test_long_vector/ | long vector examples |
where the test_long_vector examples require recompiling the package after setting
#define MPI_LONG_DEBUG 1
in pbdMPI/src/pkg_constant.h.
The current version is mainly written and tested under OpenMPI environments on Linux systems (CentOS 7, RHEL 8, Xubuntu). It is also tested on macOS with Homebrew-installed OpenMPI and under MPICH2 environments on Windows systems, although the primary targets are HPC clusters running Linux.
Author(s)
Wei-Chen Chen wccsnow@gmail.com, George Ostrouchov, Drew Schmidt, Pragneshkumar Patel, and Hao Yu.
References
Programming with Big Data in R Website: https://pbdr.org/
See Also
allgather(), allreduce(), bcast(), gather(), reduce(), scatter().
Examples
## Not run:
### On the command line, run each demo with 2 processes by
### (Use Rscript.exe on Windows systems)
# mpiexec -np 2 Rscript -e "demo(allgather,'pbdMPI',ask=FALSE,echo=FALSE)"
# mpiexec -np 2 Rscript -e "demo(allreduce,'pbdMPI',ask=FALSE,echo=FALSE)"
# mpiexec -np 2 Rscript -e "demo(bcast,'pbdMPI',ask=FALSE,echo=FALSE)"
# mpiexec -np 2 Rscript -e "demo(gather,'pbdMPI',ask=FALSE,echo=FALSE)"
# mpiexec -np 2 Rscript -e "demo(reduce,'pbdMPI',ask=FALSE,echo=FALSE)"
# mpiexec -np 2 Rscript -e "demo(scatter,'pbdMPI',ask=FALSE,echo=FALSE)"
### Or
# execmpi("demo(allgather,'pbdMPI',ask=FALSE,echo=FALSE)", nranks = 2L)
# execmpi("demo(allreduce,'pbdMPI',ask=FALSE,echo=FALSE)", nranks = 2L)
# execmpi("demo(bcast,'pbdMPI',ask=FALSE,echo=FALSE)", nranks = 2L)
# execmpi("demo(gather,'pbdMPI',ask=FALSE,echo=FALSE)", nranks = 2L)
# execmpi("demo(reduce,'pbdMPI',ask=FALSE,echo=FALSE)", nranks = 2L)
# execmpi("demo(scatter,'pbdMPI',ask=FALSE,echo=FALSE)", nranks = 2L)
## End(Not run)
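Beyond the demos, several of the collectives listed under See Also can be combined in a single SPMD script. The sketch below relies on the documented arguments rank.source (bcast()) and op and rank.dest (reduce()); rank 0 as root, the file name collectives.r, and the value 1234L are illustrative choices. Every rank passes an object of the same type to each call, which keeps the collectives matched across ranks.

```r
## Collectives sketch (hypothetical collectives.r):
##   mpiexec -np 2 Rscript collectives.r
library(pbdMPI)

my.rank <- comm.rank()
n.ranks <- comm.size()

## allgather(): every rank receives a list with one element per rank
all.ranks <- unlist(allgather(my.rank))

## bcast(): rank 0's value replaces the placeholder on all other ranks
seed <- bcast(if (my.rank == 0L) 1234L else 0L, rank.source = 0L)

## reduce(): the combined result is meaningful only on rank.dest (rank 0 here)
rank.sum <- reduce(my.rank, op = "sum", rank.dest = 0L)
comm.print(rank.sum)             # printed by rank 0 only

finalize()
```

On every rank, all.ranks holds 0, 1, ..., n.ranks - 1 and seed holds 1234; rank.sum should only be relied on at rank 0.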