snowfall-init {snowfall} | R Documentation |
Initialisation of cluster usage
Description
Initialisation and organisation code to use snowfall.
Usage
sfInit( parallel=NULL, cpus=NULL, type=NULL, socketHosts=NULL, restore=NULL,
slaveOutfile=NULL, nostart=FALSE, useRscript=FALSE )
sfStop( nostop=FALSE )
sfParallel()
sfIsRunning()
sfCpus()
sfNodes()
sfGetCluster()
sfType()
sfSession()
sfSocketHosts()
sfSetMaxCPUs( number=32 )
Arguments
parallel |
Logical determining parallel or sequential execution. If not set, values from the command line are taken. |
cpus |
Number of CPUs requested for the cluster. If not set, values from the command line are taken. |
nostart |
Logical determining whether the basic cluster setup should be skipped. Needed for nested use of snowfall and for use in packages. |
type |
Type of cluster. Can be 'SOCK', 'MPI', 'PVM' or 'NWS'. Default is 'SOCK'. |
socketHosts |
Host list for socket clusters. Only needed in socket mode (SOCK) and when using more than one machine (if using only your local machine (localhost), no list is needed). |
restore |
Globally set the restore behavior in the call |
slaveOutfile |
Write R slave output to this file. Default: no output (Unix: |
useRscript |
Change startup behavior (snow>0.3 needed): use shell scripts or R-script for startup (R-scripts being the new variant, but not working with sfCluster). |
nostop |
Same as nostart, but for stopping. |
number |
Maximum number of CPUs usable. |
Details
sfInit initialises the usage of the snowfall functions and, if running
in parallel mode, sets up the cluster and snow. If using the sfCluster
management tool, call this without arguments. If sfInit is called with
arguments, these overwrite the sfCluster settings. If running in
parallel, sfInit sets up the cluster by calling makeCluster from snow.
If used with sfCluster, the initialisation also includes management of
lockfiles. If this function is called more than once and the current
cluster is still running, sfStop is called automatically.
Note that you should call sfInit before using any other function
from snowfall, with the sole exception of sfSetMaxCPUs.
If you do not call sfInit first, calling any snowfall function calls
sfInit without any parameters, which means sequential mode when
snowfall is used on its own, or the settings from sfCluster when used
with sfCluster.
This also means that you cannot check whether sfInit was called from
within your own program, as any call to a snowfall function will
initialise again. Therefore the function sfIsRunning returns a logical
indicating whether a cluster is running. Please note: this will not
call sfInit, and it also returns true if a previously running cluster
was stopped via sfStop in the meantime.
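A minimal sketch of how sfIsRunning can guard initialisation, assuming snowfall is installed. The helper name ensureSnowfall is hypothetical and not part of snowfall; sequential mode is used so that no cluster is needed.

```r
library(snowfall)

# Hypothetical helper (not part of snowfall): initialise only if no
# session is currently running, so an existing setup is left alone.
ensureSnowfall <- function(...) {
  if (!sfIsRunning()) {
    sfInit(...)
  }
  invisible(sfIsRunning())
}

ensureSnowfall(parallel = FALSE)  # starts a sequential session
ensureSnowfall(parallel = FALSE)  # skipped: a session is already running
sfStop()
```

Because sfIsRunning does not call sfInit itself, the check has no side effects on the session state.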
If you use snowfall in a package, the argument nostart is very handy
if the main program uses snowfall as well. If set, cluster setup will
be skipped and both parts (package and main program) use the same
cluster.
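The nested use described above might look like the following sketch. The function pkgSquares stands in for a function inside a package and is purely hypothetical. Sequential mode keeps the example runnable on any machine; the main program could equally request a parallel cluster.

```r
library(snowfall)

# Hypothetical package function: sfInit(nostart = TRUE) skips cluster
# setup, so the session started by the caller is reused.
pkgSquares <- function(x) {
  sfInit(nostart = TRUE)
  unlist(sfLapply(x, function(v) v^2))
}

# Main program: owns the session and decides how it is set up.
sfInit(parallel = FALSE)
res <- pkgSquares(1:5)
print(res)
sfStop()
```

Only the main program calls sfStop, so the package code never tears down a cluster it did not create.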
If you call sfInit more than once in a program without explicitly
calling sfStop, the cluster is stopped automatically. If your R
environment does not provide the required libraries, sfInit
automatically switches to sequential mode (with a warning). Required
libraries for parallel usage are snow and, depending on the argument
type, the libraries for the cluster mode (none for socket clusters,
Rmpi for MPI clusters, rpvm for PVM clusters and nws for
NetWorkSpaces).
If using Socket or NetWorkSpaces, socketHosts can be used to specify
the hosts on which you want your workers to run. Basically this is a
list in which any entry can be a plain character string with an IP
address or hostname (depending on your DNS settings). For truly
heterogeneous clusters, paths can also be set per host. Please see the
corresponding snow documentation for details.
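As an illustration of the two forms a host list can take, here is a hedged sketch. The host names and paths are placeholders, the wrapper startOnHosts is hypothetical, and the per-host fields host, rscript and snowlib follow the convention of snow's socket clusters as far as I know; verify them against the snow documentation before relying on them.

```r
library(snowfall)

# Hypothetical host names; replace with machines you can reach.
# Repeating a name starts several workers on that machine.
plainHosts <- c("nodeA", "nodeA", "nodeB")

# Per-host settings for a heterogeneous cluster. The field names and
# paths are assumptions to be checked against the snow documentation.
mixedHosts <- list(
  "nodeA",
  list(host    = "nodeB",
       rscript = "/opt/R/bin/Rscript",
       snowlib = "/opt/R/library")
)

# Hypothetical wrapper: start one worker per entry in `hosts`.
startOnHosts <- function(hosts) {
  sfInit(parallel = TRUE, cpus = length(hosts), type = "SOCK",
         socketHosts = hosts)
}

# startOnHosts(plainHosts)  # not run here: the hosts above do not exist
```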
If you do not give a socket host list, a list with the required number
of CPUs on your local machine (localhost) is used. This is the easiest
way to use parallel computing on a single machine, such as a laptop.
Note that there is a limit on the number of CPUs used in one program
(which can be configured on package installation). The current limit
is 32 CPUs. If you need more CPUs, call sfSetMaxCPUs before the first
call to sfInit. The limit is set to prevent inadvertent requests by
single users from affecting the cluster as a whole.
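A short sketch of raising the limit, assuming snowfall is installed. The worker count of 48 is an arbitrary illustrative value, and the parallel sfInit call is left commented out because it needs a machine or cluster of that size.

```r
library(snowfall)

wanted <- 48          # illustrative value above the default limit of 32

# Raise the limit first; this is the one function that may be called
# before sfInit.
sfSetMaxCPUs(wanted)

# sfInit(parallel = TRUE, cpus = wanted)  # not run: needs 48 CPUs
```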
Use slaveOutfile to define a file to which the log files are written.
The file location must be available on all nodes. Beware of choosing a
location on a shared network drive! Under *nix systems, the
directories /tmp and /var/tmp are most likely not shared between the
different machines. The default is no output file. If you are using
sfCluster, this argument has no meaning, as the slave logs are always
created in a location of sfCluster's choice (depending on its
configuration).
sfStop stops the cluster. If running in parallel mode, the LAM/MPI
cluster is shut down.
sfParallel, sfCpus and sfSession grant access to the internal state
of the currently used cluster. All three can be configured via the
command line and especially with sfCluster as well, but arguments
given in sfInit always overwrite values from the command line. The
command-line options are --parallel (an option without a value; if
missing, sequential mode is forced), --cpus=X (for nodes, where X is a
numerical value) and --session=X (with X a string).
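The following sketch shows a script that leaves all settings to the command line. The invocation in the first comment is an assumption about how the options reach R (via --args) and the file name myprogram.R is hypothetical. Started without those options, the script simply runs sequentially.

```r
# Assumed invocation (hypothetical file name):
#   R --no-save --args --parallel --cpus=4 < myprogram.R

library(snowfall)

# No arguments: run mode and CPU count come from the command line.
sfInit()

runMode <- sfParallel()   # TRUE only if --parallel was given
nCpus   <- sfCpus()       # value of --cpus=X, or 1 in sequential mode
cat("parallel:", runMode, " cpus:", nCpus, "\n")

sfStop()
```

Passing, for example, parallel=FALSE to sfInit in such a script would take precedence over --parallel, as described above.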
sfParallel returns a logical indicating whether the program is running
in parallel/cluster mode or sequentially on a single processor.
sfCpus returns the size of the cluster in CPUs (equal to the number of
usable CPUs). In sequential mode sfCpus returns one. sfNodes is a
deprecated function similar to sfCpus.
sfSession returns a string with the session identification. It is
mainly important if used with the sfCluster tool.
sfGetCluster gets the snow cluster handler. Use it for calling snow
functions directly.
sfType returns the type of the current cluster backend (if any is
used). The value can be SOCK, MPI, PVM or NWS for parallel modes or
"- sequential -" for sequential execution.
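To make the return values of the state functions concrete, here is a small sketch in sequential mode, assuming snowfall is installed. The values noted in the comments are those stated in this documentation, not independently measured output.

```r
library(snowfall)

sfInit(parallel = FALSE)

# Collect the session state before stopping.
state <- list(
  parallel = sfParallel(),  # FALSE in sequential mode
  cpus     = sfCpus(),      # one in sequential mode
  type     = sfType()       # "- sequential -" in sequential mode
)
str(state)

sfStop()
```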
sfSocketHosts gives the list of currently used hosts for socket
clusters. Returns an empty list if not used in socket mode (that is,
sfType() != 'SOCK').
sfSetMaxCPUs allows setting a higher maximum CPU count for this
program. If you need higher limits, call sfSetMaxCPUs with the new
maximum before sfInit.
See Also
See the snow documentation for details on commands: snow-cluster
Examples
## Not run:
# Run program in plain sequential mode.
sfInit( parallel=FALSE )
stopifnot( sfParallel() == FALSE )
sfStop()
# Run in parallel mode, overriding any values given on the
# command line.
# Executes via Socket-cluster with 4 worker processes on
# localhost.
# This is probably the best way to use parallel computing
# on a single machine, like a notebook, if you are not
# using sfCluster.
# Uses Socketcluster (Default) - which can also be stated
# using type="SOCK".
sfInit( parallel=TRUE, cpus=4 )
stopifnot( sfCpus() == 4 )
stopifnot( sfParallel() == TRUE )
sfStop()
# Run parallel mode (socket) with 4 workers on 3 specific machines.
sfInit( parallel=TRUE, cpus=4, type="SOCK",
        socketHosts=c( "biom7", "biom7", "biom11", "biom12" ) )
stopifnot( sfCpus() == 4 )
stopifnot( sfParallel() == TRUE )
sfStop()
# Hook into MPI cluster.
# Note: you can use any kind of MPI cluster Rmpi supports.
sfInit( parallel=TRUE, cpus=4, type="MPI" )
sfStop()
# Hook into PVM cluster.
sfInit( parallel=TRUE, cpus=4, type="PVM" )
sfStop()
# Run in sfCluster mode: settings are taken from the command line:
# run mode (sequential or parallel), number of nodes and the hosts
# to use.
sfInit()
# Session-ID from sfCluster (or XXXXXXXX as default)
session <- sfSession()
# Calling a snow function: cluster handler needed.
parLapply( sfGetCluster(), 1:10, exp )
# Same using snowfall wrapper, no handler needed.
sfLapply( 1:10, exp )
sfStop()
## End(Not run)