crew_launcher_sge {crew.cluster}    R Documentation

[Maturing] Create a launcher with Sun Grid Engine (SGE) workers.

Description

Create an R6 object to launch and maintain workers as Sun Grid Engine (SGE) jobs.

Usage

crew_launcher_sge(
  name = NULL,
  seconds_interval = 0.5,
  seconds_timeout = 60,
  seconds_launch = 86400,
  seconds_idle = Inf,
  seconds_wall = Inf,
  tasks_max = Inf,
  tasks_timers = 0L,
  reset_globals = TRUE,
  reset_packages = FALSE,
  reset_options = FALSE,
  garbage_collection = FALSE,
  launch_max = 5L,
  tls = crew::crew_tls(mode = "automatic"),
  verbose = FALSE,
  command_submit = as.character(Sys.which("qsub")),
  command_terminate = as.character(Sys.which("qdel")),
  command_delete = NULL,
  script_directory = tempdir(),
  script_lines = character(0L),
  sge_cwd = TRUE,
  sge_envvars = FALSE,
  sge_log_output = "/dev/null",
  sge_log_error = NULL,
  sge_log_join = TRUE,
  sge_memory_gigabytes_limit = NULL,
  sge_memory_gigabytes_required = NULL,
  sge_cores = NULL,
  sge_gpu = NULL
)

Arguments

name

Name of the launcher.

seconds_interval

Number of seconds between polls during certain internal synchronous operations, such as checking mirai::status().

seconds_timeout

Number of seconds until timing out while waiting for certain synchronous operations to complete, such as checking mirai::status().
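The polling behavior that seconds_interval and seconds_timeout control can be pictured as a simple retry loop. The sketch below is a minimal base R illustration that assumes a generic condition function. poll_until() is a hypothetical helper, not part of crew or crew.cluster, and crew's internal polling may differ in detail.

```r
# Hypothetical helper (not part of crew): call condition() every
# seconds_interval seconds until it returns TRUE, or give up after
# seconds_timeout seconds.
poll_until <- function(condition, seconds_interval = 0.5, seconds_timeout = 60) {
  start <- Sys.time()
  repeat {
    if (isTRUE(condition())) {
      return(TRUE)
    }
    waited <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (waited > seconds_timeout) {
      stop("Timed out after ", seconds_timeout, " seconds.", call. = FALSE)
    }
    Sys.sleep(seconds_interval)
  }
}

# Example: a condition that becomes TRUE on the third check.
checks <- 0
poll_until(function() {
  checks <<- checks + 1
  checks >= 3
}, seconds_interval = 0.1, seconds_timeout = 5)
```

A smaller seconds_interval makes the loop react sooner at the cost of more frequent checks, while seconds_timeout bounds the total wait.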

seconds_launch

Seconds of startup time to allow. A worker is unconditionally assumed to be alive from the moment of its launch until seconds_launch seconds later. After seconds_launch seconds, the worker is only considered alive if it is actively connected to its assigned websocket.
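The seconds_launch rule reduces to a small predicate. This is a hypothetical sketch: worker_is_alive() and its arguments are illustrative names, not crew internals, and it assumes the launch time and the websocket connection status are already known.

```r
# Hypothetical predicate (not crew's internals) for the seconds_launch rule.
worker_is_alive <- function(launched_at, now, seconds_launch, connected) {
  age <- as.numeric(difftime(now, launched_at, units = "secs"))
  # Inside the startup window the worker counts as alive unconditionally;
  # afterwards it must be connected to its assigned websocket.
  age <= seconds_launch || connected
}

launched <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
worker_is_alive(launched, launched + 30, seconds_launch = 60, connected = FALSE)   # TRUE
worker_is_alive(launched, launched + 120, seconds_launch = 60, connected = FALSE)  # FALSE
```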

seconds_idle

Maximum number of seconds that a worker can idle since the completion of the last task. If exceeded, the worker exits. However, the timer does not start until tasks_timers tasks have completed. See the idletime argument of mirai::daemon(). crew does not excel with perfectly transient workers because it does not micromanage the assignment of tasks to workers, so please allow enough idle time for a new worker to be delegated a new task.

seconds_wall

Soft wall time in seconds. The timer does not start until tasks_timers tasks have completed. See the walltime argument of mirai::daemon().

tasks_max

Maximum number of tasks that a worker will do before exiting. See the maxtasks argument of mirai::daemon(). crew does not excel with perfectly transient workers because it does not micromanage the assignment of tasks to workers, so it is recommended to set tasks_max to a value greater than 1.

tasks_timers

Number of tasks to do before activating the timers for seconds_idle and seconds_wall. See the timerstart argument of mirai::daemon().
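The way seconds_idle, seconds_wall, tasks_max, and tasks_timers interact can be summarized as a single exit condition. The sketch below is a simplified, hypothetical model of the behavior described above. worker_should_exit() is not a crew or mirai function, it assumes elapsed is measured from the moment the timers start, and the real logic lives inside mirai::daemon().

```r
# Hypothetical model (not crew or mirai code) of when a worker exits,
# following the argument descriptions above.
#   tasks_done: tasks the worker has completed so far
#   idle:       seconds since the last task completed
#   elapsed:    seconds since the idle/wall timers started (assumption)
worker_should_exit <- function(tasks_done, idle, elapsed,
                               seconds_idle = Inf, seconds_wall = Inf,
                               tasks_max = Inf, tasks_timers = 0L) {
  # tasks_max does not depend on the timers.
  if (tasks_done >= tasks_max) {
    return(TRUE)
  }
  # seconds_idle and seconds_wall only apply after tasks_timers tasks.
  timers_active <- tasks_done >= tasks_timers
  timers_active && (idle > seconds_idle || elapsed > seconds_wall)
}

# Idle for 100 seconds, but no task has completed yet, so the timers are off.
worker_should_exit(0, idle = 100, elapsed = 100, seconds_idle = 30, tasks_timers = 1)  # FALSE
# Same idle time after one completed task: the idle timer now applies.
worker_should_exit(1, idle = 100, elapsed = 100, seconds_idle = 30, tasks_timers = 1)  # TRUE
```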

reset_globals

TRUE to reset the variables in the global environment between tasks, FALSE to leave them alone.

reset_packages

TRUE to unload any packages loaded during a task (runs between each task), FALSE to leave packages alone.

reset_options

TRUE to reset global options to their original state between each task, FALSE otherwise. It is recommended to only set reset_options = TRUE if reset_packages is also TRUE because packages sometimes rely on options they set at loading time.

garbage_collection

TRUE to run garbage collection between tasks, FALSE to skip.

launch_max

Positive integer of length 1, maximum allowed consecutive launch attempts which do not complete any tasks. Enforced on a worker-by-worker basis. The futile launch count resets back to 0 for each worker that completes a task. It is recommended to set launch_max above 0 because sometimes workers are unproductive under perfectly ordinary circumstances. But launch_max should still be small enough to detect errors in the underlying platform.
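The per-worker bookkeeping behind launch_max can be sketched as a counter that grows with each futile launch and resets when a task completes. This is a hypothetical illustration: make_launch_tracker() is not part of crew, and it assumes a new launch is refused once launch_max futile attempts have happened in a row.

```r
# Hypothetical sketch (not crew's implementation): track consecutive
# futile launches for one worker, as described for launch_max.
make_launch_tracker <- function(launch_max = 5L) {
  futile <- 0L
  list(
    # Record how a launched worker ended: did it complete any task?
    record = function(completed_task) {
      if (completed_task) {
        futile <<- 0L            # a productive launch resets the count
      } else {
        futile <<- futile + 1L   # one more launch without a completed task
      }
      invisible(futile)
    },
    # Assumption: another launch is allowed while fewer than launch_max
    # consecutive futile launches have occurred.
    can_launch = function() futile < launch_max,
    futile = function() futile
  )
}

tracker <- make_launch_tracker(launch_max = 2L)
tracker$record(FALSE)   # one futile launch
tracker$can_launch()    # TRUE
tracker$record(TRUE)    # a completed task resets the count to 0
tracker$record(FALSE)
tracker$record(FALSE)
tracker$can_launch()    # FALSE: two futile launches in a row
```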

tls

A TLS configuration object from crew_tls().

verbose

Logical, whether to see console output and error messages when submitting workers.

command_submit

Character of length 1, file path to the executable to submit a worker job.

command_terminate

Character of length 1, file path to the executable to terminate a worker job. Set to "" to skip manually terminating the worker. Unless there is an issue with the platform, the job should still exit thanks to the NNG-powered network programming capabilities of mirai. Still, if you set command_terminate = "", you are assuming extra responsibility for manually monitoring your jobs on the cluster and manually terminating jobs as appropriate.

command_delete

Deprecated on 2024-01-08 (version 0.1.4.9001). Use command_terminate instead.

script_directory

Character of length 1, directory path to the job scripts. Just before each job submission, a job script is created in this folder. Script base names are unique to each launcher and worker, and the launcher deletes the script when the worker is manually terminated. tempdir() is the default, but it might not work for some systems. tools::R_user_dir("crew.cluster", which = "cache") is another reasonable choice.

script_lines

Optional character vector of additional lines to be added to the job script just after the more common flags. An example would be script_lines = "module load R" if your cluster supports R through an environment module.

sge_cwd

Logical of length 1, whether to launch the worker from the current working directory (as opposed to the user home directory). sge_cwd = TRUE translates to a line of #$ -cwd in the SGE job script. sge_cwd = FALSE omits this line.

sge_envvars

Logical of length 1, whether to forward the environment variables of the current session to the SGE worker. sge_envvars = TRUE translates to a line of #$ -V in the SGE job script. sge_envvars = FALSE omits this line.

sge_log_output

Character of length 1, file or directory path to SGE worker log files for standard output. sge_log_output = "VALUE" translates to a line of #$ -o VALUE in the SGE job script. The default is /dev/null to omit the logs. If you do supply a non-/dev/null value, it is recommended to supply a directory path with a trailing slash so that each worker gets its own set of log files.

sge_log_error

Character of length 1, file or directory path to SGE worker log files for standard error. sge_log_error = "VALUE" translates to a line of #$ -e VALUE in the SGE job script. The default of NULL omits this line. If you do supply a non-/dev/null value, it is recommended to supply a directory path with a trailing slash so that each worker gets its own set of log files.

sge_log_join

Logical, whether to join the stdout and stderr log files together into one file. sge_log_join = TRUE translates to a line of #$ -j y in the SGE job script, while sge_log_join = FALSE is equivalent to #$ -j n. If sge_log_join = TRUE, then sge_log_error should be NULL.

sge_memory_gigabytes_limit

Optional numeric of length 1 with the maximum number of gigabytes of memory a worker is allowed to consume. If the worker consumes more than this level of memory, then SGE will terminate it. sge_memory_gigabytes_limit = 5.7 translates to a line of #$ -l h_rss=5.7G in the SGE job script. sge_memory_gigabytes_limit = NULL omits this line.

sge_memory_gigabytes_required

Optional positive numeric of length 1 with the gigabytes of memory required to run the worker. sge_memory_gigabytes_required = 2.4 translates to a line of #$ -l m_mem_free=2.4G in the SGE job script. sge_memory_gigabytes_required = NULL omits this line.

sge_cores

Optional positive integer of length 1, number of cores per worker ("slots" in SGE lingo). sge_cores = 4 translates to a line of #$ -pe smp 4 in the SGE job script. sge_cores = NULL omits this line.

sge_gpu

Optional integer of length 1 with the number of GPUs to request for the worker. sge_gpu = 1 translates to a line of #$ -l gpu=1 in the SGE job script. sge_gpu = NULL omits this line.
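To make the argument-to-directive translations above concrete, the following sketch assembles the header lines from the sge_* arguments and appends script_lines after them. sge_directives() is a hypothetical helper, not the package's implementation. The order of the lines and the error for a joined log combined with a separate error log are assumptions; the script() method of the launcher is the authoritative way to inspect a real job script.

```r
# Hypothetical helper (not part of crew.cluster): reproduce the documented
# mapping from sge_* arguments to "#$" directive lines. Line order is assumed.
sge_directives <- function(sge_cwd = TRUE,
                           sge_envvars = FALSE,
                           sge_log_output = "/dev/null",
                           sge_log_error = NULL,
                           sge_log_join = TRUE,
                           sge_memory_gigabytes_limit = NULL,
                           sge_memory_gigabytes_required = NULL,
                           sge_cores = NULL,
                           sge_gpu = NULL,
                           script_lines = character(0L)) {
  # Assumption: treat a joined log plus a separate error log as a user error.
  if (isTRUE(sge_log_join) && !is.null(sge_log_error)) {
    stop("sge_log_error should be NULL when sge_log_join = TRUE.")
  }
  # if () without else yields NULL, which c() drops.
  lines <- c(
    if (isTRUE(sge_cwd)) "#$ -cwd",
    if (isTRUE(sge_envvars)) "#$ -V",
    if (!is.null(sge_log_output)) paste0("#$ -o ", sge_log_output),
    if (!is.null(sge_log_error)) paste0("#$ -e ", sge_log_error),
    paste0("#$ -j ", if (isTRUE(sge_log_join)) "y" else "n"),
    if (!is.null(sge_memory_gigabytes_limit)) {
      paste0("#$ -l h_rss=", sge_memory_gigabytes_limit, "G")
    },
    if (!is.null(sge_memory_gigabytes_required)) {
      paste0("#$ -l m_mem_free=", sge_memory_gigabytes_required, "G")
    },
    if (!is.null(sge_cores)) paste0("#$ -pe smp ", sge_cores),
    if (!is.null(sge_gpu)) paste0("#$ -l gpu=", sge_gpu)
  )
  # script_lines go just after the common flags.
  c(lines, script_lines)
}

writeLines(sge_directives(sge_cores = 4, script_lines = "module load R"))
```

With these inputs the sketch prints the default #$ -cwd, #$ -o /dev/null, and #$ -j y lines, followed by #$ -pe smp 4 and module load R.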

Details

To launch a Sun Grid Engine (SGE) worker, this launcher creates a temporary job script with a call to crew::crew_worker() and submits it as an SGE job with qsub. To see most of the lines of the job script in advance, use the script() method of the launcher. It has all the lines except for the job name and the call to crew::crew_worker(), both of which will be inserted at the last minute when it is time to actually launch a worker.

Attribution

The template files at https://github.com/mschubert/clustermq/tree/master/inst informed the development of the crew launcher plugins in crew.cluster, and we would like to thank Michael Schubert for developing clustermq and releasing it under the permissive Apache License 2.0. See the NOTICE and README.md files in the crew.cluster source code for additional attribution.

See Also

Other sge: crew_class_launcher_sge, crew_class_monitor_sge, crew_controller_sge(), crew_monitor_sge()


[Package crew.cluster version 0.3.2 Index]