ApplyFactory {xegaPopulation}    R Documentation

Configure the execution model for gene evaluation.

Description

The current approach to distribution/parallelization of the genetic algorithm is to parallelize the evaluation of the fitness function only. The execution model defines the function lF$lapply() used in the function EvalPopulation().

Usage

ApplyFactory(method = "Sequential")

Arguments

method

The label of the execution model: "Sequential" | "MultiCore" | "FutureApply" | "Cluster".

Details

Currently we support the following parallelization models:

  1. "Sequential": Uses base::lapply(). (Default).

  2. "MultiCore": Uses parallel::mclapply().

  3. "FutureApply": Uses future.apply::future_lapply(). Plans must be set up and worker processes must be stopped.

  4. "Cluster": Uses parallel::parLapply(). A cluster object must be set up and the worker processes must be stopped.
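
As a rough illustration of the factory idea, the following sketch maps a method label to a function that behaves like lapply(). The name makeApply() and its body are hypothetical and are not the implementation in xegaPopulation; only the "Sequential" and "MultiCore" labels are covered, and the use of 2 cores is an assumption.

```r
# Hypothetical sketch, not the xegaPopulation implementation:
# map an execution-model label to a function that behaves like lapply().
makeApply <- function(method = "Sequential") {
  switch(method,
    Sequential = base::lapply,
    MultiCore = function(X, FUN, ...) {
      # Forking is unavailable on Windows, so fall back to one core there.
      cores <- if (.Platform$OS.type == "windows") 1L else 2L
      parallel::mclapply(X, FUN, ..., mc.cores = cores)
    },
    stop("Unknown execution model: ", method)
  )
}

applySeq <- makeApply("Sequential")
applyMC <- makeApply("MultiCore")

square <- function(x) x^2
resultSeq <- applySeq(1:4, square)   # evaluated in the current process
resultMC <- applyMC(1:4, square)     # evaluated in forked child processes
```

Both returned functions take the same arguments as lapply() and return a list, so the caller does not need to know which execution model is active.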

The execution model "MultiCore" provides parallelization restricted to a single computer: The master process starts R slave processes by fork(), which run in separate memory spaces. At the time of fork() both memory spaces have the same content. Memory writes performed by one of the processes do not affect the other.
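
The memory separation after fork() can be observed with a small experiment. This is a minimal sketch that assumes a Unix-like system; the variable name counter and the use of 2 cores are chosen for illustration only.

```r
# Minimal sketch of fork() semantics with parallel::mclapply().
# Assumption: a Unix-like system. On Windows forking is unavailable,
# mc.cores must be 1, and the isolation shown here does not apply.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

counter <- 0
res <- parallel::mclapply(1:4, function(i) {
  counter <<- counter + 1   # changes only the copy in the forked child
  i^2
}, mc.cores = cores)

# The values computed by the children are returned to the parent ...
print(unlist(res))
# ... but on Unix-like systems the parent's counter has not been changed.
print(counter)
```

For the genetic algorithm this means that a fitness function evaluated under "MultiCore" can return values to the master process, but side effects on variables are lost when the child process ends.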

The execution model "FutureApply" makes the future backends for a wide range of parallel and distributed architectures available. The plans for resolving futures in parallel use different types of communication between master and slaves:

  1. plan(sequential) configures sequential execution. Default.

  2. w<-5; plan(multicore, workers=w) configures an asynchronous multicore execution of futures on 5 workers.

  3. w<-8; plan(multisession, workers=w) configures a multisession environment with 8 workers. The evaluation of the future is done in parallel in 8 other R sessions on the same machine. Communication is done via socket connections, and the R sessions started serve multiple futures over their lifetime. The worker R sessions are stopped by calling plan(sequential). The number of parallel sessions is restricted by the availability of connections. Up to R version 4.3, a maximum of 125 connections is available.

  4. w<-7; plan(callr, workers=w) configures the evaluation of futures on top of the callr package. The callr package creates a separate R session for each future. Communication is via files of serialized R objects. The advantages of callr are:

    1. Each callr future is evaluated in a new R session which ends as soon as the value of the future has been collected.

    2. The number of parallel callr futures is not restricted by the number of available connections, because the communication is based on files of serialized R objects.

    3. No ports are used. This means no port clashes with other processes and no firewall issues.

  5. Setting up a cluster environment for resolving futures works as follows. Write a function with the following elements:

    1. Generate a cluster object:

      cl<-makeClusterPSOCK(rep("localhost", workers))

    2. Set up an on.exit condition for stopping the worker processes.

      on.exit(parallel::stopCluster(cl))

    3. Set up the plan for resolving the future:

      oldplan<-plan(cluster, workers=cl)

    4. Call the function that uses future.apply::future_lapply(), e.g. the genetic algorithm.

    5. Restore the previous plan: plan(oldplan)

    The cluster processes may be located on one or several computers. The communication between the processes is via sockets. Remote computers must allow the use of ssh to start R-processes without an interactive login.
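
The five steps for the cluster plan can be combined into one wrapper function. The following is a sketch only: runWithCluster() is a hypothetical helper that is not part of xegaPopulation, it assumes that the packages future, future.apply and parallelly are installed, and it uses two local workers instead of remote machines.

```r
# Sketch only: runWithCluster() is a hypothetical helper for illustration.
runWithCluster <- function(X, FUN, workers = 2) {
  # 1. Generate a cluster object (local workers are assumed here).
  cl <- parallelly::makeClusterPSOCK(rep("localhost", workers))
  # 2. Stop the worker processes when the function exits, even on error.
  on.exit(parallel::stopCluster(cl))
  # 3. Set up the plan; plan() returns the previously active plan.
  oldplan <- future::plan(future::cluster, workers = cl)
  # 4. Call the code that uses future_lapply(), e.g. the genetic algorithm.
  result <- future.apply::future_lapply(X, FUN)
  # 5. Restore the previous plan.
  future::plan(oldplan)
  result
}

# Run the example only if the assumed packages are available.
if (requireNamespace("future.apply", quietly = TRUE)) {
  res <- runWithCluster(1:4, function(x) x^2, workers = 2)
  print(unlist(res))
}
```

Using on.exit() for stopping the workers means that no R processes are left behind if the evaluation fails halfway.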

The execution model "Cluster" allows the configuration of master-slave processing on local and remote machines.
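
The calls of the parallel package on which the "Cluster" model is based can be tried out in isolation. This is a minimal sketch with two local worker processes; how the cluster object is handed over to the function returned by ApplyFactory() is not described on this page and is therefore not shown.

```r
# Minimal sketch with the parallel package; two local workers are assumed.
cl <- parallel::makeCluster(2)

squares <- tryCatch(
  parallel::parLapply(cl, 1:4, function(x) x^2),
  # Always stop the worker processes, even if the evaluation fails.
  finally = parallel::stopCluster(cl)
)

print(unlist(squares))
```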

Value

A function with the same result as the lapply()-function.

See Also

Other Configuration: AcceptFactory(), CoolingFactory(), CrossRateFactory(), MutationRateFactory(), xegaConfiguration()


[Package xegaPopulation version 1.0.0.0 Index]