tar_rep {tarchetypes} | R Documentation |
Batched replication with dynamic branching.
Description
Batching is important for optimizing the efficiency
of heavily dynamically-branched workflows:
https://books.ropensci.org/targets/dynamic.html#batching.
tar_rep() replicates a command in strategically sized batches.
Usage
tar_rep(
  name,
  command,
  batches = 1,
  reps = 1,
  rep_workers = 1,
  tidy_eval = targets::tar_option_get("tidy_eval"),
  packages = targets::tar_option_get("packages"),
  library = targets::tar_option_get("library"),
  format = targets::tar_option_get("format"),
  repository = targets::tar_option_get("repository"),
  iteration = targets::tar_option_get("iteration"),
  error = targets::tar_option_get("error"),
  memory = targets::tar_option_get("memory"),
  garbage_collection = targets::tar_option_get("garbage_collection"),
  deployment = targets::tar_option_get("deployment"),
  priority = targets::tar_option_get("priority"),
  resources = targets::tar_option_get("resources"),
  storage = targets::tar_option_get("storage"),
  retrieval = targets::tar_option_get("retrieval"),
  cue = targets::tar_option_get("cue"),
  description = targets::tar_option_get("description")
)
Arguments
name |
Symbol, name of the target. A target
name must be a valid name for a symbol in R, and it
must not start with a dot. Subsequent targets
can refer to this name symbolically to induce a dependency relationship:
e.g. |
command |
R code to run multiple times. Must return a list or
data frame because |
batches |
Number of batches. This is also the number of dynamic
branches created during |
reps |
Number of replications in each batch. The total number
of replications is |
rep_workers |
Positive integer of length 1, number of local R processes to use to run reps within batches in parallel. If 1, then reps are run sequentially within each batch. If greater than 1, then reps within batch are run in parallel using a PSOCK cluster. |
tidy_eval |
Whether to invoke tidy evaluation
(e.g. the |
packages |
Character vector of packages to load right before
the target runs or the output data is reloaded for
downstream targets. Use |
library |
Character vector of library paths to try
when loading |
format |
Optional storage format for the target's return value.
With the exception of |
repository |
Character of length 1, remote repository for target storage. Choices:
Note: if |
iteration |
Character of length 1, name of the iteration mode of the target. Choices:
|
error |
Character of length 1, what to do if the target stops and throws an error. Options:
|
memory |
Character of length 1, memory strategy.
If |
garbage_collection |
Logical, whether to run |
deployment |
Character of length 1. If |
priority |
Numeric of length 1 between 0 and 1. Controls which
targets get deployed first when multiple competing targets are ready
simultaneously. Targets with priorities closer to 1 get dispatched earlier
(and polled earlier in |
resources |
Object returned by |
storage |
Character of length 1, only relevant to
|
retrieval |
Character of length 1, only relevant to
|
cue |
An optional object from |
description |
Character of length 1, a custom free-form human-readable
text description of the target. Descriptions appear as target labels
in functions like |
Details
tar_rep() and tar_rep_raw() each create two targets:
an upstream local stem with an integer vector of batch ids,
and a downstream pattern that maps over the batch ids.
(Thus, each batch is a branch.)
Each batch/branch replicates the command a certain number of times.
If the command returns a list or data frame, then
the targets from tar_rep() will try to append new elements/columns
tar_batch, tar_rep, and tar_seed to the output
to denote the batch, rep-within-batch index, and rep-specific seed,
respectively.
Both batches and reps within each batch
are aggregated according to the method you specify
in the iteration argument. If "list", reps and batches
are aggregated with list(). If "vector", then vctrs::vec_c().
If "group", then vctrs::vec_rbind().
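The three aggregation modes can be illustrated outside of a pipeline. The following is a minimal sketch, assuming the vctrs package is installed; the inputs and the variable names (rep_values, rep_frames, as_list, as_vector, as_group) are made up for illustration and are not part of tarchetypes.

```r
# Hypothetical per-rep outputs, used only for illustration.
rep_values <- list(c(1, 2), c(3, 4))
rep_frames <- list(data.frame(x = 1:2), data.frame(x = 3:4))

# iteration = "list": outputs are kept as separate list elements.
as_list <- list(rep_values[[1]], rep_values[[2]])

# iteration = "vector": outputs are concatenated with vctrs::vec_c().
as_vector <- vctrs::vec_c(rep_values[[1]], rep_values[[2]])

# iteration = "group": data frames are row-bound with vctrs::vec_rbind().
as_group <- vctrs::vec_rbind(rep_frames[[1]], rep_frames[[2]])

length(as_list)
as_vector
nrow(as_group)
```

The "list" mode keeps every rep separate, while the other two modes yield a single combined object.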
Value
A list of two targets, one upstream and one downstream.
The upstream target returns a numeric index of batch ids,
and the downstream one dynamically maps over the batch ids
to run the command multiple times.
If the command returns a list or data frame, then
the targets from tar_rep() will try to append new elements/columns
tar_batch, tar_rep, and tar_seed to the output
to denote the batch, rep-within-batch index, and rep-specific seed,
respectively.
See the "Target objects" section for background.
With list iteration, tar_read(your_target)
(on the downstream target with the actual work)
will return a list of lists, where the outer list has one element per
batch and each inner list has one element per rep within batch.
To un-batch this nested list, call
unlist(tar_read(your_target), recursive = FALSE).
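A nested list of this shape can be flattened by one level with base R's unlist(). The sketch below uses a hand-built stand-in for the value of tar_read(your_target); the object names nested and flat are hypothetical.

```r
# Stand-in for a batched result: 2 batches with 3 reps each.
nested <- list(
  list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3)),
  list(data.frame(x = 4), data.frame(x = 5), data.frame(x = 6))
)

# Remove only the batch level, so each rep becomes a top-level element.
flat <- unlist(nested, recursive = FALSE)

length(nested) # one element per batch
length(flat)   # one element per rep
```

Using recursive = FALSE matters here: a fully recursive unlist() would also take the individual data frames apart.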
Replicate-specific seeds
In ordinary pipelines, each target has its own unique deterministic
pseudo-random number generator seed derived from its target name.
In batched replication, however, each batch is a target with multiple
replicates within that batch. That is why tar_rep()
and friends give each replicate its own unique seed.
Each replicate-specific seed is created
based on the dynamic parent target name,
tar_option_get("seed") (for targets version 0.13.5.9000 and above),
batch index, and rep-within-batch index.
The seed is set just before the replicate runs.
Replicate-specific seeds are invariant to batching structure.
In other words,
tar_rep(name = x, command = rnorm(1), batches = 100, reps = 1, ...)
produces the same numerical output as
tar_rep(name = x, command = rnorm(1), batches = 10, reps = 10, ...)
(but with different batch names).
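The invariance can be illustrated with a toy model in base R. This is only a sketch of the idea, not the seed algorithm used by tarchetypes: toy_seed() and simulate() are hypothetical helpers, and the assumption is simply that each seed depends on the target name and the overall position of the replicate.

```r
# Hypothetical seed function (NOT the tarchetypes algorithm):
# combine the target name with the overall replicate index.
toy_seed <- function(name, index) {
  sum(utf8ToInt(name)) * 1000L + as.integer(index)
}

# Run `batches` batches of `reps` replicates each, seeding every
# replicate by its overall position across all batches.
simulate <- function(name, batches, reps) {
  out <- numeric(0)
  for (batch in seq_len(batches)) {
    for (rep in seq_len(reps)) {
      index <- (batch - 1L) * reps + rep
      set.seed(toy_seed(name, index))
      out <- c(out, rnorm(1))
    }
  }
  out
}

many_small <- simulate("x", batches = 100, reps = 1)
few_large <- simulate("x", batches = 10, reps = 10)
identical(many_small, few_large)
```

Because the seed of each replicate never depends on how the replicates are grouped, regrouping them into different batches leaves the simulated values unchanged.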
Other target factories with this seed scheme are tar_rep2(),
tar_map_rep(), tar_map2_count(), tar_map2_size(),
and tar_render_rep().
For the tar_map2_*() functions,
it is possible to manually supply your own seeds
through the command1 argument and then invoke them in your
custom code for command2 (set.seed(), withr::with_seed(),
or withr::local_seed()). For tar_render_rep(),
custom seeds can be supplied to the params argument
and then invoked in the individual R Markdown reports.
Likewise with tar_quarto_rep() and the execute_params argument.
Target objects
Most tarchetypes functions are target factories,
which means they return target objects
or lists of target objects.
Target objects represent skippable steps of the analysis pipeline
as described at https://books.ropensci.org/targets/.
Please read the walkthrough at
https://books.ropensci.org/targets/walkthrough.html
to understand the role of target objects in analysis pipelines.
For developers, https://wlandau.github.io/targetopia/contributing.html#target-factories explains target factories (functions like this one which generate targets) and the design specification at https://books.ropensci.org/targets-design/ details the structure and composition of target objects.
See Also
Other branching:
tar_combine(), tar_combine_raw(), tar_map(), tar_map2(),
tar_map2_count(), tar_map2_count_raw(), tar_map2_raw(),
tar_map2_size(), tar_map2_size_raw(), tar_map_rep(),
tar_map_rep_raw(), tar_rep2(), tar_rep2_raw(), tar_rep_map(),
tar_rep_map_raw(), tar_rep_raw()
Examples
if (identical(Sys.getenv("TAR_LONG_EXAMPLES"), "true")) {
  targets::tar_dir({ # tar_dir() runs code from a temporary directory.
    targets::tar_script({
      list(
        tarchetypes::tar_rep(
          x,
          data.frame(x = sample.int(1e4, 2)),
          batches = 2,
          reps = 3
        )
      )
    })
    targets::tar_make()
    targets::tar_read(x)
  })
}