tar_make {targets} | R Documentation |
Run a pipeline of targets.
Description
Run the pipeline you defined in the targets
script file (default: _targets.R
). tar_make()
runs the correct targets in the correct order and stores the return
values in _targets/objects/
. Use tar_read()
to read a target
back into R, and see
https://docs.ropensci.org/targets/reference/index.html#clean
to manage output files.
Usage
tar_make(
names = NULL,
shortcut = targets::tar_config_get("shortcut"),
reporter = targets::tar_config_get("reporter_make"),
seconds_meta_append = targets::tar_config_get("seconds_meta_append"),
seconds_meta_upload = targets::tar_config_get("seconds_meta_upload"),
seconds_reporter = targets::tar_config_get("seconds_reporter"),
seconds_interval = targets::tar_config_get("seconds_interval"),
callr_function = callr::r,
callr_arguments = targets::tar_callr_args_default(callr_function, reporter),
envir = parent.frame(),
script = targets::tar_config_get("script"),
store = targets::tar_config_get("store"),
garbage_collection = targets::tar_config_get("garbage_collection"),
use_crew = targets::tar_config_get("use_crew"),
terminate_controller = TRUE,
as_job = targets::tar_config_get("as_job")
)
Arguments
names |
Names of the targets to run or check. Set to NULL to
check/run all the targets (default).
The object supplied to names should be a
tidyselect expression like any_of() or starts_with()
from tidyselect itself, or tar_described_as() to select target names
based on their descriptions.
|
shortcut |
Logical of length 1, how to interpret the names argument.
If shortcut is FALSE (default) then the function checks
all targets upstream of names as far back as the dependency graph goes.
shortcut = TRUE increases speed if there are a lot of
up-to-date targets, but it assumes all the dependencies
are up to date, so please use with caution.
It relies on stored metadata for information about upstream dependencies.
shortcut = TRUE only works if you set names .
|
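A minimal sketch of names and shortcut together, assuming a pipeline whose targets are named y1, y2, and z (as in the Examples section below):

```r
library(targets)
# Run only targets whose names start with "y", plus all their upstream
# dependencies (the default dependency check):
tar_make(names = starts_with("y"))
# Run only z and skip checking its upstream dependencies entirely;
# assumes y1 and y2 are already up to date:
tar_make(names = any_of("z"), shortcut = TRUE)
```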
reporter |
Character of length 1, name of the reporter to use.
Controls how messages are printed as targets run in the pipeline.
Defaults to tar_config_get("reporter_make") . Choices:
-
"silent" : print nothing.
-
"summary" : print a running total of the number of each targets in
each status category (queued, dispatched, skipped, completed, canceled,
or errored). Also show a timestamp ("%H:%M %OS2" strptime() format)
of the last time the progress changed and printed to the screen.
-
"timestamp" : same as the "verbose" reporter except that each
.message begins with a time stamp.
-
"timestamp_positives" : same as the "timestamp" reporter
except without messages for skipped targets.
-
"verbose" : print messages for individual targets
as they start, finish, or are skipped. Each individual
target-specific time (e.g. "3.487 seconds") is strictly the
elapsed runtime of the target and does not include
steps like data retrieval and output storage.
-
"verbose_positives" : same as the "verbose" reporter
except without messages for skipped targets.
|
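For illustration, a couple of reporter choices from the list above:

```r
library(targets)
# Running totals per status category instead of per-target messages:
tar_make(reporter = "summary")
# Per-target messages, each prefixed with a time stamp:
tar_make(reporter = "timestamp")
```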
seconds_meta_append |
Positive numeric of length 1 with the minimum
number of seconds between saves to the local metadata and progress files
in the data store.
Higher values generally make the pipeline run faster, but unsaved
work (in the event of a crash) may not be up to date.
When the pipeline ends,
all the metadata and progress data is saved immediately,
regardless of seconds_meta_append .
|
seconds_meta_upload |
Positive numeric of length 1 with the minimum
number of seconds between uploads of the metadata and progress data
to the cloud
(see https://books.ropensci.org/targets/cloud-storage.html).
Higher values generally make the pipeline run faster, but unsaved
work (in the event of a crash) may not be backed up to the cloud.
When the pipeline ends,
all the metadata and progress data is uploaded immediately,
regardless of seconds_meta_upload .
|
seconds_reporter |
Positive numeric of length 1 with the minimum
number of seconds between times when the reporter prints progress
messages to the R console.
|
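A sketch combining the three timing arguments; the specific values here are arbitrary, chosen only to show the trade-off between overhead and freshness:

```r
library(targets)
tar_make(
  seconds_meta_append = 15, # save local metadata at most every 15 seconds
  seconds_meta_upload = 30, # upload metadata/progress at most every 30 seconds
  seconds_reporter = 5      # print reporter messages at most every 5 seconds
)
```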
seconds_interval |
Deprecated on 2023-08-24 (version 1.2.2.9001).
Use seconds_meta_append , seconds_meta_upload ,
and seconds_reporter instead.
|
callr_function |
A function from callr to start a fresh clean R
process to do the work. Set to NULL to run in the current session
instead of an external process (but restart your R session just before
you do in order to clear debris out of the global environment).
callr_function needs to be NULL for interactive debugging,
e.g. tar_option_set(debug = "your_target") .
However, callr_function should not be NULL for serious
reproducible work.
|
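A minimal sketch of the debugging workflow described above; "your_target" is a placeholder for one of your own target names:

```r
library(targets)
# Restart your R session first to clear debris from the global environment.
tar_option_set(debug = "your_target") # placeholder target name
# Run in the current session so the interactive debugger can attach:
tar_make(callr_function = NULL)
```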
callr_arguments |
A list of arguments to callr_function .
|
envir |
An environment, where to run the target R script
(default: _targets.R ) if callr_function is NULL .
Ignored if callr_function is anything other than NULL .
callr_function should only be NULL for debugging and
testing purposes, not for serious runs of a pipeline, etc.
The envir argument of tar_make() and related
functions always overrides
the current value of tar_option_get("envir") in the current R session
just before running the target script file,
so whenever you need to set an alternative envir , you should always set
it with tar_option_set() from within the target script file.
In other words, if you call tar_option_set(envir = envir1) in an
interactive session and then
tar_make(envir = envir2, callr_function = NULL) ,
then envir2 will be used.
|
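The precedence rule above, as a short sketch:

```r
library(targets)
envir1 <- new.env(parent = globalenv())
envir2 <- new.env(parent = globalenv())
tar_option_set(envir = envir1)            # set interactively...
tar_make(envir = envir2, callr_function = NULL) # ...but envir2 wins
```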
script |
Character of length 1, path to the
target script file. Defaults to tar_config_get("script") ,
which in turn defaults to _targets.R . When you set
this argument, the value of tar_config_get("script")
is temporarily changed for the current function call.
See tar_script() ,
tar_config_get() , and tar_config_set() for details
about the target script file and how to set it
persistently for a project.
|
store |
Character of length 1, path to the
targets data store. Defaults to tar_config_get("store") ,
which in turn defaults to _targets/ .
When you set this argument, the value of tar_config_get("store")
is temporarily changed for the current function call.
See tar_config_get() and tar_config_set() for details
about how to set the data store path persistently
for a project.
|
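A sketch of overriding both paths; "pipeline.R" and "pipeline_store" are hypothetical names:

```r
library(targets)
# One-off run against non-default paths:
tar_make(script = "pipeline.R", store = "pipeline_store")
# Or set them persistently for the project instead:
tar_config_set(script = "pipeline.R", store = "pipeline_store")
```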
garbage_collection |
Logical of length 1. For a crew -integrated
pipeline, whether to run garbage collection on the main process
before sending a target
to a worker. Ignored if tar_option_get("controller") is NULL .
Independent from the garbage_collection argument of tar_target() ,
which controls garbage collection on the worker.
|
use_crew |
Logical of length 1, whether to use crew if the
controller option is set in tar_option_set() in the target script
(_targets.R ). See https://books.ropensci.org/targets/crew.html
for details.
|
terminate_controller |
Logical of length 1. For a crew -integrated
pipeline, whether to terminate the controller after stopping
or finishing the pipeline. This should almost always be set to TRUE ,
but FALSE combined with callr_function = NULL
will allow you to get the running controller using
tar_option_get("controller") for debugging purposes.
For example, tar_option_get("controller")$summary() produces a
worker-by-worker summary of the work assigned and completed,
tar_option_get("controller")$queue is the list of unresolved tasks,
and tar_option_get("controller")$results is the list of
tasks that completed but were not collected with pop() .
You can manually terminate the controller with
tar_option_get("controller")$terminate() to close down the dispatcher
and worker processes.
|
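The debugging pattern described above, assuming a crew controller is set in the target script:

```r
library(targets)
# Keep the crew controller alive after the run for inspection:
tar_make(callr_function = NULL, terminate_controller = FALSE)
controller <- tar_option_get("controller")
controller$summary()   # per-worker summary of assigned and completed work
controller$terminate() # manually shut down the dispatcher and workers
```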
as_job |
TRUE to run as an RStudio IDE / Posit Workbench job,
if running on RStudio IDE / Posit Workbench.
FALSE to run as a callr process in the main R session
(depending on the callr_function argument).
If as_job is TRUE , then the rstudioapi package must be installed.
|
Value
NULL except if callr_function = callr::r_bg(), in which case
a handle to the callr background process is returned. Either way,
the value is invisibly returned.
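A sketch of the callr::r_bg() case, using standard callr process-handle methods:

```r
library(targets)
# With callr::r_bg(), tar_make() returns the background process handle:
handle <- tar_make(callr_function = callr::r_bg)
handle$is_alive() # poll whether the pipeline is still running
handle$wait()     # block until the background process finishes
```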
Storage access
Several functions like tar_make()
, tar_read()
, tar_load()
,
tar_meta()
, and tar_progress()
read or modify
the local data store of the pipeline.
The local data store is in flux while a pipeline is running,
and depending on how distributed computing or cloud computing is set up,
not all targets can even reach it. So please do not call these
functions from inside a target as part of a running
pipeline. The only exception is literate programming
target factories in the tarchetypes
package such as tar_render()
and tar_quarto()
.
See Also
Other pipeline:
tar_make_clustermq()
,
tar_make_future()
Examples
if (identical(Sys.getenv("TAR_EXAMPLES"), "true")) { # for CRAN
tar_dir({ # tar_dir() runs code from a temp dir for CRAN.
tar_script({
list(
tar_target(y1, 1 + 1),
tar_target(y2, 1 + 1),
tar_target(z, y1 + y2)
)
}, ask = FALSE)
tar_make(starts_with("y")) # Only processes y1 and y2.
# Distributed computing with crew:
if (requireNamespace("crew", quietly = TRUE)) {
tar_script({
tar_option_set(controller = crew::controller_local())
list(
tar_target(y1, 1 + 1),
tar_target(y2, 1 + 1),
tar_target(z, y1 + y2)
)
}, ask = FALSE)
tar_make()
}
})
}
[Package targets version 1.7.1 Index]