runs_info {guildai}

R Documentation

Get runs information

Description

Returns a data frame with information about the Guild runs stored in Guild home. Guild home is determined by the GUILD_HOME environment variable (read with Sys.getenv("GUILD_HOME")) or, if that is unset, by looking for a .guild directory, starting from the current working directory and walking up through parent directories as far as ~ or /.
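The lookup described above can be sketched in base R. This is only an illustration of the stated rules, not the package's actual implementation: the function name find_guild_home() is hypothetical, and returning NA when nothing is found is an assumption.

```r
# Hypothetical helper: mimics the documented lookup order for Guild home.
find_guild_home <- function(start = getwd()) {
  # 1. The environment variable wins if it is set and non-empty.
  env <- Sys.getenv("GUILD_HOME", unset = "")
  if (nzchar(env)) return(env)

  # 2. Otherwise walk up from `start`, looking for a .guild directory.
  dir <- normalizePath(start, mustWork = TRUE)
  home <- normalizePath("~", mustWork = FALSE)
  repeat {
    candidate <- file.path(dir, ".guild")
    if (dir.exists(candidate)) return(candidate)
    parent <- dirname(dir)
    # Stop at the home directory or at the filesystem root.
    # Returning NA here is an assumption made for this sketch.
    if (identical(dir, home) || identical(parent, dir)) return(NA_character_)
    dir <- parent
  }
}

find_guild_home()
```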

Usage

runs_info(
  runs = NULL,
  ...,
  filter = NULL,
  operation = NULL,
  label = NULL,
  unlabeled = NA,
  tag = NULL,
  comment = NULL,
  marked = NA,
  unmarked = NA,
  started = NULL,
  digest = NULL,
  running = NA,
  completed = NA,
  error = NA,
  terminated = NA,
  pending = NA,
  staged = NA,
  deleted = NA,
  include_batch = NA
)

Arguments

`runs`	a runs specification.
`...`	passed on to `guild`.
`filter`	(character vector) Filter runs using a guild filter expression. See details section.
`operation`	(character vector) Filter runs with matching `operation`s. A run is only included if any part of its full operation name matches the value.
`label`	(character vector) Filter runs with matching labels.
`unlabeled`	(bool) Filter only runs without labels.
`tag`	(character vector) Filter runs with `tag`.
`comment`	(character vector) Filter runs with matching comments.
`marked`	(bool) Filter only marked runs.
`unmarked`	(bool) Filter only unmarked runs.
`started`	(string) Filter only runs started within RANGE. See details for valid time ranges.
`digest`	(string) Filter only runs with a matching source code digest.
`running`	(bool) Filter only runs that are still running.
`completed`	(bool) Filter only completed runs.
`error`	(bool) Filter only runs that exited with an error.
`terminated`	(bool) Filter only runs terminated by the user.
`pending`	(bool) Filter only pending runs.
`staged`	(bool) Filter only staged runs.
`deleted`	(bool) Show deleted runs.
`include_batch`	(bool) Include batch runs.

Details

Guild has support for a custom filter expression syntax. This syntax is primarily useful in the terminal, and R users will generally prefer to filter the returned dataframe directly using dplyr::filter() or [. Nevertheless, R users can supply guild filter expressions here as well.
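As a rough illustration of filtering with [ instead of a Guild filter expression, the sketch below uses a small made-up data frame in place of the value returned by runs_info(). The column layout (exit_status, plus nested flags and scalars data frames) follows the Examples section; the values themselves are invented for the demonstration.

```r
# Stand-in for runs_info(): three made-up runs with nested data frame columns.
runs <- data.frame(
  id = c("a1", "b2", "c3"),
  exit_status = c(0L, 1L, 0L),
  stringsAsFactors = FALSE
)
runs$flags <- data.frame(epochs = c(5, 20, 30))
runs$scalars <- data.frame(test_accuracy = c(0.7, 0.95, 0.9))

# Keep runs that exited cleanly and reached a test accuracy above 0.8.
keep <- runs$exit_status == 0 & runs$scalars$test_accuracy > 0.8
selected <- runs[keep, ]
selected$id
```

Only run "c3" satisfies both conditions: "a1" falls below the accuracy threshold and "b2" has a non-zero exit status.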

Filter by Expression

Use filter to limit results to runs that match a filter expression. Filter expressions compare run attributes, flag values, or scalars to target values. They may include multiple expressions joined with logical operators.

For example, to match runs with flag batch-size equal to 100 that have loss less than 0.8, use:

runs_info(filter = "batch-size = 100 and loss < 0.8")

Target values may be numbers, strings, or lists containing numbers and strings. Lists are defined using square brackets, with items separated by commas.

Comparisons may use the following operators: '=', '!=', '<', '<=', '>', '>='.

Text comparisons may use 'contains' to test for case-insensitive string membership. A value may be tested for membership or non-membership in a list using 'in' or 'not in' respectively. A value may be tested for being undefined using 'is undefined' or defined using 'is not undefined'.

Logical operators include 'or' and 'and'. An expression may be negated by preceding it with 'not'. Parentheses may be used to control the order of precedence when expressions are evaluated.

If a value reference matches more than one type of run information (e.g. a flag is named 'label', which is also a run attribute), the value is read in order of run attribute, then flag value, then scalar. To disambiguate the reference, use a prefix attr:, flag:, or scalar: as needed. For example, to filter using a flag value named 'label', use 'flag:label'.

Other examples:

"operation = train and acc > 0.9"
"operation = train and (acc > 0.9 or loss < 0.3)"
"batch-size = 100 or batch-size = 200"
"batch-size in [100,200]"
"batch-size not in [400,800]"
"batch-size is undefined"
"batch-size is not undefined"
"label contains best and operation not in [test,deploy]"
"status in [error,terminated]"

NOTE: Comments and tags are not supported in filter expressions at this time. Use comment and tag options along with filter expressions to further refine a selection.
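When the target values already live in an R vector, the filter string can be assembled programmatically. The helper below is a hypothetical convenience and not part of guildai; it only builds the 'in' / 'not in' list syntax described above, and the call to runs_info() is left commented out because it needs a Guild home with runs.

```r
# Hypothetical helper: build "<name> in [v1,v2,...]" from an R vector.
filter_in <- function(name, values, negate = FALSE) {
  op <- if (negate) "not in" else "in"
  sprintf("%s %s [%s]", name, op, paste(values, collapse = ","))
}

batch_sizes <- c(100, 200)
expr <- paste(filter_in("batch-size", batch_sizes), "and loss < 0.8")
expr

# runs_info(filter = expr)
```

Very large or very small numbers may be rendered in scientific notation by paste(), so formatting such values explicitly before building the expression is advisable.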

Filter by Run Start Time

Use started to limit runs to those that have started within a specified time range.

runs_info(started = 'last hour')

You can specify a time range using several different forms:

"after DATETIME"
"before DATETIME"
"between DATETIME and DATETIME"
"last N minutes|hours|days"
"today|yesterday"
"this week|month|year"
"last week|month|year"
"N days|weeks|months|years ago"

DATETIME may be specified as a date in the format YY-MM-DD (the leading YY- may be omitted) or as a time in the format HH:MM (24 hour clock). A date and time may be specified together as DATE TIME.

When using "between DATETIME and DATETIME", values for DATETIME may be specified in either order.

When specifying units such as minutes and hours, the trailing s may be omitted to improve readability. You may also use min instead of minutes and hr instead of hours.

Examples:

"after 7-1"
"after 9:00"
"between 1-1 and 4-30"
"between 10:00 and 15:00"
"last 30 min"
"last 6 hours"
"today"
"this week"
"last month"
"3 weeks ago"
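If the range boundaries are held as R dates, a started value can be built from them. The helper name started_between() is hypothetical, and the sketch assumes the two-digit-year YY-MM-DD form described above is what Guild expects; the runs_info() call is commented out because it needs a Guild home with runs.

```r
# Hypothetical helper: format two dates as "between YY-MM-DD and YY-MM-DD".
started_between <- function(from, to) {
  fmt <- function(d) format(as.Date(d), "%y-%m-%d")
  sprintf("between %s and %s", fmt(from), fmt(to))
}

range <- started_between("2023-01-01", "2023-04-30")
range

# runs_info(started = range)
```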

Filter by Run Status

Runs may also be filtered by specifying one or more status filters: running, completed, error, and terminated. These may be used together to include runs that match any of the filters. For example, to include only runs that were either terminated or exited with an error, use:

runs_info(terminated = TRUE, error = TRUE)

Status filters are applied before run indexes are resolved. For example, a run index of 1, as in runs_info(1, terminated = TRUE, error = TRUE), is the latest run that matches the status filters.
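The order of operations can be illustrated with a made-up table of runs, listed newest first. The status column and its values are assumptions for this sketch, not a guaranteed part of the data frame returned by runs_info().

```r
# Made-up runs, newest first; `status` is a hypothetical column.
runs <- data.frame(
  id = c("r4", "r3", "r2", "r1"),
  status = c("completed", "error", "completed", "terminated"),
  stringsAsFactors = FALSE
)

# Step 1: apply the status filters (terminated OR error).
matching <- runs[runs$status %in% c("terminated", "error"), ]

# Step 2: resolve index 1 against the filtered set, not the full list.
latest_match <- matching[1, ]
latest_match$id
```

Index 1 resolves to "r3", the newest run with a matching status, rather than to "r4", the newest run overall.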

Value

A dataframe (tibble) of runs

Examples

## Not run: 
withr::with_package("dplyr", {

runs_info() # get the full set of runs
runs_info(1) # get the most recent run
runs_info(1:3) # get the last 3 runs

# some other examples for passing filter expressions
runs_info(staged = TRUE) # list only staged runs
runs_info(tag = c("convnet", "keras"), started = "last hour")
runs_info(error = TRUE)

runs <- runs_info()

# filter down the runs list to ones of interest
runs <- runs %>%
  filter(exit_status == 0) %>% # run ended without an error code
  filter(scalars$test_accuracy > .8) %>%
  filter(flags$epochs > 10) %>%
  arrange(scalars$test_loss) %>%
  select(id, flags, scalars)

# retrieve full scalars history from the runs of interest
runs$id %>%
  runs_scalars()

# export the best run
best_runs_dir <- tempfile()
dir.create(best_runs_dir)
runs %>%
  slice_max(scalars$test_accuracy) %>%
  runs_tag("best") %>%
  runs_export(best_runs_dir)

})

## End(Not run)

[Package guildai version 0.0.1 Index]