get.environment {provParseR}    R Documentation

Provenance access functions

Description

These functions extract information from a ProvInfo object created by the prov.parse function and return this information as a data frame.

Usage

get.environment(prov)

get.libs(prov)

get.tool.info(prov)

get.args(prov)

get.scripts(prov)

get.saved.scripts(prov)

get.proc.nodes(prov)

get.data.nodes(prov)

get.stdout.nodes(prov)

get.error.nodes(prov)

get.func.nodes(prov)

get.proc.proc(prov)

get.data.proc(prov)

get.proc.data(prov)

get.func.proc(prov)

get.func.lib(prov)

get.input.files(prov, only.files = FALSE)

get.urls(prov)

get.output.files(prov)

get.preexisting(prov)

get.variables.set(prov)

get.variables.used(prov)

get.variable.named(prov, var.name)

get.val.type(prov, node.id = NULL)

Arguments

prov

a ProvInfo object created by calling prov.parse.

only.files

If TRUE, the output of get.input.files contains only files. If FALSE (the default), it contains both files and URLs.

var.name

a string containing the name of a variable used in the script for which the provenance was collected.

node.id

a vector of data node ids. If NULL (the default), all data nodes are included.

Value

All access functions return NULL if there is no parsed provenance. If parsed provenance exists, but there is no provenance for the type of information requested, such as no input files, an empty data frame is returned.
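Because the two "nothing found" cases are reported differently, a caller may want to check for both before using a result. The following is a minimal sketch in base R; describe.result is a hypothetical helper that is not part of provParseR, and the values passed to it are stand-ins for the result of an access function that returns a data frame.

```r
# Hypothetical helper (not part of provParseR): classify the result of a
# data-frame-returning access function according to the two cases above.
describe.result <- function(result) {
  if (is.null(result)) {
    "no parsed provenance"
  } else if (nrow(result) == 0) {
    "no provenance of this type"
  } else {
    paste(nrow(result), "row(s)")
  }
}

# Stand-in values; in practice these would come from e.g. get.input.files(prov).
describe.result(NULL)
describe.result(data.frame())
describe.result(data.frame(id = c("d1", "d2")))
```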

get.environment returns a data frame containing information about how the provenance was collected. The data frame has 2 columns: label and value. The labels are:

get.libs returns a data frame describing the libraries used by the script. It contains 3 columns: id, name, and version.

get.tool.info returns a data frame describing the tool that collected the provenance. It contains 3 columns: tool.name, tool.version and json.version.

get.args returns a named list describing the arguments that were passed to prov.run or prov.init when the provenance was collected. Each element is the value of an argument in its original type, and each element name is the name of the argument that the value corresponds to.
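Since get.args returns a list rather than a data frame, its elements are accessed by name and keep their own types. A small sketch of what that could look like, using a hand-built list as a stand-in; the argument names and values below are placeholders chosen for illustration, not a statement of what prov.run or prov.init actually accept.

```r
# Stand-in for get.args(prov). The names and values are placeholders.
run.args <- list(
  overwrite = TRUE,
  snapshot.size = 10,
  prov.dir = "prov_output"
)

# Each element keeps its original type.
arg.types <- vapply(run.args, class, character(1))
arg.types

# Look up a single argument by name.
run.args[["overwrite"]]
```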

get.scripts returns a data frame identifying all the scripts executed. The main script will be first, followed by all sourced scripts. The data frame contains 2 columns: name and timestamp (when the script was last modified).

get.saved.scripts returns a data frame identifying the location of saved copies of all the scripts executed. The main script will be first, followed by all sourced scripts. The data frame contains 2 columns: name and timestamp (when the script was last modified).

get.proc.nodes returns a data frame identifying all the procedural nodes executed. These are represented in PROV-JSON as activities and include nodes corresponding to lines of code, start or finish nodes that surround blocks of code, and nodes to represent the binding of function arguments to parameters. The data frame contains 8 columns:

get.data.nodes returns a data frame with an entry for each data node in the provenance. The data frame contains the following columns:

get.stdout.nodes returns a data frame with an entry for each standard output node in the provenance. The data frame contains the following columns:

get.error.nodes returns a data frame with an entry for each error node in the provenance. The data frame contains the following columns:

get.func.nodes returns a data frame containing information about the functions used from other libraries within the script. The data frame has 2 columns: id (a unique id) and name (the name of the function called).

get.proc.proc returns a data frame containing information about the edges that go between two procedural nodes. These edges indicate a control-flow relationship between the two activities. The data frame has 3 columns: id (a unique id), informant (the tail of the edge), and informed (the head of the edge).
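The informant and informed columns are enough to walk the control flow. The sketch below uses a hand-built data frame with the three documented columns as a stand-in for get.proc.proc(prov); the node and edge ids are made up for illustration.

```r
# Stand-in for get.proc.proc(prov); the ids below are made up.
proc.proc <- data.frame(
  id        = c("pp1", "pp2", "pp3"),
  informant = c("p1", "p2", "p3"),
  informed  = c("p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

# For each procedural node, list the nodes that directly follow it.
successors <- split(proc.proc$informed, proc.proc$informant)
successors[["p2"]]

# Nodes that never appear as the head of an edge have no predecessor.
start.nodes <- setdiff(proc.proc$informant, proc.proc$informed)
start.nodes
```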

get.data.proc returns a data frame containing information about the edges that go from data nodes to procedural nodes. These edges indicate an input relationship where the data is used by the activity. The data frame has 3 columns: id (a unique id), entity (the input data), and activity (the procedural node that uses the data).

get.proc.data returns a data frame containing information about the edges that go from procedural nodes to data nodes. These edges indicate an output relationship where the data is produced by the activity. The data frame has 3 columns: id (a unique id), entity (the output data), and activity (the procedural node that produces the data).
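The input and output edges share the entity column, so joining them on that column pairs the activity that produced a data node with each activity that later used it. A minimal sketch with hand-built stand-ins for get.proc.data(prov) and get.data.proc(prov); all ids are made up for illustration.

```r
# Stand-ins for get.proc.data(prov) and get.data.proc(prov); ids are made up.
proc.data <- data.frame(
  id = c("pd1", "pd2"),
  entity = c("d1", "d2"),
  activity = c("p1", "p2"),
  stringsAsFactors = FALSE
)
data.proc <- data.frame(
  id = c("dp1", "dp2"),
  entity = c("d1", "d1"),
  activity = c("p2", "p3"),
  stringsAsFactors = FALSE
)

# Inner join on entity: one row per (producer, consumer) pair of a data node.
flow <- merge(
  proc.data[, c("entity", "activity")],
  data.proc[, c("entity", "activity")],
  by = "entity",
  suffixes = c(".producer", ".consumer")
)
flow
```

Data nodes that are produced but never used (d2 in this stand-in) drop out of the inner join; passing all.x = TRUE to merge would keep them with a missing consumer.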

get.func.proc returns a data frame containing information about where externally-defined functions are used in the script. The data frame has 3 columns: func_id (the id of the function node), activity (the procedural node that calls the function) and function (the function's name).

get.func.lib returns a data frame containing information about what libraries externally-defined functions come from. The data frame has 3 columns: func_id (the id of the function node), library (a library node) and function (the name of a function).
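The func_id column links the two function tables, which makes it possible to see which library each call came from. The sketch below uses hand-built stand-ins for get.func.proc(prov), get.func.lib(prov) and get.libs(prov). The ids, function names and version strings are made up, and it is an assumption here that the library column holds the same ids as the id column of get.libs.

```r
# Stand-ins for get.func.proc(prov), get.func.lib(prov) and get.libs(prov).
# check.names = FALSE keeps the column name "function" unchanged.
func.proc <- data.frame(
  func_id = c("f1", "f2"),
  activity = c("p3", "p5"),
  `function` = c("median", "read.csv"),
  check.names = FALSE, stringsAsFactors = FALSE
)
func.lib <- data.frame(
  func_id = c("f1", "f2"),
  library = c("l2", "l1"),
  `function` = c("median", "read.csv"),
  check.names = FALSE, stringsAsFactors = FALSE
)
libs <- data.frame(
  id = c("l1", "l2"),
  name = c("utils", "stats"),
  version = c("4.3.1", "4.3.1"),
  stringsAsFactors = FALSE
)

# Attach the library node, then its name, to each function call.
calls <- func.proc
calls$library <- func.lib$library[match(calls$func_id, func.lib$func_id)]
calls$library.name <- libs$name[match(calls$library, libs$id)]
calls
```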

get.input.files returns a data frame containing a subset of the data nodes that correspond to files that are read by the script. If only.files is FALSE, the data frame contains information about both input files and URLs.

get.urls returns a data frame containing a subset of the data nodes that correspond to URLs used in the script.

get.output.files returns a data frame containing a subset of the data nodes that correspond to files that are written by the script.

get.preexisting returns a data frame containing variables in the global environment that are used but not set by a script or a console session.

get.variables.set returns a data frame containing a subset of the data nodes that correspond to variables assigned to in the script.

get.variables.used returns a data frame containing a subset of the data nodes that correspond to variables whose values are used in the script.

get.variable.named returns a data frame containing a subset of the data nodes that correspond to variables with the specified name.

get.val.type returns a data frame containing the valType of the specified data node, or the valTypes of all data nodes if no data node is specified. It returns NULL if there are no data nodes or if the specified data node is not found. If not NULL, the data frame will contain 4 columns in the following order:

See Also

prov.parse

Examples

prov <- prov.parse(system.file("testdata", "prov.json", package = "provParseR", mustWork = TRUE))
get.proc.nodes(prov)
get.input.files(prov)
get.urls(prov)
get.output.files(prov)
get.variables.set(prov)
get.variables.used(prov)
get.variable.named(prov, "z")
get.data.nodes(prov)
get.error.nodes(prov)
get.func.nodes(prov)
get.proc.proc(prov)
get.data.proc(prov)
get.proc.data(prov)
get.func.proc(prov)
get.func.lib(prov)
get.libs(prov)
get.scripts(prov)
get.environment(prov)
get.val.type(prov, "d1")
get.tool.info(prov)
get.args(prov)
get.stdout.nodes(prov)
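The calls above all use the default arguments. As a further sketch, the following compares get.input.files with and without only.files = TRUE on the same test file. It is guarded with requireNamespace so that it does nothing when provParseR is not installed; the variable names are illustrative only.

```r
# Run only when provParseR is installed, so the snippet is safe to source.
has.pkg <- requireNamespace("provParseR", quietly = TRUE)

if (has.pkg) {
  prov.file <- system.file("testdata", "prov.json",
                           package = "provParseR", mustWork = TRUE)
  prov <- provParseR::prov.parse(prov.file)

  # Files only, then files and URLs together (the default).
  files.only <- provParseR::get.input.files(prov, only.files = TRUE)
  files.and.urls <- provParseR::get.input.files(prov)

  # Either result may be an empty data frame, so report row counts only.
  cat("files only:", nrow(files.only), "\n")
  cat("files and URLs:", nrow(files.and.urls), "\n")
}
```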


[Package provParseR version 1.0 Index]