create_informant {pointblank} | R Documentation |
Create a pointblank informant object
Description
The create_informant()
function creates an informant object, which is
used in an information management workflow. The overall aim of this
workflow is to record, collect, and generate useful information on data
tables. We can supply any information that is useful for describing a
particular data table. The informant object created by the
create_informant()
function takes information-focused functions:
info_columns()
, info_tabular()
, info_section()
, and info_snippet()
.
The info_*()
series of functions allows for a progressive build up of
information about the target table. The info_columns()
and info_tabular()
functions facilitate the entry of info text that concerns the table columns
and the table proper; the info_section()
function allows for the creation
of arbitrary sections that can have multiple subsections full of additional
info text. The system allows for dynamic values culled from the target
table by way of info_snippet()
, for getting named text extracts from
queries, and the use of {<snippet_name>}
in the info text. To make the
use of info_snippet()
more convenient for common queries, a set of
snip_*()
functions are provided in the package (snip_list()
,
snip_stats()
, snip_lowest()
, and snip_highest()
) though you are free to
use your own expressions.
Because snippets need to query the target table to return fragments of info
text, the incorporate()
function needs to be used to initiate this action.
This is also necessary for the informant to update other metadata elements
such as row and column counts. Once the incorporation process is complete,
snippets and other metadata will be updated. Calling the informant itself
will result in a reporting table. This reporting can also be accessed with
the get_informant_report()
function, where there are more reporting
options.
Usage
create_informant(
tbl = NULL,
tbl_name = NULL,
label = NULL,
agent = NULL,
lang = NULL,
locale = NULL,
read_fn = NULL
)
Arguments
tbl |
Table or expression for reading in one
The input table. This can be a data frame, a tibble, a |
tbl_name |
A table name
A optional name to assign to the input table object. If no value is provided, a name will be generated based on whatever information is available. |
label |
An optional label for the information report
An optional label for the information report. If no value is provided, a label will be generated based on the current system time. Markdown can be used here to make the label more visually appealing (it will appear in the header area of the information report). |
agent |
The pointblank agent object
A pointblank agent object. The table from this object can be extracted
and used in the new informant instead of supplying a table in |
lang |
Reporting language
The language to use for the information report (a summary table that
provides all of the available information for the table. By default, |
locale |
Locale for value formatting within reports
An optional locale ID to use for formatting values in the information
report according the locale's rules. Examples include |
read_fn |
Deprecated Table reading function
The |
Value
A ptblank_informant
object.
Supported Input Tables
The types of data tables that are officially supported are:
data frames (
data.frame
) and tibbles (tbl_df
)Spark DataFrames (
tbl_spark
)the following database tables (
tbl_dbi
):-
PostgreSQL tables (using the
RPostgres::Postgres()
as driver) -
MySQL tables (with
RMySQL::MySQL()
) -
Microsoft SQL Server tables (via odbc)
-
BigQuery tables (using
bigrquery::bigquery()
) -
DuckDB tables (through
duckdb::duckdb()
) -
SQLite (with
RSQLite::SQLite()
)
-
Other database tables may work to varying degrees but they haven't been formally tested (so be mindful of this when using unsupported backends with pointblank).
YAML
A pointblank informant can be written to YAML with yaml_write()
and the
resulting YAML can be used to regenerate an informant (with
yaml_read_informant()
) or perform the 'incorporate' action using the target
table (via yaml_informant_incorporate()
). Here is an example of how a
complex call of create_informant()
is expressed in R code and in the
corresponding YAML representation.
R statement:
create_informant( tbl = ~ small_table, tbl_name = "small_table", label = "An example.", lang = "fr", locale = "fr_CA" )
YAML representation:
type: informant tbl: ~small_table tbl_name: small_table info_label: An example. lang: fr locale: fr_CA table: name: small_table _columns: 8 _rows: 13.0 _type: tbl_df columns: date_time: _type: POSIXct, POSIXt date: _type: Date a: _type: integer b: _type: character c: _type: numeric d: _type: numeric e: _type: logical f: _type: character
The generated YAML includes some top-level keys where type
and tbl
are
mandatory, and, two metadata sections: table
and columns
. Keys that begin
with an underscore character are those that are updated whenever
incorporate()
is called on an informant. The table
metadata section can
have multiple subsections with info text. The columns
metadata section
can similarly have have multiple subsections, so long as they are children to
each of the column keys (in the above YAML example, date_time
and date
are column keys and they match the table's column names). Additional sections
can be added but they must have key names on the top level that don't
duplicate the default set (i.e., type
, table
, columns
, etc. are treated
as reserved keys).
Writing an Informant to Disk
An informant object can be written to disk with the x_write_disk()
function. Informants are stored in the serialized RDS format and can be
easily retrieved with the x_read_disk()
function.
It's recommended that table-prep formulas are supplied to the tbl
argument
of create_informant()
. In this way, when an informant is read from disk
through x_read_disk()
, it can be reused to access the target table (which
may changed, hence the need to use an expression for this).
Examples
Let's walk through how we can generate some useful information for a really
small table. It's actually called small_table
and we can find it as a
dataset in this package.
small_table #> # A tibble: 13 x 8 #> date_time date a b c d e f #> <dttm> <date> <int> <chr> <dbl> <dbl> <lgl> <chr> #> 1 2016-01-04 11:00:00 2016-01-04 2 1-bcd-345 3 3423. TRUE high #> 2 2016-01-04 00:32:00 2016-01-04 3 5-egh-163 8 10000. TRUE low #> 3 2016-01-05 13:32:00 2016-01-05 6 8-kdg-938 3 2343. TRUE high #> 4 2016-01-06 17:23:00 2016-01-06 2 5-jdo-903 NA 3892. FALSE mid #> 5 2016-01-09 12:36:00 2016-01-09 8 3-ldm-038 7 284. TRUE low #> 6 2016-01-11 06:15:00 2016-01-11 4 2-dhe-923 4 3291. TRUE mid #> 7 2016-01-15 18:46:00 2016-01-15 7 1-knw-093 3 843. TRUE high #> 8 2016-01-17 11:27:00 2016-01-17 4 5-boe-639 2 1036. FALSE low #> 9 2016-01-20 04:30:00 2016-01-20 3 5-bce-642 9 838. FALSE high #> 10 2016-01-20 04:30:00 2016-01-20 3 5-bce-642 9 838. FALSE high #> 11 2016-01-26 20:07:00 2016-01-26 4 2-dmx-010 7 834. TRUE low #> 12 2016-01-28 02:51:00 2016-01-28 2 7-dmx-010 8 108. FALSE low #> 13 2016-01-30 11:23:00 2016-01-30 1 3-dka-303 NA 2230. TRUE high
Create a pointblank informant
object with create_informant()
and the
small_table
dataset.
informant <- create_informant( tbl = pointblank::small_table, tbl_name = "small_table", label = "`create_informant()` example." )
This function creates some information without any extra help by profiling
the supplied table object. It adds the COLUMNS
section with stubs for each
of the target table's columns. We can use the info_columns()
or
info_columns_from_tbl()
to provide descriptions for each of the columns.
The informant
object can be printed to see the information report in the
Viewer.
informant
If we want to make use of more report display options, we can alternatively
use the get_informant_report()
function.
report <- get_informant_report( informant, title = "Data Dictionary for `small_table`" ) report
Function ID
1-3
See Also
Other Planning and Prep:
action_levels()
,
create_agent()
,
db_tbl()
,
draft_validation()
,
file_tbl()
,
scan_data()
,
tbl_get()
,
tbl_source()
,
tbl_store()
,
validate_rmd()