catalog {fetch} | R Documentation |
Create a data source catalog
Description
The catalog function returns a data catalog for a data source. A data
catalog is like a collection of data dictionaries for all the datasets
in the data source. The catalog allows you to examine the datasets in
the data source without loading anything into memory. Once you decide
which data item you want to load, use the fetch function to load that
item into memory.
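The catalog-then-fetch workflow described above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: it assumes the fetch package is installed, reuses the sample data directory and the ADEX item from the Examples section, and does nothing if the package is missing.

```r
# Sketch only: runs the workflow if the fetch package is installed.
has_fetch <- requireNamespace("fetch", quietly = TRUE)

if (has_fetch) {
  library(fetch)

  # Sample data directory shipped with the package
  pkg <- system.file("extdata", package = "fetch")

  # Step 1: build the catalog; the datasets are described, not loaded
  ct <- catalog(pkg, engines$csv)

  # Step 2: inspect the data dictionary for one item
  print(ct$ADEX)

  # Step 3: load only the chosen item into memory
  adex <- fetch(ct$ADEX)
  print(dim(adex))
}
```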
Usage
catalog(source, engine, pattern = NULL, where = NULL, import_specs = NULL)
Arguments
source |
The source for the data. This parameter is required. Normally the source is passed as a full or relative path. |
engine |
The data engine to use for this data source. This parameter
is required. The data engines are available on the |
pattern |
A pattern to use when loading data items from the data source.
The pattern can be a name or a vector of names. Names also accept wildcards.
The supplied pattern will be used to filter which data items are loaded into
the catalog. For example, the pattern |
where |
A where expression to use when fetching
the data. This expression will apply to all fetch operations on this catalog.
The where expression should be defined with the Base R |
import_specs |
The import specs to use for any fetch operation on
this catalog. An import spec controls the data types
of the incoming columns. You can create a separate import spec for each
dataset, or one import spec to use for all datasets.
See the |
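To make the where argument more concrete, the sketch below shows in base R how an unevaluated expression can be applied to a data frame as a row filter. It is an illustration only: the small adex data frame is made up for the example, and the eval call is an assumption about how such an expression could be applied, not a description of the package internals.

```r
# A where expression, built the same way as in the Examples section
where <- expression(SUBJID == "049")

# Hypothetical stand-in for a dataset from the data source
adex <- data.frame(
  SUBJID = c("049", "049", "050"),
  AVAL   = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Evaluate the expression with the data frame columns in scope;
# the result is a logical vector with one entry per row
keep <- eval(where[[1]], envir = adex)

# Keep only the rows that satisfy the expression
filtered <- adex[keep, , drop = FALSE]
print(filtered)
```

Because the expression is stored unevaluated, the same filter can be reapplied to every dataset that contains the referenced column.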
Value
The loaded data catalog, an object of class "dcat". The catalog is a list of data dictionaries, one per data item. Each data dictionary is a tibble.
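Since the returned catalog is a list of tibbles, it can be explored with ordinary list and data frame tools. The sketch below assumes the fetch package is installed, and it assumes the dictionary columns carry the names shown in the printed output in the Examples section (for instance Column); treat it as exploratory rather than as a documented interface.

```r
# Sketch only: inspect the structure of a catalog if fetch is installed.
has_fetch <- requireNamespace("fetch", quietly = TRUE)

if (has_fetch) {
  library(fetch)

  pkg <- system.file("extdata", package = "fetch")
  ct <- catalog(pkg, engines$csv)

  # The catalog class, which should include "dcat"
  print(class(ct))

  # Names of the data items held in the catalog
  print(names(ct))

  # A single data dictionary; assumed to have one row per dataset column
  dict <- ct$ADEX
  print(dict$Column)
}
```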
See Also
The fetch function to retrieve data from the catalog,
and the import_spec function to create import specifications.
Examples
# Load the package
library(fetch)
# Get data directory
pkg <- system.file("extdata", package = "fetch")
# Example 1: Catalog all rows
# Create catalog
ct <- catalog(pkg, engines$csv)
# View catalog
ct
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
# data item 'ADAE': 56 cols 150 rows
# data item 'ADEX': 17 cols 348 rows
# data item 'ADPR': 37 cols 552 rows
# data item 'ADPSGA': 42 cols 695 rows
# data item 'ADSL': 56 cols 87 rows
# data item 'ADVS': 37 cols 3617 rows
# View catalog item
ct$ADEX
# data item 'ADEX': 17 cols 348 rows
# - Engine: csv
# - Size: 70.7 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format NAs MaxChar
# 1  ADEX  STUDYID character  <NA>     NA   0       3
# 2  ADEX  USUBJID character  <NA>     NA   0      10
# 3  ADEX   SUBJID character  <NA>     NA   0       3
# 4  ADEX   SITEID character  <NA>     NA   0       2
# 5  ADEX     TRTP character  <NA>     NA   8       5
# 6  ADEX    TRTPN   numeric  <NA>     NA   8       1
# 7  ADEX     TRTA character  <NA>     NA   8       5
# 8  ADEX    TRTAN   numeric  <NA>     NA   8       1
# 9  ADEX   RANDFL character  <NA>     NA   0       1
# 10 ADEX    SAFFL character  <NA>     NA   0       1
# 11 ADEX   MITTFL character  <NA>     NA   0       1
# 12 ADEX  PPROTFL character  <NA>     NA   0       1
# 13 ADEX    PARAM character  <NA>     NA   0      45
# 14 ADEX  PARAMCD character  <NA>     NA   0       8
# 15 ADEX   PARAMN   numeric  <NA>     NA   0       1
# 16 ADEX     AVAL   numeric  <NA>     NA  16       4
# 17 ADEX AVALCAT1 character  <NA>     NA  87      10
# Example 2: Catalog with where expression
ct <- catalog(pkg, engines$csv, where = expression(SUBJID == '049'))
# View catalog item - Now only 4 rows
ct$ADEX
# data item 'ADEX': 17 cols 4 rows
# - Where: SUBJID == "049"
# - Engine: csv
# - Size: 4.5 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format NAs MaxChar
# 1  ADEX  STUDYID character  <NA>     NA   0       3
# 2  ADEX  USUBJID character  <NA>     NA   0      10
# 3  ADEX   SUBJID character  <NA>     NA   0       3
# 4  ADEX   SITEID character  <NA>     NA   0       2
# 5  ADEX     TRTP character  <NA>     NA   0       5
# 6  ADEX    TRTPN   numeric  <NA>     NA   0       1
# 7  ADEX     TRTA character  <NA>     NA   0       5
# 8  ADEX    TRTAN   numeric  <NA>     NA   0       1
# 9  ADEX   RANDFL character  <NA>     NA   0       1
# 10 ADEX    SAFFL character  <NA>     NA   0       1
# 11 ADEX   MITTFL character  <NA>     NA   0       1
# 12 ADEX  PPROTFL character  <NA>     NA   0       1
# 13 ADEX    PARAM character  <NA>     NA   0      45
# 14 ADEX  PARAMCD character  <NA>     NA   0       8
# 15 ADEX   PARAMN   numeric  <NA>     NA   0       1
# 16 ADEX     AVAL   numeric  <NA>     NA   0       4
# 17 ADEX AVALCAT1 character  <NA>     NA   1      10
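A further example, in the same spirit as the two above, restricts the catalog with the pattern argument. It is a sketch that assumes the fetch package is installed and relies only on the documented statement that the pattern can be a vector of names; the choice of ADEX and ADSL is arbitrary, and no output is shown because it was not taken from an actual run.

```r
# Example 3 (sketch): Catalog only selected data items by name
has_fetch <- requireNamespace("fetch", quietly = TRUE)

if (has_fetch) {
  library(fetch)

  # Get data directory
  pkg <- system.file("extdata", package = "fetch")

  # Pass a vector of item names as the pattern
  ct2 <- catalog(pkg, engines$csv, pattern = c("ADEX", "ADSL"))

  # View the filtered catalog
  print(ct2)
}
```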