summary_database {epwshiftr} | R Documentation |
Summarize CMIP6 model output file status
Description
summary_database() scans the specified directory and returns a data.table() containing summary information about all the CMIP6 files available against the output file index loaded using load_cmip6_index().
Usage
summary_database(
dir,
by = c("activity", "experiment", "variant", "frequency", "variable", "source",
"resolution"),
mult = c("skip", "latest"),
append = FALSE,
miss = c("keep", "overwrite"),
recursive = FALSE,
update = FALSE,
warning = TRUE
)
Arguments
dir |
A single string indicating the directory where CMIP6 model output NetCDF files are stored. |
by |
The grouping columns used to summarize the database status. Should be a subset of:
|
mult |
Action to take when multiple files match the same case in the CMIP6
index. If |
append |
If |
miss |
Action to take when files matched in the previous summary no longer exist
when running the current summary. Only applicable when |
recursive |
If |
update |
If |
warning |
If |
Details
The database here can be any directory that stores the NetCDF files for CMIP6 GCMs. It can also be the same as get_data_dir(), where epwshiftr stores the output file index, if you want to keep the output file index and the output files in the same place.
summary_database() uses the tracking_id, datetime_start and datetime_end global attributes of each NetCDF file to match against the output file index. The names of the NetCDF files therefore do not need to follow the CMIP6 file name encoding.
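The matching described above can be pictured as a join on the three global attributes. The following is only a minimal sketch using toy data: the two tables, their values, and the file_id column are made up for illustration, and the actual internals of summary_database() may differ.

```r
library(data.table)

# Toy stand-in for the output file index (all values are made up)
index <- data.table(
    file_id        = c("a", "b", "c"),
    tracking_id    = c("hdl:0001", "hdl:0002", "hdl:0003"),
    datetime_start = as.POSIXct(c("2050-01-01", "2050-01-01", "2060-01-01"), tz = "UTC"),
    datetime_end   = as.POSIXct(c("2059-12-31", "2059-12-31", "2069-12-31"), tz = "UTC")
)

# Toy stand-in for attributes read from local NetCDF files.
# The file names deliberately do not follow the CMIP6 naming scheme.
local_files <- data.table(
    file_path      = c("/data/renamed_1.nc", "/data/renamed_3.nc"),
    tracking_id    = c("hdl:0001", "hdl:0003"),
    datetime_start = as.POSIXct(c("2050-01-01", "2060-01-01"), tz = "UTC"),
    datetime_end   = as.POSIXct(c("2059-12-31", "2069-12-31"), tz = "UTC")
)

# Match on the three attributes; index rows without a local file get NA
keys <- c("tracking_id", "datetime_start", "datetime_end")
matched <- merge(index, local_files, by = keys, all.x = TRUE)
matched[, .(file_id, file_path)]
```

Because the join uses only the attributes, renamed files still match their index entries, and index rows with no local file keep a missing file_path.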
summary_database() will append 5 columns to the CMIP6 output file index:
- file_path: the full path of the matched NetCDF file for every case.
summary_database() uses future.apply underneath to speed up the data processing if applicable. You can use your preferred future backend to run data extraction in parallel. By default, summary_database() uses the future::sequential backend, which runs everything sequentially.
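As a rough sketch of choosing a backend, the future package lets you set a plan before calling summary_database(). The worker count and the directory path below are placeholders, not recommendations, and any speed-up depends on your data and machine.

```r
library(future)

# Switch from the default sequential backend to parallel R sessions.
# plan() returns the previous plan so it can be restored later.
old_plan <- plan(multisession, workers = 2)

# "path/to/cmip6" is a placeholder; uncomment and point it at a real directory
# sm <- summary_database("path/to/cmip6")

# Restore the previous backend when done
plan(old_plan)
```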
Value
A data.table::data.table()
containing corresponding grouping
columns plus:
Column | Type | Description |
datetime_start | POSIXct | Start date and time of simulation |
datetime_end | POSIXct | End date and time of simulation |
file_num | Integer | Total number of files per group |
file_size | Units (Mbytes) | Approximate total size of files |
dl_num | Integer | Total number of files downloaded |
dl_percent | Units (%) | Total percentage of files downloaded |
dl_size | Units (Mbytes) | Total size of files downloaded |
Two extra data.table::data.table() objects are also attached as attributes:
- not_found: A data.table::data.table() that contains metadata for CMIP6 outputs that are listed in the current CMIP6 output file index but whose recorded file paths are no longer valid and cannot be found in the current database.
- not_matched: A data.table::data.table() that contains metadata for CMIP6 output files that are found in the current database but are not listed in the current CMIP6 output file index.
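The two attached tables can be retrieved with attr(). The sketch below uses a stand-in object with made-up content in place of a real summary, and report_unmatched() is a hypothetical helper, not part of epwshiftr.

```r
# Hypothetical helper: count the rows in the two attached tables
report_unmatched <- function(sm) {
    c(
        not_found   = NROW(attr(sm, "not_found")),
        not_matched = NROW(attr(sm, "not_matched"))
    )
}

# Stand-in object with made-up content, shaped like the documented return
# value; with real data, use the result of summary_database() instead
sm <- structure(
    data.frame(experiment = "ssp585", file_num = 2L, dl_num = 1L),
    not_found   = data.frame(file_id = "a"),
    not_matched = data.frame(file_path = character())
)

report_unmatched(sm)
```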
For the meaning of the grouping columns, see init_cmip6_index().
Examples
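## Not run:
# "path/to/cmip6" is a placeholder for the directory that holds your NetCDF
# files; dir has no default value and must be supplied
summary_database("path/to/cmip6")
summary_database("path/to/cmip6", by = "experiment")
## End(Not run)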