R: Wrapper function for manipulation of the EDH dataset

edhw {sdam}

R Documentation

Wrapper function for manipulation of the EDH dataset

Description

A function to obtain variable data and perform transformations on the EDH dataset.

Usage

edhw(x = "EDH", vars, as = c("df", "list"), type = c("long", "wide", "narrow"), 
     split, select, addID, limit, id, na.rm, ldf, province, gender, rp, ...)

Arguments

`x`	a list object name with fragments of the `EDH` dataset (optional)
`vars`	vector of variables of interest from `x`; if `x=NULL`, the entire `EDH` dataset is taken (optional)
`as`	format to return the output; either as a `"list"` or a data frame `"df"` object.
`type`	type format of data frame; either `"long"` or `"wide"` (`"narrow"` not yet implemented)
`split`	divide the data into groups by id? (optional and logical)
`select`	vector with `"people"` variables (optional)
`addID`	add identification to the output? (optional and logical)
`limit`	integer or vector to limit the returned output. Ignored if `id` is specified (optional)
`id`	select only `hd_nr` records (optional, integer or character)
`na.rm`	remove entries with NA data? (optional and logical)
`ldf`	is `x` list of data frames? (optional and logical)
`province`	name or abbreviation of Roman province in `EDH` as in `rp` dataset
`gender`	gender of people in `EDH`: `male` or `female`
`rp`	customized list of Roman provinces as in `rp` dataset
`...`	optional arguments if needed.

Details

This function is an interface for extracting attribute variables from the EDH dataset, which is available with this package either as a built-in dataset or as external data. EDH is a dataset of Latin epigraphy retrieved from the Epigraphic Database Heidelberg API repository. The epigraphs or inscriptions are recorded in a list object of 84701 items (as of 10-11-2020), and each item has at least one of the following 47 (or more) attribute names:

"ID", "commentary", "fotos", "country", "depth", "diplomatic_text", "edh_geography_uri", "findspot", "findspot_ancient", "findspot_modern", "geography", "height", "id", "language", "last_update", "letter_size", "literature", "material", "military", "modern_region", "not_after", "not_before", "people" (which is a list with: "person_id", "nomen", "cognomen", "praenomen", "name", "gender", "status", "tribus", "origo", "occupation", "age: years", "age: months", "age: days"), "present_location", "province_label", "religion", "responsible_individual", "social_economic_legal_history", "transcription", "trismegistos_uri", "type_of_inscription", "type_of_monument", "uri", "width", "work_status", and "year_of_find".

The input in x, however, can also be a fragment of the EDH dataset, data from the Epigraphic Database Heidelberg API obtained by functions get.edh or get.edhw with the "rjson" format, or transformed data organized, for example, by provinces. When x is given explicitly, it must be at least a list object with a structure comparable to the EDH dataset. Argument ldf flags that the input in x is a list of data frames organised by variables rather than by records as in the EDH dataset. The output is returned either as a list of lists or, by default, as a data frame (option "df").

Extraction from EDH is typically done through argument vars; if vars is missing, all entries in x are taken. The ad hoc arguments province and gender refer to EDH entries, and they specify a Roman province and people's gender in x as a data frame; otherwise, these arguments are ignored. When province is used, a customized list of provinces can be supplied with argument rp; otherwise, the rp dataset is the default, where both names and abbreviations are accepted.

A list output comes with or without a numerical ‘ID’ identification, as set by the addID argument. When the output is a data frame, the variables are ordered alphabetically and, if desired, missing data can be removed from the output by activating na.rm in order to work with complete cases.
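The alphabetical ordering of variables and the restriction to complete cases can be pictured with base R alone. The following is a minimal sketch on a toy data frame with invented values; it does not call edhw, and the object names df, df_sorted and df_complete are hypothetical.

```r
# Toy data frame standing in for a data frame output (values invented).
df <- data.frame(
  not_before = c("0071", NA, "0151"),
  id         = c("HD000001", "HD000002", "HD000003"),
  not_after  = c("0130", "0200", NA),
  stringsAsFactors = FALSE
)

# Order the variables alphabetically by column name.
df_sorted <- df[, order(names(df)), drop = FALSE]

# Keep only complete cases, i.e. rows without any NA.
df_complete <- df_sorted[complete.cases(df_sorted), , drop = FALSE]
print(df_complete)
```

Only the first toy record survives here, since the other two each have one missing value.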

Arguments id and limit reduce the returned output, either to one or more Epigraphic Database numbers specified by hd_nr, or by limiting the amount of returned output. Argument limit works like the limit argument of function get.edh, but here the offset can be specified as a sequence. While limit is a faster way to get to entries in the EDH dataset, argument id refers precisely to one or more hd_nr values in the Epigraphic Database Heidelberg API.
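The difference between positional selection (in the spirit of limit) and selection by HD number (in the spirit of id) can be sketched with base R on a toy list. This is an illustration under assumptions, not the internals of edhw: the records are invented, the helper select_by_id is hypothetical, and the zero-padded "HD" identifier format is assumed.

```r
# Toy list standing in for EDH records (values invented for illustration).
records <- lapply(1:6, function(i) list(id = sprintf("HD%06d", i), not_before = NA))

# Positional selection: a single number takes the first n entries,
# while a vector picks positions directly.
first_four <- records[seq_len(4)]
by_offset  <- records[c(2, 5)]

# Selection by HD number, accepting integer or character input.
# select_by_id() is a hypothetical helper, not part of the package.
select_by_id <- function(x, id) {
  hd <- if (is.numeric(id)) sprintf("HD%06d", as.integer(id)) else as.character(id)
  Filter(function(r) r$id %in% hd, x)
}

one_record <- select_by_id(records, 3)
```

Positional selection depends on the order of the entries in the list, whereas selection by HD number does not.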

Component "people" is a separate list in the EDH dataset, and it should be treated as a separate case from the rest of the variables. When the output is a data frame, the default is a ‘long’ type table; that is, a record can appear in different rows and each variable is assigned to a single column. With this option it is possible to select "people" variables like gender and origo. When choosing people variables with select and a data frame output, the "people" attribute must be in vars.

By setting type to "wide", the different people from a single entry are placed column by column in the data frame, and each record has a single row. Finally, argument split divides the data in the data frame into groups by ‘id’, which corresponds to the HD number of the inscription in the EDH dataset.
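The long and wide layouts, and grouping by ‘id’, can be illustrated with base R on a toy structure that has a nested "people" component. This is a sketch with invented values; it does not call edhw, and the column names that edhw actually produces for the wide format may differ from those generated by reshape here.

```r
# Toy records with a nested "people" component (values invented).
records <- list(
  list(id = "HD000001", people = list(list(gender = "male"), list(gender = "female"))),
  list(id = "HD000002", people = list(list(gender = "male")))
)

# Long layout: one row per person, so an id may repeat across rows.
long <- do.call(rbind, lapply(records, function(r) {
  data.frame(
    id     = r$id,
    person = seq_along(r$people),
    gender = vapply(r$people, function(p) p$gender, character(1)),
    stringsAsFactors = FALSE
  )
}))

# Wide layout: one row per record, people spread column by column.
wide <- reshape(long, idvar = "id", timevar = "person", direction = "wide")

# Grouping the long table by id, in the spirit of the split argument.
groups <- split(long, long$id)
```

In the wide table, a record with fewer people than the largest record gets NA in the unused columns.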

Value

A list or a data frame with a long or wide format, depending on the input arguments.

Argument province with no vars returns a list of lists.

Warning

EDH is a built-in dataset in the development and legacy versions of the package but, because of its size, it is not part of the CRAN distribution. Functions edhw and edhwpd download EDH from another repository listed in References.

Note

Warning messages are given for the EDH dataset as the input, and when choosing the province argument alone.

Author(s)

Antonio Rivero Ostoic

References

Epigraphic Database Heidelberg – Data Reuse Options, (Online; retrieved on 16 June 2019). URL https://edh-www.adw.uni-heidelberg.de/data

https://edh-www.adw.uni-heidelberg.de/data/api (database retrieved in November 2020)

https://github.com/sdam-au/sdam/tree/master/data

https://github.com/mplex/cedhar/tree/master/pkg/sdam/data

Examples

## Not run: 
# load dataset
data(EDH)

# make a list for three variables in 'EDH' for first 4 entries
edhw(vars=c("type_of_inscription", "not_after", "not_before"), limit=4)

# as before, but also select 'gender' from 'people'
edhw(vars=c("people", "not_after", "not_before"), select="gender", limit=4)
## End(Not run)

[Package sdam version 1.1.4 Index]