dspl {googlePublicData} | R Documentation |
Builds Dataset Publication Language (DSPL) metadata file
Description
Parses the csv, tab or xls(x) files in a given directory and generates a complete DSPL file. If an output path is specified, the function builds the complete ZIP archive (DSPL file plus csv files), ready to be uploaded to Google Public Data Explorer.
Usage
dspl(path, output = NA, replace = F, targetNamespace = "",
timeFormat = "yyyy", lang = c("es", "en"), name = NA,
description = NA, url = NA, providerName = NA, providerURL = NA,
sep = ";", dec = ".", encoding = getOption("encoding"),
moreinfo = NULL)
new_dspl(path, output = NA, replace = F, targetNamespace = "",
timeFormat = "yyyy", lang = c("es", "en"), name = NA,
description = NA, url = NA, providerName = NA, providerURL = NA,
sep = ";", dec = ".", encoding = getOption("encoding"),
moreinfo = NULL)
Arguments
path |
String. Path to the folder where the tables (csv|tab|xls) are located. |
output |
String, optional. Path to the output ZIP file. |
replace |
Logical. If |
targetNamespace |
String. As DSPL documentation states “Provides a URI that identifies your dataset. This URI is not required to point to an actual resource, but it's a good idea to have the URI resolve to a document describing your content or dataset”. |
timeFormat |
String. The time format of the collection. It should be specified according to the Joda-Time format. See the Details section for more information. |
lang |
A list of strings with the languages supported by the dataset. It may contain a single language. |
name |
List of strings. The name of the dataset as defined accordingly
to the |
description |
List of strings. Description of the dataset. It also
supports multiple descriptions as the |
url |
The corresponding URL for the dataset. |
providerName |
List of strings. The data provider name. |
providerURL |
List of strings. The data provider website url. |
sep |
The separator character of the tables in the 'path' folder. Currently supported values are “,” or “;” (for .csv files), “\t” (for .tab files) and “xls” or “xlsx” (for Microsoft Excel files). |
dec |
String. Decimal point. |
encoding |
The character encoding of the input tables. Currently ignored for Microsoft Excel files. |
moreinfo |
A special tab file generated by the function
|
Details
If no output is defined, the function returns a list of class dspl that includes, among its contents, an XML object (the DSPL file). Otherwise, if an output is defined, the result consists of two things: a ZIP file containing everything needed for uploading to publicdata.google.com (a collection of csv files and the written DSPL XML file) and a message (character object).
Internally, the parsing process consists of the following steps:
Loading the data,
Generating the id corresponding to each column,
Identifying the data types,
Building concepts,
Identifying dimensional concepts and distinguishing between categorical, geographical and time dimensions, and
Executing internal checks.
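As a rough illustration of the id generation and data type identification steps, the sketch below uses base R only. The helpers make_ids and guess_type and the sample table are hypothetical and are not part of googlePublicData; the package's internal rules may differ.

```r
# Hypothetical helpers for illustration; not the package's implementation.

# Turn column names into lowercase ids made of letters, digits and underscores.
make_ids <- function(x) {
  ids <- tolower(trimws(x))
  ids <- gsub("[^a-z0-9]+", "_", ids)
  gsub("^_+|_+$", "", ids)
}

# Map the R class of a column to a DSPL data type name.
guess_type <- function(col) {
  if (is.logical(col)) {
    "boolean"
  } else if (inherits(col, "Date")) {
    "date"
  } else if (is.integer(col)) {
    "integer"
  } else if (is.numeric(col)) {
    "float"
  } else {
    "string"
  }
}

# Made-up sample table, standing in for one file read from 'path'.
tab <- data.frame(
  Year = c(2010L, 2011L),
  `Country Name` = c("Chile", "Peru"),
  `GDP (USD)` = c(1.5, 2.5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One row per column: generated id, original label and guessed type.
concepts <- data.frame(
  id = make_ids(names(tab)),
  label = names(tab),
  type = unname(vapply(tab, guess_type, character(1))),
  stringsAsFactors = FALSE
)
print(concepts)
```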
So that the ZIP file (DSPL file plus CSV data files) can be uploaded correctly, the function runs a series of internal checks on the data structure:
- Slices with the same dimensions: DSPL requires that each slice represents one dimensional cut; that is, there should not be more than one data table with the same dimensions.
- Duplicated concepts: When a single concept (statistic) appears with different data types, e.g. as integer in one table and float in another, dspl may get confused. During the parsing process, when possible, it collapses duplicated concepts into a single concept and assigns it the common data type (float).
- Correct time format definition: Using checkTimeFormat, the function ensures that the time format specified is compatible with DSPL.
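The first two checks can be pictured with a small base R sketch. The table names, concept ids and the helper collapse_type are made up for illustration, and the logic is an assumption about what such checks might look like, not the code dspl actually runs.

```r
# Hypothetical input: the dimensions used by each table (slice).
table_dims <- list(
  gdp_table     = c("country", "year"),
  pop_table     = c("year", "country"),
  exports_table = c("country", "product", "year")
)

# Check 1: find tables that share the same set of dimensions,
# regardless of column order.
dim_key <- vapply(
  table_dims,
  function(d) paste(sort(d), collapse = "|"),
  character(1)
)
clashing <- names(dim_key)[
  duplicated(dim_key) | duplicated(dim_key, fromLast = TRUE)
]
print(clashing)

# Hypothetical input: concepts as found in each table, with guessed types.
found <- data.frame(
  id = c("gdp", "gdp", "population"),
  table = c("gdp_table", "exports_table", "pop_table"),
  type = c("integer", "float", "integer"),
  stringsAsFactors = FALSE
)

# Check 2: when a concept appears as both integer and float,
# keep a single concept typed as float.
collapse_type <- function(types) {
  types <- unique(types)
  if (length(types) == 1) {
    types
  } else if (all(types %in% c("integer", "float"))) {
    "float"
  } else {
    stop("cannot collapse types: ", paste(types, collapse = ", "))
  }
}
common <- vapply(split(found$type, found$id), collapse_type, character(1))
print(common)
```

Here gdp_table and pop_table are flagged because they describe the same dimensional cut, and the gdp concept ends up with type float.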
Value
If there isn't any output defined, dspl returns a list of class "dspl".
An object of class "dspl" is a list containing:
dspl |
A character string containing the DSPL XML document as defined
by the |
concepts.by.table |
A data frame object of concepts stored by table. |
dimtabs |
A data frame containing dimensional tables. |
slices |
A data frame of slices. |
concepts |
A data frame of concepts (all of them). |
dimensions |
A data frame of dimensional concepts. |
statistics |
A matrix of statistics. |
Otherwise, the function builds a ZIP file at the path specified in output, containing the CSV and DSPL (XML) files.
Author(s)
George G. Vega Yon
References
Google Public Data Explorer Tutorial: https://developers.google.com/public-data/docs/tutorial
Examples
demo(dspl)