bdc_standardize_datasets {bdc} | R Documentation |
Standardize dataset columns based on metadata
Description
This function merges and standardizes different datasets into a single dataset whose column names follow Darwin Core terminology. The entire process is driven by a metadata file provided by the user.
Usage
bdc_standardize_datasets(
  metadata,
  format = "csv",
  overwrite = FALSE,
  save_database = FALSE
)
Arguments
metadata |
A data frame with metadata describing the name and file path of each original dataset and the columns that need to be renamed. See Details. |
format |
A character string setting the output file type. Available options are "csv" and "qs" (recommended for saving large datasets). Default = "csv". |
overwrite |
A logical value indicating whether the final merged dataset should be overwritten. Default = FALSE. |
save_database |
logical. Should the standardized database be locally saved? Default = FALSE. |
Details
bdc_standardize_datasets() facilitates the standardization of datasets with different column names by converting them into a new dataset following the Darwin Core terminology. The standardization process relies on a metadata file containing the name, path, and columns that need to be renamed. The metadata file can be constructed using built-in functions (e.g., data.frame()) or by storing the information in a CSV file and importing it into R. Regardless of the method chosen, the data frame with metadata needs to contain the following column names (this is a list of required column names; for a comprehensive list of column names following Darwin Core terminology, see the Darwin Core term list):
- datasetName: A short name identifying the dataset (e.g., GBIF)
- fileName: The relative path containing the name of the input dataset (e.g., Input_files/GBIF.csv)
- scientificName: Name of the column in the original database presenting the taxon scientific names with or without authorship information, depending on the format of the source dataset (e.g., Myrcia acuminata)
- decimalLatitude: Name of the column in the original database presenting the geographic latitude in decimal degrees (e.g., -6.370833)
- decimalLongitude: Name of the column in the original database presenting the geographic longitude in decimal degrees (e.g., -3.25500)
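A minimal sketch of building such a metadata data frame with data.frame() is given here. The second dataset name ("LocalSurvey"), both file paths, and the source column names are hypothetical and only illustrate the expected shape: one row per input dataset, with each Darwin Core column holding the name of the corresponding column in that dataset. The final check is an illustrative convenience, not part of the package.

```r
# Hypothetical metadata describing two input datasets.
# Each Darwin Core column stores the *name* of the matching column
# in the source file (assumed names, for illustration only).
metadata <- data.frame(
  datasetName      = c("GBIF", "LocalSurvey"),
  fileName         = c("Input_files/GBIF.csv", "Input_files/LocalSurvey.csv"),
  scientificName   = c("scientificName", "taxon"),
  decimalLatitude  = c("decimalLatitude", "lat"),
  decimalLongitude = c("decimalLongitude", "lon"),
  stringsAsFactors = FALSE
)

# Confirm that every required column name is present before use.
required <- c(
  "datasetName", "fileName", "scientificName",
  "decimalLatitude", "decimalLongitude"
)
missing_cols <- setdiff(required, names(metadata))
stopifnot(length(missing_cols) == 0)
```

The same table could instead be kept in a CSV file and imported into R, as described above.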
Value
A merged data.frame with column names following Darwin Core terminology.
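To make the idea of "rename, then merge" concrete, the following base R sketch uses two made-up in-memory tables. It is not the package's implementation: the helper rename_to_dwc() and all source column names are hypothetical, and the real function reads the input files listed in the metadata instead.

```r
# Two toy source tables with differing column names (made-up data).
gbif <- data.frame(
  species = "Myrcia acuminata", lat = -6.370833, lon = -3.25500
)
survey <- data.frame(
  taxon = "Myrcia acuminata", y = -6.4, x = -3.3
)

# Hypothetical helper: keep the mapped columns and rename them.
# `mapping` is a named character vector: Darwin Core name = source name.
rename_to_dwc <- function(df, mapping) {
  out <- df[, unname(mapping), drop = FALSE]
  names(out) <- names(mapping)
  out
}

# After renaming, both tables share column names and can be stacked.
merged <- rbind(
  rename_to_dwc(gbif, c(
    scientificName = "species",
    decimalLatitude = "lat",
    decimalLongitude = "lon"
  )),
  rename_to_dwc(survey, c(
    scientificName = "taxon",
    decimalLatitude = "y",
    decimalLongitude = "x"
  ))
)
```

Here merged has one row per source record and only Darwin Core column names.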
Examples
## Not run:
metadata <- readr::read_csv(
  system.file("extdata/Config/DatabaseInfo.csv", package = "bdc")
)

db_standardized <- bdc_standardize_datasets(
  metadata = metadata,
  format = "csv",
  overwrite = TRUE,
  save_database = FALSE
)
## End(Not run)