R: Attempt to match database_ids from a prior run

idMatchR {BeeBDC}

R Documentation

Attempt to match database_ids from a prior run

Description

This function attempts to match database_ids from a prior bdc or BeeBDC run in order to keep this column somewhat consistent between iterations. However, not all records contain sufficient information for this to work flawlessly.

Usage

idMatchR(
  currentData = NULL,
  priorData = NULL,
  matchBy = NULL,
  completeness_cols = NULL,
  excludeDataset = NULL
)

Arguments

`currentData`	A data frame or tibble. The NEW occurrence records as input.
`priorData`	A data frame or tibble. The PRIOR occurrence records as input.
`matchBy`	A list of character vectors. Each vector should contain the columns to compare; the vectors are compared iteratively, with earlier matches given preference over later ones.
`completeness_cols`	A character vector. The columns to check for completeness, arrange, and assign the relevant prior database_id.
`excludeDataset`	A character vector. The dataSources that are to be excluded from data matching. These should be static dataSources from minor providers.
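
The iterative behaviour of `matchBy` can be illustrated with a small base R sketch. This is not BeeBDC's internal code: `matchIdsSketch` is a hypothetical helper written only for this illustration, the toy data are made up, and the sketch ignores `completeness_cols`, `excludeDataset`, and duplicate keys. It only shows the idea that each column set is tried in turn and that a record matched by an earlier set is not re-matched by a later one.

```r
# Illustrative sketch only; NOT the BeeBDC implementation.
# matchIdsSketch is a hypothetical helper written for this example.
matchIdsSketch <- function(current, prior, matchBy) {
  current$idContinuity <- FALSE
  for (cols in matchBy) {
    # Only consider rows not yet matched and with no NA in the key columns
    todo <- which(!current$idContinuity &
                    stats::complete.cases(current[, cols, drop = FALSE]))
    if (length(todo) == 0) next
    priorOk <- stats::complete.cases(prior[, cols, drop = FALSE])
    # Build one composite key per row from the chosen columns
    keyCurrent <- do.call(paste, c(current[todo, cols, drop = FALSE], sep = "\r"))
    keyPrior <- do.call(paste, c(prior[priorOk, cols, drop = FALSE], sep = "\r"))
    hit <- match(keyCurrent, keyPrior)
    found <- !is.na(hit)
    # Carry the prior database_id over to the matched rows
    current$database_id[todo[found]] <- prior$database_id[priorOk][hit[found]]
    current$idContinuity[todo[found]] <- TRUE
  }
  current
}

# Toy data (made up for the illustration)
prior <- data.frame(
  database_id = c("Old_1", "Old_2", "Old_3"),
  gbifID = c("g1", NA, NA),
  catalogNumber = c("c1", "c2", "c3"),
  institutionCode = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
current <- data.frame(
  database_id = c("New_1", "New_2", "New_3"),
  gbifID = c("g1", NA, NA),
  catalogNumber = c("c9", "c2", "c7"),
  institutionCode = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

out <- matchIdsSketch(
  current, prior,
  matchBy = list("gbifID", c("catalogNumber", "institutionCode"))
)
```

In this toy case the first record is matched on `gbifID`, the second only on the `catalogNumber` and `institutionCode` pair, and the third finds no match and keeps its new database_id.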

Value

The input data frame, returned with an updated database_id column that carries over the database_ids from priorData wherever they could be matched. Additionally, a column called idContinuity is returned, where TRUE indicates a match to a prior database_id and FALSE indicates that a new database_id was assigned.
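
As a rough sketch of how the idContinuity column might be inspected, the following uses a small stand-in data frame rather than real idMatchR output. The object `mockOut` and all of its values are hypothetical; only the column names database_id, dataSource, and idContinuity are taken from this page.

```r
# Stand-in for a data frame returned by idMatchR(); all values are made up.
mockOut <- data.frame(
  database_id = c("id_1", "id_2", "id_3", "id_4"),
  dataSource = c("sourceA", "sourceA", "sourceB", "sourceB"),
  idContinuity = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Proportion of records that kept a prior database_id
continuityRate <- mean(mockOut$idContinuity)

# Matched (TRUE) and newly assigned (FALSE) counts per dataSource
continuityBySource <- table(mockOut$dataSource, mockOut$idContinuity)

# database_ids that were newly assigned in this run
newIds <- mockOut$database_id[!mockOut$idContinuity]
```

A low continuity rate for one dataSource could suggest that its records lack the identifying columns supplied in matchBy.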

Examples

# Get the example data
data("beesRaw", package = "BeeBDC")
# Which datasets are static and should be excluded from matching?
excludeDataset <- c("BMin", "BMont", "CAES", "EaCO", "Ecd", "EcoS",
                    "Gai", "KP", "EPEL", "USGS", "FSCA", "SMC",
                    "Bal", "Lic", "Arm", "BBD", "MEPB")
# Match the data to itself just as an example of running the code.
beesRaw_out <- idMatchR(
  currentData = beesRaw,
  priorData = beesRaw,
  # First matches will be given preference over later ones
  matchBy = dplyr::lst(c("gbifID"),
                        c("catalogNumber", "institutionCode", "dataSource"),
                        c("occurrenceID", "dataSource"),
                        c("recordId", "dataSource"),
                        c("id"),
                        c("catalogNumber", "institutionCode")),
  # You can exclude datasets from prior by matching their prefixes (the text before the first underscore):
  excludeDataset = excludeDataset)

[Package BeeBDC version 1.2.0 Index]