cc_dupl {CoordinateCleaner} | R Documentation |
Identify Duplicated Records
Description
Removes or flags duplicated records based on species name and coordinates, plus any additional user-defined columns. True (specimen) duplicates and repeated records of the same species at the same coordinates can make up the bulk of records in a biological collection database, but are undesirable for many analyses. This function can flag both; identifying true specimen duplicates requires enough additional information, supplied through additions (for example, collector name and collector number).
Usage
cc_dupl(
x,
lon = "decimalLongitude",
lat = "decimalLatitude",
species = "species",
additions = NULL,
value = "clean",
verbose = TRUE
)
Arguments
x |
data.frame. Containing geographical coordinates and species names. |
lon |
character string. The column with the longitude coordinates. Default = “decimalLongitude”. |
lat |
character string. The column with the latitude coordinates. Default = “decimalLatitude”. |
species |
character string. The column with the species name. Default = “species”. |
additions |
vector of character strings. Additional columns to be included in the test for duplication, for example collector name and collector number, as in the Examples. Default = NULL. |
value |
character string. Defines the output value, either “clean” or “flagged”. See Value. |
verbose |
logical. If TRUE, reports the name of the test and the number of records flagged. Default = TRUE. |
Value
Depending on the ‘value’ argument, either a data.frame containing the records considered correct by the test (“clean”) or a logical vector (“flagged”), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = “clean”.
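The relationship between the two output modes can be sketched in base R. This is an illustration only: it assumes the test amounts to calling duplicated() on the coordinate, species, and any additional columns, which may not match the package internals exactly. The data frame recs and the helper flag_dupl() are hypothetical names, not part of CoordinateCleaner.

```r
# Hypothetical toy data: rows 1 and 2 share species and coordinates.
recs <- data.frame(
  species = c("a", "a", "b", "a"),
  decimalLongitude = c(10, 10, 10, 12),
  decimalLatitude = c(5, 5, 5, 5),
  collector = c("Bonpl", "Humb", "Bonpl", "Bonpl")
)

# Hypothetical helper (assumed logic, not the package's code):
# TRUE = first occurrence of a key combination,
# FALSE = repeat of an earlier row.
flag_dupl <- function(x, cols) {
  !duplicated(x[, cols, drop = FALSE])
}

key <- c("decimalLongitude", "decimalLatitude", "species")

flags <- flag_dupl(recs, key)  # analogous to value = "flagged"
clean <- recs[flags, ]         # analogous to value = "clean"

# Adding a column to the key narrows what counts as a duplicate:
# rows 1 and 2 differ in collector, so neither is flagged here.
flags_add <- flag_dupl(recs, c(key, "collector"))
```

Under this assumption, the “clean” data.frame is simply the input subset by the “flagged” vector, so the flagged output can be kept alongside the data when the problematic records need to be inspected rather than dropped.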
See Also
Other Coordinates: cc_aohi(), cc_cap(), cc_cen(), cc_coun(), cc_equ(), cc_gbif(), cc_inst(), cc_iucn(), cc_outl(), cc_sea(), cc_urb(), cc_val(), cc_zero()
Examples
# Simulate 100 records; shorter columns are recycled to 100 rows
x <- data.frame(species = letters[1:10],
                decimalLongitude = sample(x = 0:10, size = 100, replace = TRUE),
                decimalLatitude = sample(x = 0:10, size = 100, replace = TRUE),
                collector = "Bonpl",
                collector.number = c(1001, 354),
                collection = rep(c("K", "WAG", "FR", "P", "S"), 20))

# Flag duplicates by species and coordinates
cc_dupl(x, value = "flagged")

# Also require matching collector name and number
cc_dupl(x, additions = c("collector", "collector.number"))
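The example data are drawn with sample(), so the number of flagged records changes from run to run. A minimal base-R sketch of building the same data reproducibly follows; the seed value and the helper name make_example() are arbitrary choices for illustration and are not part of the package.

```r
# Hypothetical helper: rebuilds the example data frame from a fixed seed.
make_example <- function(seed = 1) {
  set.seed(seed)
  data.frame(
    species = letters[1:10],
    decimalLongitude = sample(x = 0:10, size = 100, replace = TRUE),
    decimalLatitude = sample(x = 0:10, size = 100, replace = TRUE),
    collector = "Bonpl",
    collector.number = c(1001, 354),
    collection = rep(c("K", "WAG", "FR", "P", "S"), 20)
  )
}

x1 <- make_example()
x2 <- make_example()
identical(x1, x2)  # same seed, same data
```

Passing a data frame built this way to cc_dupl() should give the same flags on every run, which makes the effect of the additions argument easier to compare.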