clean_coordinates {CoordinateCleaner}    R Documentation

Geographic Cleaning of Coordinates from Biologic Collections


Description

Cleans geographic coordinates by applying multiple empirical tests that flag potentially erroneous records, addressing issues common in biological collection databases.


Usage

clean_coordinates(
  x,
  lon = "decimallongitude",
  lat = "decimallatitude",
  species = "species",
  countries = NULL,
  tests = c("capitals", "centroids", "equal", "gbif", "institutions", "outliers",
    "seas", "zeros"),
  capitals_rad = 10000,
  centroids_rad = 1000,
  centroids_detail = "both",
  inst_rad = 100,
  outliers_method = "quantile",
  outliers_mtp = 5,
  outliers_td = 1000,
  outliers_size = 7,
  range_rad = 0,
  zeros_rad = 0.5,
  capitals_ref = NULL,
  centroids_ref = NULL,
  country_ref = NULL,
  country_refcol = "iso_a3",
  inst_ref = NULL,
  range_ref = NULL,
  seas_ref = NULL,
  seas_scale = 50,
  urban_ref = NULL,
  value = "spatialvalid",
  verbose = TRUE,
  report = FALSE
)



Arguments

x: data.frame. Contains the geographical coordinates and species names.


lon: character string. The column with the longitude coordinates. Default = “decimallongitude”.


lat: character string. The column with the latitude coordinates. Default = “decimallatitude”.


species: character string. The column with the species identity of each record. Default = “species”. If NULL, tests must not include the "outliers" or "duplicates" tests.


countries: character string. The column with the country assignment of each record as a three-letter ISO code. Default = NULL. If missing, the countries test is skipped.


tests: character vector indicating which tests to run. See details for all available tests. Default = c("capitals", "centroids", "equal", "gbif", "institutions", "outliers", "seas", "zeros").


capitals_rad: numeric. The radius around capital coordinates in meters. Default = 10000.


centroids_rad: numeric. The radius around centroid coordinates in meters. Default = 1000.


centroids_detail: character string. If set to ‘country’, only country (adm-0) centroids are tested; if set to ‘provinces’, only province (adm-1) centroids are tested. Default = ‘both’.


inst_rad: numeric. The radius around biodiversity institution coordinates in meters. Default = 100.


outliers_method: character string. The method used for outlier testing. See details. Default = “quantile”.


outliers_mtp: numeric. The multiplier for the interquartile range of the outlier test. If NULL, outliers_td is used. Default = 5.


outliers_td: numeric. The minimum distance of a record to all other records of a species to be identified as an outlier, in km. Default = 1000.


outliers_size: numeric. The minimum number of records in a dataset to run the taxon-specific outlier test. Default = 7.


range_rad: buffer around natural ranges. Default = 0.


zeros_rad: numeric. The radius around 0/0 in degrees. Default = 0.5.


capitals_ref: a data.frame with alternative reference data for the country capitals test. If NULL, the countryref dataset is used. Alternatives must be identical in structure.


centroids_ref: a data.frame with alternative reference data for the centroid test. If NULL, the countryref dataset is used. Alternatives must be identical in structure.


country_ref: a SpatialPolygonsDataFrame as alternative reference for the countries test. If NULL, the rnaturalearth::ne_countries('medium') dataset is used.


country_refcol: the column name in the reference dataset containing the relevant ISO codes for matching. Default is "iso_a3_eh", which refers to the ISO-3 codes in the reference dataset. See notes.


inst_ref: a data.frame with alternative reference data for the biodiversity institution test. If NULL, the institutions dataset is used. Alternatives must be identical in structure.


range_ref: a SpatialPolygonsDataFrame of species natural ranges. Required to include the 'ranges' test. See cc_iucn for details.


seas_ref: a SpatialPolygonsDataFrame as alternative reference for the seas test. If NULL, the rnaturalearth::ne_download(scale = 110, type = 'land', category = 'physical') dataset is used.


seas_scale: numeric. The scale of the default landmass reference. Must be one of 10, 50, 110. Lower numbers give higher detail (10 is the most detailed). Default = 50.


urban_ref: a SpatialPolygonsDataFrame as alternative reference for the urban test. If NULL, the test is skipped. See details for a reference gazetteer.


value: character string defining the output value. One of ‘spatialvalid’, ‘summary’, ‘clean’. See the value section for details. Default = ‘spatialvalid’.


verbose: logical. If TRUE, reports the name of each test and the number of records flagged. Default = TRUE.


report: logical or character. If TRUE, a report file summarizing the cleaning results is written to the working directory. If a character string, the path to which the file should be written. Default = FALSE.
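
As an illustration of the arguments above, the following is a minimal sketch of a call with non-default column names and a wider radius around 0/0. The data frame recs and its column names (taxon, lon, lat) are made up for this example, and only the "equal" and "zeros" tests are requested. The call is skipped when the package is not installed.

```r
# Made-up records: the column names taxon/lon/lat are assumptions for this sketch.
recs <- data.frame(
  taxon = c("a", "a", "b", "b"),
  lon   = c(45.1, 0, 47.3, 12),
  lat   = c(-20.4, 0, -15.2, 12),
  stringsAsFactors = FALSE
)

# Only run the call when CoordinateCleaner is available.
if (requireNamespace("CoordinateCleaner", quietly = TRUE)) {
  cleaned <- CoordinateCleaner::clean_coordinates(
    x = recs,
    lon = "lon",                  # name of the longitude column in recs
    lat = "lat",                  # name of the latitude column in recs
    species = "taxon",            # name of the species column in recs
    tests = c("equal", "zeros"),  # subset of the available tests
    zeros_rad = 1,                # radius around 0/0, in degrees
    value = "clean",              # return x without the flagged records
    verbose = FALSE
  )
}
```

With value = "clean" the result is a data.frame, so it can be passed directly to downstream code that expects the same columns as the input.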


Details

The function needs all coordinates to be formally valid according to WGS84. If the data contains invalid coordinates, the function will stop and return a vector flagging the invalid records. TRUE = non-problematic coordinate, FALSE = potentially problematic coordinate.
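
The validity requirement can be checked before calling the function. The sketch below is a base R approximation under the assumption that "formally valid" means numeric, non-missing, and within the global lon/lat extent. The helper is_valid_coord is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of CoordinateCleaner: TRUE where a lon/lat
# pair is numeric, non-missing, and inside the global lon/lat extent.
is_valid_coord <- function(lon, lat) {
  lon <- suppressWarnings(as.numeric(lon))  # non-numeric text becomes NA
  lat <- suppressWarnings(as.numeric(lat))
  !is.na(lon) & !is.na(lat) &
    lon >= -180 & lon <= 180 &
    lat >= -90 & lat <= 90
}

# Made-up records with one valid row and three kinds of invalid rows.
recs <- data.frame(
  decimallongitude = c("12.5", "200", "abc", NA),
  decimallatitude  = c("-3.2", "10", "5", "45"),
  stringsAsFactors = FALSE
)

# TRUE = formally valid, FALSE = invalid (out of range, non-numeric, missing).
valid <- is_valid_coord(recs$decimallongitude, recs$decimallatitude)
```

Rows where valid is FALSE can be inspected or dropped before the cleaning tests are run.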


Value

Depending on the value argument:


“spatialvalid”: an object of class spatialvalid similar to x, with one column added for each test. TRUE = clean coordinate entry, FALSE = potentially problematic coordinate entry. The .summary column is FALSE if any test flagged the respective coordinate.


“summary”: a logical vector in the same order as the input data, summarizing the results of all tests. TRUE = clean coordinate, FALSE = potentially problematic (= at least one test failed).


“clean”: a data.frame similar to x with potentially problematic records removed.
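
The relationship between the three output types can be sketched in base R. The per-test flag columns below are illustrative placeholders, not output from the package; the sketch only shows how a summary flag that is FALSE whenever any test flagged a record relates to the summary vector and the cleaned data.

```r
# Made-up input records.
recs <- data.frame(id = 1:3, stringsAsFactors = FALSE)

# Illustrative per-test flags (TRUE = passed, FALSE = flagged); the column
# names are placeholders for whichever tests were run.
flags <- data.frame(
  .equ = c(TRUE, TRUE, FALSE),
  .zer = c(TRUE, FALSE, FALSE),
  .cap = c(TRUE, TRUE, TRUE)
)

# A record is clean overall only if every test passed.
flags$.summary <- Reduce(`&`, flags)

# Analogue of the summary output: one logical per input record.
summary_vec <- flags$.summary

# Analogue of the clean output: input with flagged records removed.
clean <- recs[summary_vec, , drop = FALSE]
```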


Note

Always tests for coordinate validity: non-numeric or missing coordinates and coordinates exceeding the global extent (lon/lat, WGS84). See the package documentation for more details and tutorials.

The country_refcol argument allows adapting the function to the structure of alternative reference datasets. For instance, for rnaturalearth::ne_countries(scale = "small"), the default will fail, but country_refcol = "iso_a3" will work.
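
When an alternative reference is supplied, one way to avoid a failing default is to check which ISO column the reference actually carries before setting country_refcol. The sketch below uses a made-up attribute table in place of a real reference dataset; the helper pick_refcol and the table ref_attrs are hypothetical.

```r
# Made-up attribute table standing in for the attributes of a reference dataset.
ref_attrs <- data.frame(
  name   = c("Country A", "Country B"),
  iso_a3 = c("AAA", "BBB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first candidate column present in attrs.
pick_refcol <- function(attrs, candidates = c("iso_a3_eh", "iso_a3")) {
  hit <- candidates[candidates %in% names(attrs)]
  if (length(hit) == 0) {
    stop("none of the candidate ISO columns found in the reference data")
  }
  hit[1]
}

# Column name to pass as country_refcol for this reference.
refcol <- pick_refcol(ref_attrs)
```

The returned name can then be passed as country_refcol together with the matching country_ref.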

See Also

Other Wrapper functions: clean_dataset(), clean_fossils()


Examples

exmpl <- data.frame(species = sample(letters, size = 250, replace = TRUE),
                    decimallongitude = runif(250, min = 42, max = 51),
                    decimallatitude = runif(250, min = -26, max = -11))

test <- clean_coordinates(x = exmpl, 
                          tests = c("equal"))
## Not run: 
# run more tests
test <- clean_coordinates(x = exmpl,
                          tests = c("capitals",
                                    "gbif", "institutions",
                                    "outliers", "seas"))

## End(Not run)

[Package CoordinateCleaner version 2.0-20 Index]