clean_fossils {CoordinateCleaner}    R Documentation

Geographic and Temporal Cleaning of Records from Fossil Collections

Description

Cleans fossil records by running multiple empirical tests that flag potentially erroneous coordinates and time spans, addressing issues common in fossil collection databases. Individual tests can be activated via the tests argument.

Usage

clean_fossils(
  x,
  lon = "decimalLongitude",
  lat = "decimalLatitude",
  min_age = "min_ma",
  max_age = "max_ma",
  taxon = "accepted_name",
  tests = c("agesequal", "centroids", "equal", "gbif", "institutions", "spatiotemp",
    "temprange", "validity", "zeros"),
  countries = NULL,
  centroids_rad = 0.05,
  centroids_detail = "both",
  inst_rad = 0.001,
  outliers_method = "quantile",
  outliers_threshold = 5,
  outliers_size = 7,
  outliers_replicates = 5,
  zeros_rad = 0.5,
  centroids_ref = NULL,
  country_ref = NULL,
  inst_ref = NULL,
  value = "spatialvalid",
  verbose = TRUE,
  report = FALSE
)

Arguments

x

data.frame. Contains the fossil records, including taxon names, ages, and geographic coordinates.

lon

character string. The column with the longitude coordinates. Default = “decimalLongitude”.

lat

character string. The column with the latitude coordinates. Default = “decimalLatitude”.

min_age

character string. The column with the minimum age. Default = “min_ma”.

max_age

character string. The column with the maximum age. Default = “max_ma”.

taxon

character string. The column with the taxon name. If “”, searches for outliers over the entire dataset, otherwise per specified taxon. Default = “accepted_name”.

tests

vector of character strings, indicating which tests to run. See Details for all available tests. Default = c("agesequal", "centroids", "equal", "gbif", "institutions", "spatiotemp", "temprange", "validity", "zeros").
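As a hedged illustration of restricting the run to a few tests, the sketch below passes a subset of the test names listed in the Usage defaults. The object names selected_tests and fossils are hypothetical, the simulated data mirror the Examples section, and the call is only made when the package is installed.

```r
# Hypothetical subset of tests; the names come from the Usage defaults
selected_tests <- c("agesequal", "temprange", "zeros")

# Simulated fossil records using the default column names
set.seed(1)
minages <- runif(50, 0, 65)
fossils <- data.frame(
  accepted_name = sample(letters, 50, replace = TRUE),
  decimalLongitude = runif(50, 42, 51),
  decimalLatitude = runif(50, -26, -11),
  min_ma = minages,
  max_ma = minages + runif(50, 0.1, 65)
)

# Only call the package when it is available
if (requireNamespace("CoordinateCleaner", quietly = TRUE)) {
  res <- CoordinateCleaner::clean_fossils(x = fossils, tests = selected_tests)
  print(summary(res))
}
```

Tests that are not named in the vector are not run, so the result should contain flags only for the selected tests.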

countries

a character string. The column with the country assignment of each record in three letter ISO code. Default = NULL. If NULL, the countries test is skipped.

centroids_rad

numeric. The radius around centroid coordinates in meters. Default = 0.05.

centroids_detail

a character string. If set to ‘country’ only country (adm-0) centroids are tested, if set to ‘provinces’ only province (adm-1) centroids are tested. Default = ‘both’.

inst_rad

numeric. The radius around biodiversity institutions coordinates in metres. Default = 0.001.

outliers_method

character string. The method used for outlier testing. See Details. Default = “quantile”.

outliers_threshold

numeric. The multiplier for the interquartile range for outlier detection. The higher the number, the more conservative the outlier tests. See cf_outl for details. Default = 5.
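The effect of the multiplier can be sketched in base R. This is a generic interquartile-range rule written for illustration, not the package's implementation; the function name flag_iqr_outliers and the sample values are hypothetical.

```r
# Generic IQR rule: values above Q3 + threshold * IQR count as outliers.
# This is an illustrative sketch, not the code used by cf_outl.
flag_iqr_outliers <- function(values, threshold = 5) {
  q <- stats::quantile(values, probs = c(0.25, 0.75),
                       na.rm = TRUE, names = FALSE)
  upper <- q[2] + threshold * (q[2] - q[1])
  # TRUE = within the limit, FALSE = potential outlier
  values <= upper
}

ages <- c(10, 11, 12, 13, 14, 15, 400)

# With the default multiplier only the extreme value is flagged
flag_iqr_outliers(ages, threshold = 5)

# A much larger multiplier flags nothing
flag_iqr_outliers(ages, threshold = 1000)
```

Raising the threshold widens the accepted range, which is why higher values make the test more conservative.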

outliers_size

numeric. The minimum number of records in a dataset to run the taxon-specific outlier test. Default = 7.

outliers_replicates

numeric. The number of replications for the distance matrix calculation. See details. Default = 5.

zeros_rad

numeric. The radius around 0/0 in degrees. Default = 0.5.

centroids_ref

a data.frame with alternative reference data for the centroid test. If NULL, the countryref dataset is used. Alternatives must be identical in structure.

country_ref

a SpatVector as alternative reference for the countries test. If NULL, the rnaturalearth::ne_countries('medium', returnclass = "sf") dataset is used.

inst_ref

a data.frame with alternative reference data for the biodiversity institution test. If NULL, the institutions dataset is used. Alternatives must be identical in structure.

value

a character string defining the output value. One of ‘spatialvalid’, ‘flagged’, ‘clean’. See the Value section for details. Default = ‘spatialvalid’.

verbose

logical. If TRUE, reports the name of each test and the number of records flagged. Default = TRUE.

report

logical or character. If TRUE a report file is written to the working directory, summarizing the cleaning results. If a character, the path to which the file should be written. Default = FALSE.

Details

Value

Depending on the value argument:

“spatialvalid”

an object of class spatialvalid similar to x with one column added for each test. TRUE = clean coordinate entry, FALSE = potentially problematic coordinate entry. The .summary column is FALSE if any test flagged the respective coordinate.

“flagged”

a logical vector with the same order as the input data summarizing the results of all tests. TRUE = clean coordinate, FALSE = potentially problematic (= at least one test failed).

“clean”

a data.frame similar to x with potentially problematic records removed.
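How the three output forms relate can be sketched with plain data frames. The per-test columns test_one and test_two below are hypothetical stand-ins for the columns the function adds, and the code does not call the package.

```r
# Hypothetical per-test results for four records (TRUE = passed the test);
# the column names are illustrative only
records <- data.frame(
  accepted_name = c("a", "b", "c", "d"),
  test_one = c(TRUE, TRUE, FALSE, TRUE),
  test_two = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Analogue of the "flagged" output: TRUE only if every test passed
flagged <- records$test_one & records$test_two

# Analogue of the "clean" output: keep only records that passed everything
clean <- records[flagged, , drop = FALSE]
clean$accepted_name
```

The logical vector keeps the input order, so it can be used directly to subset the original data.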

Note

Always tests for coordinate validity: non-numeric or missing coordinates and coordinates exceeding the global extent (lon/lat, WGS84).

See https://ropensci.github.io/CoordinateCleaner/ for more details and tutorials.
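The validity check described in this note can be approximated in base R. The helper is_valid_coordinate is a hypothetical name, and the sketch only mirrors the stated criteria (non-numeric, missing, or outside the lon/lat extent); it is not the package's code.

```r
# Sketch of a coordinate validity check under the criteria stated above
is_valid_coordinate <- function(lon, lat) {
  # Coerce to numeric; non-numeric entries become NA
  lon_num <- suppressWarnings(as.numeric(lon))
  lat_num <- suppressWarnings(as.numeric(lat))
  # TRUE = valid, FALSE = missing, non-numeric, or out of range
  !is.na(lon_num) & !is.na(lat_num) &
    lon_num >= -180 & lon_num <= 180 &
    lat_num >= -90 & lat_num <= 90
}

# One valid record, then a non-numeric, a missing, and an out-of-range longitude
is_valid_coordinate(
  lon = c("45.2", "abc", NA, "190"),
  lat = c("-12.5", "10", "5", "0")
)
```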

See Also

Other Wrapper functions: clean_coordinates(), clean_dataset()

Examples


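library(CoordinateCleaner)

# Simulate 250 fossil records with random taxa, coordinates, and age ranges
minages <- runif(250, 0, 65)
exmpl <- data.frame(accepted_name = sample(letters, size = 250, replace = TRUE),
                    decimalLongitude = runif(250, min = 42, max = 51),
                    decimalLatitude = runif(250, min = -26, max = -11),
                    min_ma = minages,
                    max_ma = minages + runif(250, 0.1, 65))

# Run the default tests; the default output is a spatialvalid object
test <- clean_fossils(x = exmpl)

# Summarize the test results
summary(test)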


[Package CoordinateCleaner version 3.0.1 Index]