diagonAlley {BeeBDC}R Documentation

Find fill-down errors


A simple function that looks for potential latitude and longitude fill-down errors by identifying consecutive occurrences with coordinates at regular intervals. This is accomplished by using a sliding window with the length determined by minRepeats.


  data = NULL,
  minRepeats = NULL,
  groupingColumns = c("eventDate", "recordedBy", "datasetName"),
  ndec = 3,
  stepSize = 1e+06,
  mc.cores = 1



A data frame or tibble. Occurrence records as input.


Numeric. The minimum number of lat or lon repeats needed to flag a record


Character. The column(s) to group the analysis by and search for fill-down errors within. Default = c("eventDate", "recordedBy", "datasetName").


Numeric. The number of decimal places below which records will not be considered in the diagonAlley function. This is fed into jbd_coordinates_precision(). Default = 3.


Numeric. The number of occurrences to process in each chunk. Default = 1000000.


Numeric. If > 1, the function will run in parallel using mclapply using the number of cores specified. If = 1 then it will be run using a serial loop. NOTE: Windows machines must use a value of 1 (see ?parallel::mclapply). Additionally, be aware that each thread can use large chunks of memory. Default = 1.


The sliding window (and hence fill-down errors) will only be examined within the user-defined groupingColumns; if any of those columns are empty, that record will be excluded.


The function returns the input data with a new column, .sequential, where FALSE = records that have consecutive latitudes or longitudes greater than or equal to the user-defined threshold.


# Read in the example data
 # Run the function
  beesRaw_out <- diagonAlley(
    data = beesRaw,
    # The minimum number of repeats needed to find a sequence in for flagging
    minRepeats = 4,
    groupingColumns = c("eventDate", "recordedBy", "datasetName"),
    ndec = 3,
    stepSize = 1000000,
    mc.cores = 1)

[Package BeeBDC version 1.2.0 Index]