diagonAlley {BeeBDC}    R Documentation
Find fill-down errors
Description
A simple function that looks for potential latitude and longitude fill-down errors by identifying consecutive occurrences with coordinates at regular intervals. This is accomplished by using a sliding window with the length determined by minRepeats.
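The sliding-window idea can be sketched in base R. This is a minimal illustration under stated assumptions, not BeeBDC's implementation. The helper flag_regular_runs() and its tol argument are hypothetical. It assumes the window length equals minRepeats, that values are examined in the order given, and that a window is suspect when every step between neighbouring values is identical and non-zero, as happens when a spreadsheet cell is dragged down.

```r
# Hypothetical helper (not part of BeeBDC): flag every value that lies inside
# a window of `minRepeats` consecutive values whose steps are all equal and
# non-zero, e.g. -27.100, -27.101, -27.102, ... from a dragged-down cell.
# Assumes `x` contains no NA values.
flag_regular_runs <- function(x, minRepeats, tol = 1e-9) {
  stopifnot(minRepeats >= 2)
  n <- length(x)
  flagged <- rep(FALSE, n)
  if (n < minRepeats) return(flagged)
  # Slide a window of length minRepeats along x, one position at a time
  for (start in seq_len(n - minRepeats + 1)) {
    idx <- start:(start + minRepeats - 1)
    steps <- diff(x[idx])
    # Constant, non-zero step (within tolerance) => possible fill-down run
    if (abs(steps[1]) > tol && all(abs(steps - steps[1]) < tol)) {
      flagged[idx] <- TRUE
    }
  }
  flagged
}

lat <- c(-27.100, -27.101, -27.102, -27.103, -27.500, -28.000)
flag_regular_runs(lat, minRepeats = 4)  # TRUE for the first four values only
```

Note that this helper returns TRUE for suspect values, the opposite polarity of the .sequential column described under Value. Identical repeated coordinates (a zero step) are deliberately not flagged here, since repeated sampling at one site is common; that choice is an assumption of the sketch.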
Usage
diagonAlley(
data = NULL,
minRepeats = NULL,
groupingColumns = c("eventDate", "recordedBy", "datasetName"),
ndec = 3,
stepSize = 1e+06,
mc.cores = 1
)
Arguments
data: A data frame or tibble of occurrence records to check.
minRepeats: Numeric. The minimum number of consecutive latitude or longitude repeats needed to flag a record.
groupingColumns: Character. The column(s) to group the analysis by; fill-down errors are searched for only within each group. Default = c("eventDate", "recordedBy", "datasetName").
ndec: Numeric. The number of decimal places below which records will not be considered in the diagonAlley function. This is fed into
stepSize: Numeric. The number of occurrences to process in each chunk. Default = 1000000.
mc.cores: Numeric. If > 1, the function runs in parallel using parallel::mclapply() with the specified number of cores. If = 1, it runs as a serial loop. NOTE: Windows machines must use a value of 1 (see ?parallel::mclapply). Additionally, be aware that each thread can use a large amount of memory. Default = 1.
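The ndec precision filter can be illustrated with a hypothetical helper, count_decimals(), that counts the decimal places of each coordinate from its printed form. This is an assumption about how precision might be measured, not the package's own check; coordinates with fewer than ndec decimal places are dropped from consideration.

```r
# Hypothetical helper (not part of BeeBDC): number of decimal places in each
# value, read from its non-scientific printed form with up to 15 significant
# digits.
count_decimals <- function(x) {
  vapply(x, function(v) {
    s <- format(v, scientific = FALSE, digits = 15)
    if (!grepl(".", s, fixed = TRUE)) return(0L)
    # Keep only the characters after the decimal point and count them
    nchar(sub("^[^.]*\\.", "", s))
  }, integer(1))
}

coords <- c(-27.1, -27.123, -27.12345, 153)
ndec <- 3
# Keep only coordinates recorded to at least ndec decimal places
keep <- count_decimals(coords) >= ndec
coords[keep]
```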
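The stepSize and mc.cores arguments together describe a chunk-then-dispatch pattern. The sketch below shows that pattern in base R under stated assumptions: process_chunk() is a hypothetical stand-in for the real per-chunk work, and rows are chunked purely by position, which may differ from how the package divides its data.

```r
library(parallel)

# Hypothetical stand-in for the per-chunk work; the real function differs
process_chunk <- function(rows) length(rows)

n_records <- 2500000
stepSize  <- 1e6
mc.cores  <- 1   # must stay 1 on Windows (see ?parallel::mclapply)

# Assign each row index to a chunk of at most stepSize rows
chunk_id <- ceiling(seq_len(n_records) / stepSize)
chunks   <- split(seq_len(n_records), chunk_id)

# Run over chunks in parallel when mc.cores > 1, otherwise as a serial loop
results <- if (mc.cores > 1) {
  mclapply(chunks, process_chunk, mc.cores = mc.cores)
} else {
  lapply(chunks, process_chunk)
}
unlist(results)  # three chunks: 1e6, 1e6 and 5e5 rows
```

Because mclapply() forks the R session, each worker may hold its own copy of a chunk, which is one reason a smaller stepSize can help when memory is tight.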
Details
The sliding window (and hence the search for fill-down errors) is applied only within the groups defined by groupingColumns; if any of those columns is empty for a record, that record is excluded.
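The grouping rule above can be illustrated in base R. This is a hypothetical sketch, not BeeBDC code: the toy data frame occ and its values are made up, and "empty" is assumed to mean NA or a blank string.

```r
# Toy occurrence data (made up for illustration)
occ <- data.frame(
  eventDate       = c("2001-01-01", "2001-01-01", NA, "2001-01-02"),
  recordedBy      = c("Smith", "Smith", "Smith", ""),
  datasetName     = c("A", "A", "A", "A"),
  decimalLatitude = c(-27.100, -27.101, -27.102, -27.103)
)
groupingColumns <- c("eventDate", "recordedBy", "datasetName")

# TRUE where every grouping column is present and non-blank
complete <- Reduce(`&`, lapply(occ[groupingColumns], function(col) {
  !is.na(col) & nzchar(trimws(col))
}))
usable <- occ[complete, ]

# One data frame per unique combination of grouping values; a sliding-window
# search would then run inside each of these groups separately
groups <- split(usable, usable[groupingColumns], drop = TRUE)
length(groups)
```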
Value
The function returns the input data with a new column, .sequential, where FALSE marks records that belong to a run of consecutive latitudes or longitudes at least as long as the user-defined threshold (minRepeats).
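A minimal sketch of how the .sequential column might be used downstream. The data frame out below is a made-up stand-in for diagonAlley() output, and the id column is hypothetical; %in% FALSE is used so that any NA values, should they occur, are not treated as flagged.

```r
# Made-up stand-in for diagonAlley() output
out <- data.frame(
  id          = c("a1", "a2", "a3"),
  .sequential = c(TRUE, FALSE, TRUE)
)

# FALSE = possible fill-down error
suspect <- out[out$.sequential %in% FALSE, ]
clean   <- out[out$.sequential %in% TRUE, ]
suspect$id
```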
Examples
# Load the package and read in the example data
library(BeeBDC)
data(beesRaw)
# Run the function
beesRaw_out <- diagonAlley(
  data = beesRaw,
  # The minimum number of repeats needed to flag a sequence
  minRepeats = 4,
  groupingColumns = c("eventDate", "recordedBy", "datasetName"),
  ndec = 3,
  stepSize = 1000000,
  mc.cores = 1)