jbd_Ctrans_chunker {BeeBDC} | R Documentation |
Wraps jbd_coordinates_transposed to identify and fix transposed occurrences
Description
Because the jbd_coordinates_transposed() function is very RAM-intensive, this wrapper allows a user to specify chunk sizes and analyse only a small portion of the occurrence data at a time. The prefix jbd_ is used to highlight the difference between this function and the original bdc::bdc_coordinates_transposed().
This function will preferably use the countryCode column generated by bdc::bdc_country_standardized().
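The chunking idea described above can be sketched in base R. This is only an illustration of how a chunk size splits a range of rows into consecutive pieces; chunk_bounds() is a hypothetical helper, not a BeeBDC function, and the package's internal bookkeeping may differ.

```r
# Hypothetical helper (not part of BeeBDC): compute the first and last row
# of each chunk for a data set with n_rows rows.
chunk_bounds <- function(n_rows, stepSize) {
  stopifnot(n_rows >= 1, stepSize >= 1)
  starts <- seq(from = 1, to = n_rows, by = stepSize)
  # The final chunk is usually shorter, so cap its end at n_rows
  ends <- pmin(starts + stepSize - 1, n_rows)
  data.frame(chunk = seq_along(starts), start = starts, end = ends)
}

# 2.5 million rows processed 1 million at a time gives three chunks
bounds <- chunk_bounds(n_rows = 2500000, stepSize = 1e6)
print(bounds)
```

Smaller chunk sizes mean more iterations but fewer rows held in memory at once.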
Usage
jbd_Ctrans_chunker(
data = NULL,
lat = "decimalLatitude",
lon = "decimalLongitude",
idcol = "databse_id",
country = "country_suggested",
countryCode = "countryCode",
sci_names = "scientificName",
border_buffer = 0.2,
save_outputs = TRUE,
stepSize = 1e+06,
chunkStart = 1,
progressiveSave = TRUE,
path = tempdir(),
append = TRUE,
scale = "large",
mc.cores = 1
)
Arguments
data |
A data frame or tibble. Occurrence records as input. |
lat |
Character. The column with latitude in decimal degrees. Default = "decimalLatitude". |
lon |
Character. The column with longitude in decimal degrees. Default = "decimalLongitude". |
idcol |
Character. The column name with a unique record identifier. Default = "database_id". |
country |
Character. The name of the column containing country names. Default = "country_suggested". |
countryCode |
Character. Identifies the column containing ISO-2 country codes. Default = "countryCode". |
sci_names |
Character. The column containing scientific names. Default = "scientificName". |
border_buffer |
Numeric. The buffer, in decimal degrees, around points to help match them to countries. Default = 0.2 (~22 km at equator). |
save_outputs |
Logical. If TRUE, transposed occurrences will be saved to their own file. |
stepSize |
Numeric. The number of occurrences to process in each chunk. Default = 1000000. |
chunkStart |
Numeric. The chunk number to start from. This can be > 1 when you need to restart the function from a certain chunk; for example if R failed unexpectedly. |
progressiveSave |
Logical. If TRUE, the country output list will be saved between iterations. |
path |
Character. The path to a file in which to save the 01_coordinates_transposed_ output. |
append |
Logical. If TRUE, the function will look for an existing file and append to it. |
scale |
Passed to rnaturalearth's ne_countries(). Scale of map to return, one of 110, 50, 10 or 'small', 'medium', 'large'. Default = "large". |
mc.cores |
Numeric. If > 1, the jbd_correct_coordinates function will run in parallel via mclapply on the number of cores specified. If = 1, it will run as a serial loop. NOTE: Windows machines must use a value of 1 (see ?parallel::mclapply). Additionally, be aware that each thread can use large chunks of memory. Default = 1. |
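One cautious way to pick a value for mc.cores is sketched below. It assumes only base R and the parallel package that ships with R. The cap of two cores is an arbitrary, illustrative limit chosen because each thread can use a lot of memory; it is not a package recommendation.

```r
# Number of cores reported by the machine; detectCores() can return NA
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Windows must use 1; elsewhere use at most two cores (illustrative cap)
n_cores <- if (.Platform$OS.type == "windows") 1L else min(2L, available)
n_cores
```

The resulting value could then be supplied as mc.cores = n_cores.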
Value
Returns the input data frame with a new column, coordinates_transposed, where FALSE = records that had their coordinates transposed.
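A minimal sketch of working with the returned column follows. The data frame toy_out is a made-up stand-in for the function's output, and its values are invented for illustration. The NA value is included only to show how missing flags would be handled; whether the function itself returns NA here is an assumption.

```r
# Toy stand-in for the function's output; the values are made up
toy_out <- data.frame(
  database_id = c("rec_1", "rec_2", "rec_3", "rec_4"),
  coordinates_transposed = c(TRUE, FALSE, TRUE, NA)
)

# Records flagged as transposed (FALSE); which() drops any NA values
flagged <- toy_out[which(toy_out$coordinates_transposed == FALSE), ]

# Tally of the flag, keeping a separate count for NA
flag_counts <- table(toy_out$coordinates_transposed, useNA = "always")
print(flag_counts)
```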
Examples
if(requireNamespace("rnaturalearthdata")){
  library(dplyr)
  # Import and prepare the data
  data(beesFlagged)
  beesFlagged <- beesFlagged %>% dplyr::select(!c(.val, .sea)) %>%
    # Cut down the dataset to run the example quicker
    dplyr::filter(dplyr::row_number() %in% 1:20)
  # Run the function
  beesFlagged_out <- jbd_Ctrans_chunker(
    # bdc_coordinates_transposed inputs
    data = beesFlagged,
    idcol = "database_id",
    lat = "decimalLatitude",
    lon = "decimalLongitude",
    country = "country_suggested",
    countryCode = "countryCode",
    # in decimal degrees (1 degree is ~111 km at the equator)
    border_buffer = 1,
    save_outputs = FALSE,
    sci_names = "scientificName",
    # chunker inputs
    # How many rows to process at a time
    stepSize = 1000000,
    # Start row
    chunkStart = 1,
    # Progressively save the output between each iteration?
    progressiveSave = FALSE,
    path = tempdir(),
    # If FALSE it may overwrite existing dataset
    append = FALSE,
    # Users should select scale = "large" as it is more thoroughly tested
    scale = "medium",
    mc.cores = 1
  )
  table(beesFlagged_out$coordinates_transposed, useNA = "always")
} # END if require
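As a rough aid for choosing border_buffer, the conversion from decimal degrees to kilometres at the equator can be sketched as below. buffer_km() is a hypothetical helper, not part of BeeBDC, and the figure is only an equatorial approximation based on the WGS84 equatorial radius; a degree of longitude covers less distance away from the equator.

```r
# Approximate length of one degree at the equator, from the WGS84
# equatorial radius (6378.137 km)
km_per_degree <- 2 * pi * 6378.137 / 360

# Hypothetical helper: convert a buffer in decimal degrees to kilometres
buffer_km <- function(border_buffer) border_buffer * km_per_degree

round(buffer_km(0.2), 1)  # the default buffer, about 22.3 km
round(buffer_km(1), 1)    # one degree, about 111.3 km
```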