selectAbsences {fuzzySim} | R Documentation |
Select (spatially biased) absence rows.
Description
This function takes a matrix or data frame containing species presence (1) and absence (0) data, and it selects among the absence rows to stay within a given number or ratio of absences, and/or within and/or beyond a given distance to the presences. Optionally, absences can be selected with higher probability towards the vicinity of presences, to reproduce survey bias.
Usage
selectAbsences(data, sp.cols, coord.cols = NULL, min.dist = NULL,
max.dist = NULL, n = NULL, mult.p = NULL, bias = FALSE, bunch = FALSE,
dist.mat = NULL, seed = NULL, plot = !is.null(coord.cols), df = TRUE,
verbosity = 2)
Arguments
data |
a matrix or data frame containing, at least, one column with the species' presence (1) and absence (0) records, with localities in rows; and (if distance or spatial bias are required) two columns with the spatial coordinates. |
sp.cols |
names or index numbers of the columns containing the species presences (1) and absences (0) in 'data'. |
coord.cols |
names or index numbers of the columns containing the spatial coordinates in 'data' (x and y, or longitude and latitude, in this order). Needed if distance or spatial bias are required. |
min.dist |
(optional) numeric value specifying the minimum distance (in the same units as 'coord.cols') at which selected absences should be from the presences. |
max.dist |
(optional) numeric value specifying the maximum distance (in the same units as 'coord.cols') at which selected absences should be from the presences. |
n |
(optional) integer value specifying the number of absence rows to select. Can also be specified as a ratio – see 'mult.p' below. |
mult.p |
(optional) numeric value specifying how many times the number of presences to use as 'n' (e.g. 10 times as many absences as presences). Ignored if 'n' is not NULL. |
bias |
logical value specifying if the selection of absences should be biased towards the vicinity of presences. Requires specifying 'coord.cols'. The default is FALSE. Can take time (and memory) for large datasets if 'dist.mat' is not provided. |
dist.mat |
optional (but recommendable) argument to pass to |
bunch |
[PENDING IMPLEMENTATION] logical value specifying if the selected absences should concentrate around presences in proportion to their local density, as in Vollering et al. (2019). The default is FALSE. |
seed |
(optional) integer value to pass to |
plot |
logical value specifying whether to plot the result. The default is TRUE if 'coord.cols' are provided. |
df |
logical value specifying whether to return a dataframe with the input 'data' after removal of the non-selected absences. The default is TRUE. If set to FALSE, the output is a logical vector specifying if each row of 'data' was selected or not. |
verbosity |
numeric value indicating the amount of messages to display. Choose 0 for no messages. |
Details
Species occurrence data typically incorporate two probability distributions: the actual probability of the species being present, and the probability of it being recorded if it was present (Merow et al. 2013). Thus, any covariation between recording probability and the predictor variables can bias the predictions of species distribution models (Yackulic et al. 2013).
Methods to correct for this bias include the selection of (pseudo)absences preferably towards the vicinity of presence records, in order to reproduce the survey bias. This function implements this latter strategy, in several (alternative or complementary) ways: 1) selecting absences within and/or outside a given distance from presences; 2) biasing the random selection of absences, making it more likely towards the vicinity of presences (providing the 'prob' argument in sample
with the result of distPres
); or [PENDING IMPLEMENTATION!] 3) bunching up the absences preferably around the areas with higher densities of presences (Vollering et al. 2019).
Value
This function returns the 'data' input after removal of the non-selected absences, or (if df=FALSE) a logical vector specifying if each row of 'data' was selected or not. If plot=TRUE and provided 'coord.cols', it also plots the presences (blue "plus" signs), the selected absences (red "minus" signs) and the excluded absences (orange dots).
Author(s)
A. Marcia Barbosa
References
Vollering J., Halvorsen R., Auestad I. & Rydgren K. (2019) Bunching up the background betters bias in species distribution models. Ecography, 42: 1717-1727
See Also
Examples
data(rotif.env)
head(rotif.env)
names(rotif.env)
table(rotif.env$Burceo)
# select among the absences using different criteria:
burceo_select <- selectAbsences(data = rotif.env, sp.cols = "Burceo",
coord.cols = c("Longitude", "Latitude"), n = 150, seed = 123)
burceo_select <- selectAbsences(data = rotif.env, sp.cols = "Burceo",
coord.cols = c("Longitude", "Latitude"), mult.p = 1.5, seed = 123)
burceo_select <- selectAbsences(data = rotif.env, sp.cols = "Burceo",
coord.cols = c("Longitude", "Latitude"), max.dist = 18)
burceo_select <- selectAbsences(data = rotif.env, sp.cols = "Burceo",
coord.cols = c("Longitude", "Latitude"), max.dist = 18, min.dist = 5,
n = sum(rotif.env$Burceo), bias = TRUE)