spatial_nndm_cv {spatialsample}    R Documentation
Nearest neighbor distance matching (NNDM) cross-validation
Description
NNDM is a variant of leave-one-out cross-validation which assigns each observation to a single assessment fold, and then attempts to remove data from each analysis fold until the nearest neighbor distance distribution between assessment and analysis folds matches the nearest neighbor distance distribution between training data and the locations a model will be used to predict. Proposed by Milà et al. (2022), this method aims to provide accurate estimates of how well models will perform in the locations they will actually be predicting. This method was originally implemented in the CAST package.
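The two nearest neighbor distance distributions described above can be illustrated in base R. This is a minimal sketch, not spatialsample's implementation. The helpers nn_dist() and loo_nn_dist() are hypothetical, distances are planar Euclidean, and the clustered training data are simulated. The code computes the target distances (from each prediction location to its nearest training point) and the leave-one-out distances that NNDM starts from. It then shows how removing a point's nearest neighbor from that point's analysis set lengthens its nearest neighbor distance.

```r
# Hypothetical helpers (not part of spatialsample) illustrating the two
# nearest neighbor (NN) distance distributions that NNDM tries to match.

# NN distance from each row of `from` to the closest row of `to`,
# using planar Euclidean distance on 2-column coordinate matrices.
nn_dist <- function(from, to) {
  apply(from, 1, function(p) min(sqrt(colSums((t(to) - p)^2))))
}

# Leave-one-out NN distance: each training point's distance to its
# closest *other* training point.
loo_nn_dist <- function(x) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf  # ignore each point's zero distance to itself
  apply(d, 1, min)
}

set.seed(42)
# Simulated training data clustered in one corner of the unit square...
train <- matrix(runif(200, 0, 0.3), ncol = 2)
# ...while predictions are needed across the whole unit square.
pred <- matrix(runif(2000), ncol = 2)

target_nnd <- nn_dist(pred, train)  # prediction -> training (target)
loo_nnd    <- loo_nn_dist(train)    # training -> training (plain LOO)

# Empirical CDFs of both distributions. Plain LOO distances are much shorter
# than the target distances, so NNDM removes analysis data until the LOO
# distribution approximately matches the target distribution.
target_ecdf <- ecdf(target_nnd)
loo_ecdf    <- ecdf(loo_nnd)
summary(target_nnd)
summary(loo_nnd)

# Removing point 1's nearest neighbor from its analysis set raises point 1's
# NN distance from the smallest to the second-smallest distance.
d1 <- sort(unname(as.matrix(dist(train))[1, -1]))
c(nearest = d1[1], second_nearest = d1[2])
```

Because the training points are confined to a small corner, the median target distance is far larger than the median leave-one-out distance. Ordinary leave-one-out cross-validation would therefore assess the model on much easier cases than those it will face at the prediction locations.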
Usage
spatial_nndm_cv(
data,
prediction_sites,
...,
autocorrelation_range = NULL,
prediction_sample_size = 1000,
min_analysis_proportion = 0.5
)
Arguments
data |
An object of class sf or sfc.
prediction_sites |
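An sf or sfc object describing the areas to be predicted. If prediction_sites are all points, those points are used as the prediction locations when calculating target nearest neighbor distances; otherwise, prediction locations are sampled from prediction_sites using sf::st_sample().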
... |
Additional arguments passed to sf::st_sample(). Only used if prediction_sites are not points.
autocorrelation_range |
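A numeric of length 1 representing the landscape autocorrelation range ("phi" in the terminology of Milà et al. (2022)). If NULL, the default, the autocorrelation range is assumed to be the distance between the opposite corners of the bounding box of prediction_sites.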
prediction_sample_size |
A numeric of length 1: the number of points to sample when prediction_sites is not composed only of points.
min_analysis_proportion |
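The minimum proportion of data that must remain in analysis sets after removing points to match nearest neighbor distances. Data are no longer removed from an analysis set once only this proportion of the original data remains, even if its nearest neighbor distances are still shorter than those between training and prediction locations.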
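When prediction_sites is a polygon rather than a set of points, prediction locations are sampled from it. The sketch below assumes that the bounding box of the full Ames data is a reasonable stand-in for the prediction area; this is an illustrative choice, not something the package does for you. It sets prediction_sample_size and min_analysis_proportion explicitly, and the values 500 and 0.75 are arbitrary. It assumes modeldata and sf are installed.

```r
library(spatialsample)

data(ames, package = "modeldata")
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)

# Illustrative prediction area: a single polygon covering the bounding box
# of all Ames houses.
ames_area <- sf::st_as_sfc(sf::st_bbox(ames_sf))

folds <- spatial_nndm_cv(
  ames_sf[1:100, ],
  prediction_sites = ames_area,
  prediction_sample_size = 500,   # points sampled from the polygon
  min_analysis_proportion = 0.75  # stop removing data at 75% of the original
)
folds
```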
Details
Note that, because this method is a form of leave-one-out cross-validation, it creates one resample for every observation in data. It can therefore be rather slow for larger data sets, and fitting models to these resamples will be slower still.
Value
A tibble with classes spatial_nndm_cv, spatial_rset, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and an identification variable id.
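The returned object can be inspected as sketched below, reusing the subset from the Examples section. This assumes modeldata, sf, and rsample are installed; rsample::analysis() and rsample::assessment() are rsample's accessors for the data in a split. Because NNDM is a leave-one-out method, each assessment set should hold a single observation, and each analysis set at most the remaining 99.

```r
library(spatialsample)

data(ames, package = "modeldata")
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
folds <- spatial_nndm_cv(ames_sf[1:100, ], ames_sf[2001:2100, ])

class(folds)    # should include the classes listed above
nrow(folds)     # one resample per observation in `data`
head(folds$id)  # the identification variable

# Each element of `splits` is a split object; rsample's accessors extract it.
first_split <- folds$splits[[1]]
nrow(rsample::assessment(first_split))  # the single held-out observation
nrow(rsample::analysis(first_split))    # data left after NNDM removal
```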
References
C. Milà, J. Mateu, E. Pebesma, and H. Meyer. 2022. "Nearest Neighbour Distance Matching Leave-One-Out Cross-Validation for map validation." Methods in Ecology and Evolution 13, pp 1304–1316. doi: 10.1111/2041-210X.13851.
H. Meyer and E. Pebesma. 2022. "Machine learning-based global maps of ecological variables and the challenge of assessing them." Nature Communications 13, pp 2208. doi: 10.1038/s41467-022-29838-9.
Examples
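library(spatialsample)
data(ames, package = "modeldata")
# Convert to an sf object, using longitude and latitude (WGS 84) as coordinates:
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
# Using a small subset of the data (rows 1-100 as training data, rows
# 2001-2100 as prediction sites), to make the example run faster:
spatial_nndm_cv(ames_sf[1:100, ], ames_sf[2001:2100, ])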