spatial_buffer_vfold_cv {spatialsample}    R Documentation
V-Fold Cross-Validation with Buffering
Description
V-fold cross-validation (also known as k-fold cross-validation) randomly
splits the data into V groups of roughly equal size (called "folds").
Each resample uses V-1 of the folds as the analysis set and the remaining
fold as the assessment set.
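The plain V-fold scheme can be sketched in base R. This is an illustration only, not rsample's implementation: the helpers `assign_folds()` and `make_resamples()` are hypothetical, and each resample is a plain list of row indices rather than an rsplit object.

```r
# Hypothetical helper: randomly assign n rows to v folds of roughly equal size.
assign_folds <- function(n, v) {
  # rep_len() cycles 1..v up to length n, so fold sizes differ by at most one;
  # sample() then shuffles the fold labels across rows.
  sample(rep_len(seq_len(v), n))
}

# Hypothetical helper: one resample per fold. The fold is the assessment set
# and the other v - 1 folds form the analysis set. Each repeat draws a fresh
# fold assignment.
make_resamples <- function(n, v, repeats = 1) {
  out <- list()
  for (r in seq_len(repeats)) {
    folds <- assign_folds(n, v)
    for (k in seq_len(v)) {
      out[[length(out) + 1]] <- list(
        analysis   = which(folds != k),
        assessment = which(folds == k)
      )
    }
  }
  out
}

set.seed(1)
res <- make_resamples(n = 23, v = 5)
length(res)                                      # 5: one resample per fold
sapply(res, function(s) length(s$assessment))    # 5 5 5 4 4
length(make_resamples(n = 23, v = 5, repeats = 3))  # 15: v * repeats
```

Repeating the partitioning multiplies the number of resamples by `repeats`, which is why the count equals V only when there are no repeats.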
These functions extend rsample::vfold_cv() and rsample::group_vfold_cv()
to also apply an inclusion radius and exclusion buffer to the assessment set,
ensuring that your analysis data is spatially separated from the assessment
set.
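A minimal base R sketch of how an inclusion radius and an exclusion buffer can reshape a single fold. It assumes planar coordinates and Euclidean distances from dist(); spatialsample itself works on sf objects, and `buffer_split()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of spatialsample): apply an inclusion radius
# and an exclusion buffer to one fold. Assumes planar coordinates and
# Euclidean distance.
buffer_split <- function(coords, test_idx, radius = NULL, buffer = NULL) {
  d <- as.matrix(dist(coords))  # n x n matrix of pairwise distances
  n <- nrow(coords)
  assessment <- test_idx
  if (!is.null(radius)) {
    # Inclusion radius: any point within `radius` of an initially selected
    # test point joins the assessment set.
    within_radius <- apply(d[, test_idx, drop = FALSE] <= radius, 1, any)
    assessment <- unname(which(within_radius))
  }
  excluded <- integer(0)
  if (!is.null(buffer)) {
    # Exclusion buffer: points within `buffer` of the (radius-expanded)
    # assessment set go to neither the analysis nor the assessment set.
    within_buffer <- apply(d[, assessment, drop = FALSE] <= buffer, 1, any)
    excluded <- unname(setdiff(which(within_buffer), assessment))
  }
  analysis <- setdiff(seq_len(n), c(assessment, excluded))
  list(analysis = analysis, assessment = assessment, excluded = excluded)
}

# Ten points on a line at x = 0, 1, ..., 9; point 5 sits at x = 4.
coords <- cbind(x = 0:9, y = 0)
s <- buffer_split(coords, test_idx = 5, radius = 1, buffer = 2)
s$assessment  # points 4, 5, 6 (x = 3, 4, 5)
s$excluded    # points 2, 3, 7, 8 (x = 1, 2, 6, 7)
s$analysis    # points 1, 9, 10 (x = 0, 8, 9)

# No radius and no buffer: the fold is used as-is.
s0 <- buffer_split(coords, test_idx = 5)
```

With radius and buffer both NULL, the sketch returns the initial fold as the assessment set and every other point as the analysis set, which is ordinary V-fold cross-validation.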
In basic V-fold cross-validation (i.e. no repeats), the number of resamples
is equal to V.
Usage
spatial_buffer_vfold_cv(
  data,
  radius,
  buffer,
  v = 10,
  repeats = 1,
  strata = NULL,
  breaks = 4,
  pool = 0.1,
  ...
)

spatial_leave_location_out_cv(
  data,
  group,
  v = NULL,
  radius = NULL,
  buffer = NULL,
  ...,
  repeats = 1
)
Arguments
data
A data frame.
radius
Numeric: points within this distance of the initially-selected
test points will be assigned to the assessment set. If NULL, no radius is applied.
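buffer
Numeric: points within this distance of any point in the
test set (after radius is applied) will be assigned to neither the analysis nor the assessment set. If NULL, no buffer is applied.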
v
The number of partitions for the resampling. Set to NULL or Inf for the maximum sensible value (for leave-one-X-out cross-validation).
repeats
The number of times to repeat the V-fold partitioning.
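strata
A variable in data (single character or name) used to conduct stratified sampling. When not NULL, each resample is created within the stratification variable. Numeric strata are binned into quantile groups (quartiles with the default breaks = 4).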
breaks
A single number giving the number of bins desired to stratify a numeric stratification variable.
pool
A proportion of data used to determine if a particular group is too small and should be pooled into another group. We do not recommend decreasing this argument below its default of 0.1 because of the dangers of stratifying groups that are too small.
...
These dots are for future extensions and must be empty.
group
A variable in data (single character or name) used to create folds. For leave-location-out CV, this should be a variable containing the locations to group observations by; for leave-time-out CV, the time blocks to group by; and for leave-location-and-time-out CV, the spatiotemporal blocks to group by.
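The strata and breaks arguments can be illustrated with a simplified base R sketch: bin a numeric variable into quantile groups, then assign folds separately within each bin. The helpers `bin_strata()` and `stratified_folds()` are hypothetical and are not rsample's internal code; in particular, the pooling of small groups controlled by pool is not reproduced here.

```r
# Hypothetical helper (simplified): bin a numeric variable into `breaks`
# quantile groups. Pooling of small groups (the `pool` argument) is omitted.
bin_strata <- function(x, breaks = 4) {
  probs <- seq(0, 1, length.out = breaks + 1)
  cuts <- unique(quantile(x, probs = probs, names = FALSE))
  cut(x, breaks = cuts, include.lowest = TRUE)
}

# Hypothetical helper: assign folds separately within each stratum, so every
# fold receives a similar share of each stratum.
stratified_folds <- function(strata, v) {
  labels <- as.character(strata)
  folds <- integer(length(labels))
  for (lvl in unique(labels)) {
    idx <- which(labels == lvl)
    folds[idx] <- sample(rep_len(seq_len(v), length(idx)))
  }
  folds
}

set.seed(42)
y <- rexp(100)                        # a skewed numeric variable
strata <- bin_strata(y, breaks = 4)   # four bins of 25 rows each
folds <- stratified_folds(strata, v = 5)
table(strata, folds)                  # 5 rows from each bin in every fold
```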
Details
When radius and buffer are both NULL, spatial_buffer_vfold_cv() is
equivalent to rsample::vfold_cv() and spatial_leave_location_out_cv()
is equivalent to rsample::group_vfold_cv().
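Because rsample::group_vfold_cv() keeps every observation of a group in the same fold, leave-location-out CV without a radius or buffer holds out whole locations at a time. The base R sketch below illustrates only that grouping step; `group_folds()` is a hypothetical helper, and treating v = NULL as one fold per group (leave-one-location-out) is an assumption made for illustration.

```r
# Hypothetical helper (not spatialsample's code): assign whole groups to
# folds, as grouped V-fold CV does. v = NULL is assumed to mean one fold per
# group (leave-one-location-out).
group_folds <- function(group, v = NULL) {
  groups <- unique(group)
  if (is.null(v)) v <- length(groups)
  # Shuffle fold labels across groups, not across rows.
  fold_of_group <- sample(rep_len(seq_len(v), length(groups)))
  fold_of_group[match(group, groups)]  # each row inherits its group's fold
}

set.seed(7)
site <- rep(c("A", "B", "C", "D", "E", "F"), times = c(3, 5, 2, 4, 1, 5))
folds <- group_folds(site, v = 3)
table(site, folds)        # each site appears in exactly one fold
loo <- group_folds(site)  # six folds, one per site
```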
References
K. Le Rest, D. Pinaud, P. Monestiez, J. Chadoeuf, and C. Bretagnolle. 2014. "Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation," Global Ecology and Biogeography 23, pp. 811-820, doi: 10.1111/geb.12161.
H. Meyer, C. Reudenbach, T. Hengl, M. Katurji, and T. Nauss. 2018. "Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation," Environmental Modelling & Software 101, pp. 1-9, doi: 10.1016/j.envsoft.2017.12.001.
Examples
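library(spatialsample)

data(Smithsonian, package = "modeldata")

# Convert to an sf object using the longitude/latitude columns (WGS 84)
Smithsonian_sf <- sf::st_as_sf(
  Smithsonian,
  coords = c("longitude", "latitude"),
  crs = 4326
)

# V-fold CV with an exclusion buffer but no inclusion radius
spatial_buffer_vfold_cv(
  Smithsonian_sf,
  buffer = 500,
  radius = NULL
)

data(ames, package = "modeldata")
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)

# Leave-location-out CV: each fold holds out whole neighborhoods
ames_neighborhoods <- spatial_leave_location_out_cv(ames_sf, Neighborhood)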