sperrorest {sperrorest}    R Documentation
Perform spatial error estimation and variable importance assessment
Description
sperrorest is a flexible interface for multiple types of parallelized spatial and non-spatial cross-validation and bootstrap error estimation, as well as parallelized permutation-based assessment of spatial variable importance.
Usage
sperrorest(
formula,
data,
coords = c("x", "y"),
model_fun,
model_args = list(),
pred_fun = NULL,
pred_args = list(),
smp_fun = partition_cv,
smp_args = list(),
train_fun = NULL,
train_param = NULL,
test_fun = NULL,
test_param = NULL,
err_fun = err_default,
imp_variables = NULL,
imp_permutations = 1000,
imp_sample_from = c("test", "train", "all"),
importance = !is.null(imp_variables),
distance = FALSE,
do_gc = 1,
progress = "all",
benchmark = FALSE,
mode_rep = c("future", "sequential", "loop"),
mode_fold = c("sequential", "future", "loop"),
verbose = 0
)
Arguments
formula
A formula specifying the variables used by the model. Only simple formulas without interactions or nonlinear terms should be used.

data
a data.frame containing the predictor and response variables.

coords
vector of length 2 defining the variables in data that contain the x and y coordinates of sample locations.

model_fun
Function that fits a predictive model, such as glm or rpart. The first argument must be a formula and the second a data.frame with the learning sample.

model_args
Arguments to be passed to model_fun (in addition to the formula and data arguments, which are provided by sperrorest).

pred_fun
Prediction function for a fitted model object created by model_fun. Must accept at least two arguments: the fitted model object and a data.frame newdata with the data on which to predict the outcome.

pred_args
(optional) Arguments to pred_fun (in addition to the fitted model object and the newdata argument, which are provided by sperrorest).

smp_fun
A function for sampling training and test sets from data, e.g. partition_cv (the default) for non-spatial cross-validation or partition_kmeans for spatial cross-validation.

smp_args
(optional) Arguments to be passed to smp_fun.

train_fun
(optional) A function for resampling or subsampling the training sample in order to achieve, e.g., uniform sample sizes on all training sets, or to maintain a certain ratio of positives and negatives in training sets; e.g. resample_uniform or resample_strat_uniform.

train_param
(optional) Arguments to be passed to train_fun.

test_fun
(optional) Like train_fun, but applied to the test sets.

test_param
(optional) Arguments to be passed to test_fun.

err_fun
A function that calculates selected error measures from the known responses in data and the model predictions delivered by pred_fun; defaults to err_default.

imp_variables
(optional; used if importance = TRUE) Variables for which permutation-based variable importance assessment is performed.

imp_permutations
(optional; used if importance = TRUE) Number of permutations used for variable importance assessment (default: 1000).

imp_sample_from
(default: "test") specifies whether the permuted values are drawn from the test set, the training set ("train"), or the entire sample ("all").

importance
logical (default: !is.null(imp_variables)): perform permutation-based variable importance assessment?

distance
logical (default: FALSE): if TRUE, calculate mean nearest-neighbour distances from test samples to training samples.

do_gc
numeric (default: 1): defines the frequency of memory garbage collection by calling gc; if < 1, no garbage collection; if >= 1, gc is called after each repetition; if >= 2, also after each fold.

progress
character (default: "all"): "all" shows progress at the repetition and fold level, "rep" at the repetition level only; FALSE disables progress output.

benchmark
(optional) logical (default: FALSE): if TRUE, collect benchmarking information and return it in the benchmark component of the result.

mode_rep, mode_fold
character (defaults: "future" and "sequential", respectively): specifies whether repetitions and folds are processed sequentially ("sequential"), in parallel using the future framework ("future"), or in a plain for loop ("loop"). See section 'Parallelization'.

verbose
Controls the amount of information printed while processing. Defaults to 0 (no output).
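As an illustration of the err_fun interface, a custom error function can be supplied instead of err_default. This is a minimal sketch (not part of the package; it assumes the same two-argument observed/predicted signature as err_default and returns a named list of measures):

```r
# Minimal sketch of a custom err_fun (assumption: like err_default, it
# receives the observed and the predicted values and returns a named list
# of error measures).
my_err <- function(obs, pred) {
  list(
    bias = mean(pred - obs),          # mean error
    rmse = sqrt(mean((pred - obs)^2)) # root mean squared error
  )
}
```

Such a function could then be passed via err_fun = my_err.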
Details
Custom predict functions passed to pred_fun that rely on multiple child functions must bundle those child functions inside a single function definition.
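For example, a predict function that needs a helper can define that helper inside its own body instead of relying on the surrounding environment. A sketch (the use of glm as model_fun and the link-to-probability conversion are illustrative assumptions):

```r
# Sketch of a self-contained pred_fun: the helper ("child") function is
# defined inside the predict function rather than in the enclosing
# environment, so the whole predictor is one function object.
mypred <- function(object, newdata) {
  to_prob <- function(x) 1 / (1 + exp(-x)) # child function, defined inside
  to_prob(predict(object, newdata))        # link-scale predictions -> probabilities
}
```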
Value
A list (object of class sperrorest) with (up to) six components:
error_rep: a sperrorestreperror object containing predictive performances at the repetition level
error_fold: a sperroresterror object containing predictive performances at the fold level
represampling: a represampling object
importance: a sperrorestimportance object containing permutation-based variable importances at the fold level
benchmark: a sperrorestbenchmark object containing information on the system the code is running on, starting and finishing times, number of available CPU cores, and runtime performance
package_version: a sperrorestpackageversion object containing information about the sperrorest package version
Parallelization
Running in parallel is supported via package future.
Have a look at vignette("future-1-overview", package = "future")
.
In short: choose a backend and specify the number of workers, then call sperrorest() as usual. Example:
future::plan(future.callr::callr, workers = 2)
sperrorest()
Parallelization at the repetition level is recommended when using repeated cross-validation. If the 'granularity' of parallelized function calls is too fine, overall runtime suffers because the overhead for passing arguments and handling environments becomes too large. Use fold-level parallelization only when the processing time of individual folds is very large and the number of repetitions is small or equal to 1.
Note that nested calls to future are not possible. Therefore, use a sequential sperrorest call for hyperparameter tuning in a nested cross-validation setting.
References
Brenning, A. 2012. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23-27 July 2012, p. 5372-5375. https://ieeexplore.ieee.org/document/6352393
Brenning, A. 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5(6), 853-862. doi:10.5194/nhess-5-853-2005
Brenning, A., S. Long & P. Fieguth. 2012. Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Remote Sensing of Environment, 125, 227-237. doi:10.1016/j.rse.2012.07.005
Russ, G. & A. Brenning. 2010a. Data mining in precision agriculture: Management of spatial information. In 13th International Conference on Information Processing and Management of Uncertainty, IPMU 2010; Dortmund; 28 June - 2 July 2010. Lecture Notes in Computer Science, 6178 LNAI: 350-359.
Russ, G. & A. Brenning. 2010b. Spatial variable importance assessment for yield prediction in Precision Agriculture. In Advances in Intelligent Data Analysis IX, Proceedings, 9th International Symposium, IDA 2010, Tucson, AZ, USA, 19-21 May 2010. Lecture Notes in Computer Science, 6065 LNCS: 184-195.
Examples
## ------------------------------------------------------------
## Classification tree example using non-spatial partitioning
## ------------------------------------------------------------
# Muenchow et al. (2012), see ?ecuador
library(sperrorest)
library(rpart)
fo <- slides ~ dem + slope + hcurv + vcurv + log.carea + cslope
ctrl <- rpart.control(cp = 0.005) # show the effects of overfitting
fit <- rpart(fo, data = ecuador, control = ctrl)
### Non-spatial cross-validation:
mypred_part <- function(object, newdata) predict(object, newdata)[, 2] # probability of the second response level
nsp_res <- sperrorest(
data = ecuador, formula = fo,
model_fun = rpart,
model_args = list(control = ctrl),
pred_fun = mypred_part,
progress = "all",
smp_fun = partition_cv,
smp_args = list(repetition = 1:2, nfold = 3)
)
summary(nsp_res$error_rep)
summary(nsp_res$error_fold)
summary(nsp_res$represampling)
# plot(nsp_res$represampling, ecuador)
### Spatial cross-validation:
sp_res <- sperrorest(
data = ecuador, formula = fo,
model_fun = rpart,
model_args = list(control = ctrl),
pred_fun = mypred_part,
progress = "all",
smp_fun = partition_kmeans,
smp_args = list(repetition = 1:2, nfold = 3)
)
summary(sp_res$error_rep)
summary(sp_res$error_fold)
summary(sp_res$represampling)
# plot(sp_res$represampling, ecuador)
smry <- data.frame(
nonspat_training = unlist(summary(nsp_res$error_rep,
level = 1
)$train_auroc),
nonspat_test = unlist(summary(nsp_res$error_rep,
level = 1
)$test_auroc),
spatial_training = unlist(summary(sp_res$error_rep,
level = 1
)$train_auroc),
spatial_test = unlist(summary(sp_res$error_rep,
level = 1
)$test_auroc)
)
boxplot(smry,
col = c("red", "red", "red", "green"),
main = "Training vs. test, nonspatial vs. spatial",
ylab = "Area under the ROC curve"
)
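The examples above estimate predictive performance only. Since variable importance is a central feature of sperrorest, an importance assessment could be added along these lines (a sketch reusing fo, ctrl and mypred_part from above; the choice of imp_variables is illustrative, and imp_permutations is kept small only to keep the example fast):

```r
### Spatial cross-validation with permutation-based variable importance:
imp_res <- sperrorest(
  data = ecuador, formula = fo,
  model_fun = rpart,
  model_args = list(control = ctrl),
  pred_fun = mypred_part,
  smp_fun = partition_kmeans,
  smp_args = list(repetition = 1:2, nfold = 3),
  importance = TRUE,
  imp_variables = c("dem", "slope"),
  imp_permutations = 10 # use a larger value (e.g. the default 1000) in practice
)
summary(imp_res$importance)
```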