cv.rfsi {meteo}    R Documentation
Nested k-fold cross-validation for Random Forest Spatial Interpolation (RFSI)
Description
Function for nested k-fold cross-validation of Random Forest Spatial Interpolation (RFSI) (Sekulić et al. 2020). It is based on the rfsi, pred.rfsi, and tune.rfsi functions: outer folds are used for cross-validation, while inner folds, created within each outer training set, are used for nested tuning of the RFSI model. Currently, only spatial (leave-location-out) cross-validation is implemented; temporal and spatio-temporal cross-validation will be implemented in the future.
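The nesting can be illustrated with a minimal base-R sketch. It assumes a simple random assignment of whole stations to folds. cv.rfsi itself builds its folds with the CreateSpacetimeFolds function from CAST, so the hypothetical helper llo_folds() below only mirrors the leave-location-out idea and is not the package's code.

```r
# Hypothetical helper (not part of meteo): assign whole locations to k folds
# at random, so all rows of one station land in the same fold (LLO).
llo_folds <- function(staid, k, seed = 42) {
  set.seed(seed)
  ids <- unique(staid)
  # Deal the shuffled station IDs into k groups of (nearly) equal size
  id_fold <- sample(rep(seq_len(k), length.out = length(ids)))
  id_fold[match(staid, ids)]
}

# Toy example: 20 stations with 3 rows each
staid <- rep(1:20, each = 3)
k <- 5
outer <- llo_folds(staid, k)
table(outer[!duplicated(staid)])  # 4 stations in each of the 5 outer folds

nested <- lapply(seq_len(k), function(f) {
  train <- which(outer != f)  # rows used for tuning and fitting
  test  <- which(outer == f)  # held-out stations, predicted once
  # Inner folds are built from the outer-training rows only
  inner <- llo_folds(staid[train], k, seed = 42 + f)
  list(train = train, test = test, inner = inner)
})
```

Each outer fold is predicted once by a model tuned only on its own inner folds, so the held-out stations never take part in the tuning step.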
Usage
cv.rfsi(formula,
        data,
        data.staid.x.y.z = NULL,
        use.idw = FALSE,
        s.crs = NA,
        p.crs = NA,
        tgrid,
        tgrid.n = 10,
        tune.type = "LLO",
        k = 5,
        seed = 42,
        out.folds,
        in.folds,
        acc.metric,
        output.format = "data.frame",
        cpus = detectCores() - 1,
        progress = 1,
        soil3d = FALSE,
        no.obs = 'increase',
        ...)
Arguments
formula |
formula; Formula for specifying target variable and covariates (without nearest observations and distances to them). If |
data |
sf-class, sftime-class, SpatVector-class or data.frame; Contains the target variable (observations) and covariates used for making an RFSI model. If data is a data.frame object, it should have the following columns: station ID (staid), longitude (x), latitude (y), 3rd component (z) of the observation (e.g. time or depth), observation value (obs) and covariates (cov1, cov2, ...). If covariates are missing, the RFSI model using only nearest observations and distances to them as covariates ( |
data.staid.x.y.z |
numeric or character vector; Positions or names of the station ID (staid), longitude (x), latitude (y) and 3rd component (z) columns in the data.frame object (e.g. c(1,2,3,4)). If |
use.idw |
boolean; IDW prediction as covariate - will IDW predictions from |
s.crs |
st_crs or crs; Source CRS of |
p.crs |
st_crs or crs; Projection CRS for |
tgrid |
data.frame; Possible tuning parameters for nested folds. The column names are the same as the tuning parameter names. Possible tuning parameters are: |
tgrid.n |
numeric; Number of randomly chosen tgrid combinations used for nested tuning. Default is 10. |
tune.type |
character; Type of nested cross-validation: leave-location-out ("LLO"), leave-time-out ("LTO", not yet implemented), and leave-location-time-out ("LLTO", not yet implemented). Default is "LLO". |
k |
numeric; Number of random outer and inner folds (i.e. for cross-validation and nested tuning) created with the CreateSpacetimeFolds function from the CAST package. Default is 5. |
seed |
numeric; Random seed used to generate the outer and inner folds with the CreateSpacetimeFolds function. Default is 42. |
out.folds |
numeric or character vector or value; Specifies the outer folds column (if a single value) or rows (if a vector) of |
in.folds |
numeric or character vector or value; Specifies the inner folds column (if a single value) or rows (if a vector) of |
acc.metric |
character; Accuracy metric used as the criterion for choosing the optimal RFSI model in nested tuning. Possible values for regression: "ME", "MAE", "NMAE", "RMSE" (default), "NRMSE", "R2", "CCC". Possible values for classification: "Accuracy", "Kappa" (default), "AccuracyLower", "AccuracyUpper", "AccuracyNull", "AccuracyPValue", "McnemarPValue". |
output.format |
character; Format of the output, data.frame (default), sf-class, sftime-class, or SpatVector-class. |
cpus |
numeric; Number of processing units. Default is detectCores()-1. |
progress |
numeric; Level of progress reporting: 0 shows no progress, 1 shows outer fold results, 2 additionally shows inner fold results, and 3 additionally shows a prediction progress bar. Default is 1. |
soil3d |
logical; Whether 3D soil modelling is performed and the near.obs.soil function is used for finding the n nearest observations and distances to them. In this case, the z position of the |
no.obs |
character; Possible values are |
... |
Further arguments passed to ranger. |
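As a rough illustration of how tgrid and tgrid.n interact, the sketch below builds the same tgrid as in the Examples section and draws tgrid.n of its rows at random. Uniform sampling without replacement is an assumption made for illustration; the exact sampling inside cv.rfsi may differ.

```r
# Same tuning grid as in the Examples section
tgrid <- expand.grid(min.node.size = 2:10, num.trees = 250,
                     mtry = 3:14, n.obs = 1:6,
                     sample.fraction = seq(1, 0.632, -0.05))
nrow(tgrid)  # 9 * 1 * 12 * 6 * 8 = 5184 combinations

# Assumption: tgrid.n rows are drawn uniformly at random, without replacement
tgrid.n <- 10
set.seed(42)
cand <- tgrid[sample(nrow(tgrid), tgrid.n), ]
cand
```

With the default tgrid.n = 10, fewer than 0.2% of these 5184 combinations are evaluated in each nested tuning, so tgrid.n trades tuning thoroughness for run time.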
Value
A data.frame, sf-class, sftime-class, or SpatVector-class object (depending on the output.format argument) with columns:
obs |
Observations. |
pred |
Predictions from cross-validation. |
folds |
Folds used for cross-validation. |
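Because the returned object holds the obs, pred and folds columns, overall and per-fold accuracy can be computed directly from it. The sketch below uses textbook definitions of MAE, RMSE, R2 (taken here as 1 - SSE/SST, which is an assumption) and Lin's CCC; meteo's own acc.metric.fun, used in the Examples, may differ in details. The helper acc_sketch() and the toy data frame are hypothetical and only mimic the output columns.

```r
# Hypothetical helper: textbook obs-vs-pred accuracy metrics
acc_sketch <- function(obs, pred) {
  ok <- complete.cases(obs, pred)  # drop pairs with missing values
  obs <- obs[ok]; pred <- pred[ok]
  err <- pred - obs
  # Population (1/n) moments, as used in Lin's concordance correlation coefficient
  m_o <- mean(obs); m_p <- mean(pred)
  v_o <- mean((obs - m_o)^2); v_p <- mean((pred - m_p)^2)
  cov_op <- mean((obs - m_o) * (pred - m_p))
  c(MAE  = mean(abs(err)),
    RMSE = sqrt(mean(err^2)),
    R2   = 1 - sum(err^2) / sum((obs - m_o)^2),  # assumption: R2 = 1 - SSE/SST
    CCC  = 2 * cov_op / (v_o + v_p + (m_o - m_p)^2))
}

# Toy stand-in for a cv.rfsi result (columns obs, pred, folds)
cv_res <- data.frame(obs   = c(1, 2, 3, 4, 2, 5, 6, 3),
                     pred  = c(1, 2, 3, 5, 3, 4, 6, 2),
                     folds = rep(1:2, each = 4))

overall  <- acc_sketch(cv_res$obs, cv_res$pred)
per_fold <- sapply(split(cv_res, cv_res$folds),
                   function(d) acc_sketch(d$obs, d$pred))
overall
per_fold  # one column per fold
```

Looking at per-fold values as well as the pooled ones shows how much the accuracy varies between groups of held-out stations.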
Author(s)
Aleksandar Sekulic asekulic@grf.bg.ac.rs
References
Sekulić, A., Kilibarda, M., Heuvelink, G. B. M., Nikolić, M. & Bajat, B. Random Forest Spatial Interpolation. Remote Sens. 12, 1687, https://doi.org/10.3390/rs12101687 (2020).
See Also
near.obs
rfsi
pred.rfsi
tune.rfsi
Examples
library(CAST)
library(doParallel)
library(ranger)
library(sp)
library(sf)
library(terra)
library(meteo)
# prepare data
demo(meuse, echo=FALSE)
meuse <- meuse[complete.cases(meuse@data),]
data = st_as_sf(meuse, coords = c("x", "y"), crs = 28992, agr = "constant")
fm.RFSI <- as.formula("zinc ~ dist + soil + ffreq")
# making tgrid
n.obs <- 1:6
min.node.size <- 2:10
sample.fraction <- seq(1, 0.632, -0.05) # 0.632 without / 1 with replacement
splitrule <- "variance"
ntree <- 250 # 500
mtry <- 3:(2+2*max(n.obs))
tgrid = expand.grid(min.node.size=min.node.size, num.trees=ntree,
mtry=mtry, n.obs=n.obs, sample.fraction=sample.fraction)
# Cross-validation of RFSI
rfsi_cv <- cv.rfsi(formula=fm.RFSI, # without nearest obs
data = data,
tgrid = tgrid, # combinations for tuning
tgrid.n = 2, # number of randomly selected combinations from tgrid for tuning
tune.type = "LLO", # Leave-Location-Out CV
k = 5, # number of folds
seed = 42,
acc.metric = "RMSE", # R2, CCC, MAE
output.format = "sf", # "data.frame", # "SpatVector",
cpus=2, # detectCores()-1,
progress=1,
importance = "impurity") # ranger parameter
summary(rfsi_cv)
rfsi_cv$dif <- rfsi_cv$obs - rfsi_cv$pred
plot(rfsi_cv["dif"])
plot(rfsi_cv["obs"]) # map of the observed values
acc.metric.fun(rfsi_cv$obs, rfsi_cv$pred, "R2")
acc.metric.fun(rfsi_cv$obs, rfsi_cv$pred, "RMSE")
acc.metric.fun(rfsi_cv$obs, rfsi_cv$pred, "MAE")
acc.metric.fun(rfsi_cv$obs, rfsi_cv$pred, "CCC")