cv.strk {meteo} | R Documentation |
k-fold cross-validation for spatio-temporal regression kriging
Description
k-fold cross-validation function for spatio-temporal regression kriging, based on pred.strk. Currently, only spatial (leave-location-out, "LLO") cross-validation is implemented; temporal ("LTO") and spatio-temporal ("LLTO") cross-validation will be implemented in the future.
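A minimal base-R sketch of the leave-location-out idea follows. It assumes long-format data with one row per station and time step and a staid column. Stations, not individual rows, are assigned to the k folds, so every observation of a held-out station is excluded from training. The helper llo_folds is hypothetical; cv.strk itself creates folds with the CreateSpacetimeFolds function (CAST), whose assignment may differ.

```r
# Hypothetical helper illustrating leave-location-out (LLO) fold assignment;
# not the CreateSpacetimeFolds() implementation used by cv.strk.
llo_folds <- function(staid, k = 5, seed = 42) {
  stations <- unique(staid)
  set.seed(seed)
  # give every station exactly one fold label, balanced across the k folds
  station_fold <- sample(rep_len(seq_len(k), length(stations)))
  names(station_fold) <- as.character(stations)
  # every observation inherits the fold of its station
  unname(station_fold[as.character(staid)])
}

# Toy data: 10 stations observed on 3 days
df <- data.frame(staid = rep(sprintf("S%02d", 1:10), times = 3),
                 day = rep(1:3, each = 10))
df$fold <- llo_folds(df$staid, k = 5)

# In fold f, all rows of the held-out stations form the test set,
# so no test station contributes rows to the training set.
for (f in 1:5) {
  test <- df$fold == f
  stopifnot(length(intersect(df$staid[test], df$staid[!test])) == 0)
}
```

Splitting by station rather than by row is what makes the estimate reflect prediction at unsampled locations; a row-wise random split would let other days of the same station leak into training.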
Usage
cv.strk(data,
        obs.col = 1,
        data.staid.x.y.z = NULL,
        crs = NA,
        zero.tol = 0,
        reg.coef,
        vgm.model,
        sp.nmax = 20,
        time.nmax = 2,
        type = "LLO",
        k = 5,
        seed = 42,
        folds,
        refit = TRUE,
        output.format = "STFDF",
        parallel.processing = FALSE,
        pp.type = "snowfall",
        cpus = detectCores() - 1,
        progress = TRUE,
        ...)
Arguments
data |
STFDF-class, STSDF-class, STIDF-class, sf-class, sftime-class, SpatVector-class or data.frame; Contains the target variable (observations) and the covariates in space and time used to perform STRK cross-validation. If a data.frame, it should have the following columns: station ID (staid), longitude (x), latitude (y), the third component of the observation, e.g. time or depth (z), observation value (obs), and covariates (cov1, cov2, ...). Covariate names should be the same as in the |
obs.col |
numeric or character; Name or position (number) of the observation column in the |
data.staid.x.y.z |
numeric or character vector; Positions or names of the station ID (staid), longitude (x), latitude (y) and third component, e.g. time or depth (z), columns in a data.frame object (e.g. c(1,2,3,4)). If |
crs |
st_crs or crs; Source CRS of |
zero.tol |
numeric; A distance value below (or equal to) which locations are considered as duplicates. Default is 0. See rm.dupl. Duplicates are removed to avoid singular covariance matrices in kriging. |
reg.coef |
numeric; Vector of named linear regression coefficients. Names of the coefficients (e.g. "Intercept", "temp_geo", "modis", "dem", "twi") will be used to match appropriate covariates from |
vgm.model |
StVariogramModel list; Spatio-temporal variogram of regression residuals (or of observations, in the case of spatio-temporal ordinary kriging). See vgmST. A spatio-temporal variogram model of residuals for meteorological variables (temperature, precipitation, etc.) can be taken from data(tvgms) or specified by the user as a vgmST object. |
sp.nmax |
numeric; The number of spatially nearest observations used for kriging predictions. If |
time.nmax |
numeric; The number of temporally nearest observations used for kriging predictions. Default is 2. |
type |
character; Type of cross-validation: leave-location-out ("LLO"), leave-time-out ("LTO") or leave-location-time-out ("LLTO"). Default is "LLO". "LTO" and "LLTO" are not implemented yet but are planned. |
k |
numeric; Number of random folds created with the CreateSpacetimeFolds function. Default is 5. |
seed |
numeric; Random seed used to generate the outer and inner folds with the CreateSpacetimeFolds function. Default is 42. |
folds |
numeric or character vector or value; Indicates the folds column (if a single value) or the folds of the rows (if a vector) of |
refit |
logical; Whether the linear regression trend and the spatio-temporal variogram should be refitted. The spatio-temporal variogram is fitted using |
output.format |
character; Format of the output, STFDF-class (default), STSDF-class, STIDF-class, data.frame, sf-class, sftime-class, or SpatVector-class. |
parallel.processing |
logical; Whether parallel processing is performed. Default is FALSE. |
pp.type |
character; Type (R package) of parallel processing, "snowfall" (default) or "doParallel". |
cpus |
numeric; Number of processing units. Default is detectCores()-1. |
progress |
logical; Whether a progress bar is shown. Default is TRUE. |
... |
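The data.frame layout described under data and the name matching described under reg.coef can be illustrated with the base-R sketch below. The column names (tempc, dem, twi), the coefficient values and the helper trend_by_name are hypothetical; the sketch only shows how a named coefficient vector combines with covariate columns into a linear trend, whose residuals are what vgm.model describes. It is not the internal code of cv.strk or pred.strk.

```r
# Toy data.frame in the documented column order:
# staid, x (longitude), y (latitude), z (here: time), obs, covariates
df <- data.frame(
  staid = rep(c("A", "B"), each = 2),
  x     = rep(c(20.1, 21.3), each = 2),
  y     = rep(c(44.8, 43.9), each = 2),
  time  = as.Date("2011-07-05") + c(0, 1, 0, 1),
  tempc = c(24.1, 25.0, 22.7, 23.5),
  dem   = c(120, 120, 410, 410),
  twi   = c(9.5, 9.5, 7.2, 7.2)
)

# Hypothetical named coefficients; non-intercept names must match covariate columns
reg.coef <- c(Intercept = 30, dem = -0.006, twi = -0.4)

# Hypothetical helper: trend = Intercept + sum(coefficient * covariate), matched by name
trend_by_name <- function(data, coef) {
  covs <- setdiff(names(coef), "Intercept")
  missing <- setdiff(covs, names(data))
  if (length(missing) > 0) {
    stop("Covariates not found: ", paste(missing, collapse = ", "))
  }
  X <- as.matrix(data[, covs, drop = FALSE])
  unname(coef["Intercept"] + drop(X %*% coef[covs]))
}

df$trend <- trend_by_name(df, reg.coef)
# regression residuals: the quantity a residual ST variogram is fitted to
df$resid <- df$tempc - df$trend
```

Matching by name rather than by position means the column order of the covariates in data does not matter, but a misspelled covariate name fails loudly instead of silently using the wrong column.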
Value
An STFDF-class (default), STSDF-class, STIDF-class, data.frame, sf-class, sftime-class, or SpatVector-class object (depending on the output.format argument) with the columns:
obs |
Observations. |
pred |
Predictions from cross-validation. |
folds |
Folds used for cross-validation. |
For accuracy metrics, see the acc.metric.fun function.
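The exact definitions used by acc.metric.fun are not given on this page. As an assumption-labelled sketch, the base-R function below computes RMSE, MAE, R2 as the coefficient of determination (1 - SSE/SST) and CCC as Lin's concordance correlation coefficient, dropping pairs with missing values, and then summarises a toy result with the documented obs, pred and folds columns per fold. The helper cv_metrics is hypothetical, and acc.metric.fun may, for instance, define R2 differently.

```r
# Hypothetical helper; not the acc.metric.fun implementation.
cv_metrics <- function(obs, pred) {
  ok <- complete.cases(obs, pred)   # drop pairs with a missing value
  o <- obs[ok]
  p <- pred[ok]
  err <- p - o
  mo <- mean(o)
  mp <- mean(p)
  # population (1/n) moments, as in Lin's CCC
  s_oo <- mean((o - mo)^2)
  s_pp <- mean((p - mp)^2)
  s_op <- mean((o - mo) * (p - mp))
  c(RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    R2   = 1 - sum(err^2) / sum((o - mo)^2),
    CCC  = 2 * s_op / (s_oo + s_pp + (mo - mp)^2))
}

# Toy cross-validation output with the documented columns
res <- data.frame(obs   = c(20, 22, 25, 19, NA, 23, 21),
                  pred  = c(21, 21, 24, 20, 22, 23, 22),
                  folds = c(1, 1, 2, 2, 3, 3, 3))

overall <- cv_metrics(res$obs, res$pred)
# one column per fold, e.g. to spot folds with unusually large errors
per_fold <- sapply(split(res, res$folds),
                   function(d) cv_metrics(d$obs, d$pred))
```

Unlike R2, CCC also penalises systematic bias between predictions and observations, which is why both are commonly reported for interpolation results.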
Author(s)
Aleksandar Sekulic asekulic@grf.bg.ac.rs, Milan Kilibarda kili@grf.bg.ac.rs
References
Kilibarda, M., T. Hengl, G. B. M. Heuvelink, B. Graeler, E. Pebesma, M. Percec Tadic, and B. Bajat (2014), Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution, J. Geophys. Res. Atmos., 119, 2294-2313, doi:10.1002/2013JD020803.
See Also
acc.metric.fun
pred.strk
tregcoef
tvgms
regdata
meteo2STFDF
tgeom2STFDF
Examples
library(meteo)
library(sp)
library(spacetime)
library(gstat)
library(plyr)
library(CAST)
library(doParallel)
library(ranger)
library(zoo) # index() used below comes from zoo
# preparing data
data(dtempc)
data(stations)
data(regdata) # covariates, made by the meteo2STFDF function
regdata@sp@proj4string <- CRS('+proj=longlat +datum=WGS84')
data(tvgms) # ST variogram models
data(tregcoef) # MLR coefficients
lonmin <- 18; lonmax <- 22.5; latmin <- 40; latmax <- 46
serbia = point.in.polygon(stations$lon, stations$lat, c(lonmin,lonmax,lonmax,lonmin),
c(latmin,latmin,latmax,latmax))
st = stations[ serbia!=0, ] # stations in Serbia approx.
crs = CRS('+proj=longlat +datum=WGS84')
# create STFDF
stfdf <- meteo2STFDF(obs = dtempc,
stations = st,
crs = crs)
# Cross-validation for mean temperature for days "2011-07-05" and "2011-07-06"
# global model is used for regression and variogram
# Overlay observations with covariates
time <- index(stfdf@time)
covariates.df <- as.data.frame(regdata)
names_covar <- names(tregcoef[[1]])[-1]
for (covar in names_covar) {
  nrowsp <- length(stfdf@sp)
  regdata@sp <- as(regdata@sp, 'SpatialPixelsDataFrame')
  ov <- sapply(time, function(i)
    if (covar %in% names(regdata@data)) {
      # time-varying covariate: overlay only for dates present in regdata
      if (as.Date(i) %in% as.Date(index(regdata@time))) {
        over(stfdf@sp, as(regdata[, i, covar], 'SpatialPixelsDataFrame'))[, covar]
      } else {
        rep(NA, length(stfdf@sp))
      }
    } else {
      # static covariate: stored in the spatial slot of regdata
      over(stfdf@sp, as(regdata@sp[covar], 'SpatialPixelsDataFrame'))[, covar]
    }
  )
  ov <- as.vector(ov)
  if (all(is.na(ov))) {
    stop('There is no overlay of data with covariates!')
  }
  stfdf@data[covar] <- ov
}
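# Remove stations that fall outside the covariates
for (covar in names_covar) {
  nrowsp <- length(stfdf@sp) # recompute: stations may have been removed already
  # count NAs per station
  numNA <- apply(matrix(stfdf@data[, covar],
                        nrow = nrowsp, byrow = FALSE), MARGIN = 1,
                 FUN = function(x) sum(is.na(x)))
  # keep stations with at least one non-missing covariate value
  rem <- numNA != length(time)
  stfdf <- stfdf[rem, drop = FALSE]
}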
# Remove dates without covariate values
rm.days <- c()
for (t in seq_along(time)) {
  if (sum(complete.cases(stfdf[, t]@data)) == 0) {
    rm.days <- c(rm.days, t)
  }
}
if (!is.null(rm.days)) {
  stfdf <- stfdf[, -rm.days]
}
### Example with an STFDF object, without parallel processing and without refitting the variogram
results <- cv.strk(data = stfdf,
                   obs.col = 1, # "tempc"
                   data.staid.x.y.z = c(1, NA, NA, NA),
                   reg.coef = tregcoef[[1]],
                   vgm.model = tvgms[[1]],
                   sp.nmax = 20,
                   time.nmax = 2,
                   type = "LLO",
                   k = 5,
                   seed = 42,
                   refit = FALSE,
                   progress = TRUE)
stplot(results[,,"pred"])
summary(results)
# accuracy
acc.metric.fun(results@data$obs, results@data$pred, "R2")
acc.metric.fun(results@data$obs, results@data$pred, "RMSE")
acc.metric.fun(results@data$obs, results@data$pred, "MAE")
acc.metric.fun(results@data$obs, results@data$pred, "CCC")