glskrigeidwcv {spm2} | R Documentation |
Cross validation, n-fold and leave-one-out, for the hybrid methods of generalised least squares ('gls'), 'kriging' and inverse distance weighted ('IDW')
Description
This function performs cross validation for 38 hybrid methods of 'gls', 'kriging' and 'IDW', including the average of 'glskrige' and 'glsidw' ('glskrigeglsidw') and the average of 'gls', 'glskrige' and 'glsidw' ('glsglskrigeglsidw'). Here 'kriging' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK' ('BSK'), and 'IDW' also covers 'NN' and 'KNN'. The data splitting is based on a stratified random sampling method (see the 'datasplit' function for details).
Usage
glskrigeidwcv(
model = var1 ~ 1,
longlat,
trainxy,
y,
corr.args = NULL,
weights = NULL,
transformation = "none",
delta = 1,
formula.krige = res1 ~ 1,
vgm.args = c("Sph"),
anis = c(0, 1),
alpha = 0,
block = 0,
beta,
nmaxkrige = 12,
idp = 2,
nmaxidw = 12,
hybrid.parameter = 2,
lambda = 1,
validation = "CV",
cv.fold = 10,
predacc = "VEcv",
...
)
Arguments
model |
a formula defining the response variable and predictive variables. |
longlat |
a data frame containing the longitude and latitude of point samples. |
trainxy |
a data frame containing longitude (long), latitude (lat), predictive variables and the response variable of point samples. The location columns must be named 'long' and 'lat'. |
y |
a vector of the response variable in the formula, that is, the left part of the formula. |
corr.args |
arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used, in which case 'gls' is effectively performing 'lm'. |
weights |
describing the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details. |
transformation |
transformation applied to the residuals of 'gls' to normalise the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" for logarithm or "none" for no transformation. By default, "none" is used. |
delta |
numeric; to avoid log(0) in the log transformation. The default is 1. |
formula.krige |
formula defining the response vector and (possible) regressors: an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. See 'variogram' and 'krige' in 'gstat' for details. |
vgm.args |
arguments for 'vgm', e.g. variogram model of response variable and anisotropy parameters. see 'vgm' in 'gstat' for details. By default, "Sph" is used. |
anis |
anisotropy parameters: see the notes in 'vgm' in 'gstat' for details. |
alpha |
direction in plane (x,y). see variogram in 'gstat' for details. |
block |
block size. see 'krige' in 'gstat' for details. |
beta |
for simple kriging. see 'krige' in 'gstat' for details. |
nmaxkrige |
for local prediction with 'krige': the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
idp |
a number specifying the inverse distance weighting power. |
nmaxidw |
for local prediction with 'idw': the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used. |
hybrid.parameter |
the default is 2, which is for 'glskrigeglsidw'; for 'glsglskrigeglsidw', it needs to be 3. |
lambda |
a number ranging from 0 to 2; the default is 1 for 'glskrigeglsidw' and 'glsglskrigeglsidw'. If it is < 1, more weight is placed on 'krige'; if it is > 1, more weight is placed on 'idw'. If it is 0, 'idw' is not considered and the resultant method is 'glskrige' when the default 'hybrid.parameter' is used; if it is 2, the resultant method is 'glsidw' when the default 'hybrid.parameter' is used. |
validation |
validation method: either 'LOO' for leave-one-out or 'CV' for n-fold cross-validation. |
cv.fold |
integer; number of folds in the cross-validation. If > 1, n-fold cross validation is applied; the default is 10, i.e., 10-fold cross validation, which is recommended. |
predacc |
can be either "VEcv" for 'vecv' only or "ALL" for all measures in the function 'pred.acc'. |
... |
other arguments passed on to 'gls', 'krige' and 'gstat'. |
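The effect of 'lambda' described above can be illustrated with a small sketch. The helper 'combine_hybrid' below is hypothetical and not part of 'spm2'; it assumes the default 'hybrid.parameter' of 2 and a linear weighting chosen only because it reproduces the documented limiting cases ('glskrige' at 0, the plain average at 1, 'glsidw' at 2). The weighting actually used inside 'glskrigeidwcv' may differ in detail.

```r
# Hypothetical helper (not from 'spm2'): blend a 'gls' trend prediction with
# kriged and IDW-interpolated 'gls' residuals. The linear weighting is an
# assumption that matches the documented behaviour for hybrid.parameter = 2.
combine_hybrid <- function(gls_pred, krige_res, idw_res, lambda = 1) {
  stopifnot(lambda >= 0, lambda <= 2)
  glskrige <- gls_pred + krige_res  # 'glskrige': trend plus kriged residuals
  glsidw   <- gls_pred + idw_res    # 'glsidw': trend plus IDW residuals
  ((2 - lambda) * glskrige + lambda * glsidw) / 2
}

# Made-up numbers, for illustration only
gls_pred  <- c(1.0, 2.0)
krige_res <- c(0.4, -0.2)
idw_res   <- c(0.2, 0.0)

combine_hybrid(gls_pred, krige_res, idw_res, lambda = 1)  # average of the two
combine_hybrid(gls_pred, krige_res, idw_res, lambda = 0)  # same as 'glskrige'
combine_hybrid(gls_pred, krige_res, idw_res, lambda = 2)  # same as 'glsidw'
```

Values of 'lambda' between these cases shift the weight gradually from the kriged residuals to the IDW residuals.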
Value
A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1 (when 'predacc' is "ALL"); or vecv only (when 'predacc' is "VEcv").
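For readers unfamiliar with the 'vecv' measure, the following is a minimal sketch of how the variance explained by cross-validation is commonly defined, as a percentage. The function name 'vecv_sketch' is hypothetical and used here for illustration only; the function itself relies on 'pred.acc' for its accuracy measures.

```r
# Hypothetical illustration (not the package implementation):
# VEcv = (1 - SS_residual / SS_total) * 100, computed from observed values
# and their cross-validated predictions.
vecv_sketch <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  (1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)) * 100
}

obs <- c(1, 2, 3, 4, 5)
vecv_sketch(obs, obs)                          # perfect predictions give 100
vecv_sketch(obs, rep(mean(obs), length(obs)))  # predicting the mean gives 0
```

Predictions that are worse than simply using the mean of the observations give negative values.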
Note
This function is largely based on 'rfcv' in 'randomForest', 'krigecv' in 'spm2' and 'gls' in 'nlme'.
Author(s)
Jin Li
References
Li, J. (2022). Spatial Predictive Modeling with R. Boca Raton, Chapman and Hall/CRC.
Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.
Pebesma, E. J. (2004). Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.
Examples
library(spm)   # provides the 'petrel' data
library(spm2)  # provides glskrigeidwcv
library(nlme)  # provides corSpher
data(petrel)
# long, lat and predictors, with the response (gravel) moved to column 7
gravel <- petrel[, c(1, 2, 6:9, 5)]
longlat <- petrel[, c(1, 2)]
# initial range and nugget values for the spherical correlation structure
range1 <- 0.8
nugget1 <- 0.5
model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) +
  I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) +
  I(relief^3)
glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel,
  y = log(gravel[, 7] + 1), transformation = "none", formula.krige = res1 ~ 1,
  vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV",
  corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE),
  predacc = "ALL")
glskrigeidwcv1
# For glskrigeglsidw
set.seed(1234)
n <- 20 # number of iterations; 60 to 100 is recommended.
VEcv <- NULL
for (i in 1:n) {
  glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel,
    y = log(gravel[, 7] + 1), transformation = "none", formula.krige = res1 ~ 1,
    vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV",
    corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE),
    predacc = "VEcv")
  VEcv[i] <- glskrigeidwcv1
}
# VEcv per iteration (black), running mean (red) and overall mean (blue line)
plot(VEcv ~ c(1:n), xlab = "Iteration for GLSOKGLSIDW", ylab = "VEcv (%)")
points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(VEcv), col = 'blue', lwd = 2)
# For glsglskrigeglsidw
set.seed(1234)
n <- 20 # number of iterations; 60 to 100 is recommended.
VEcv <- NULL
for (i in 1:n) {
  glskrigeidwcv1 <- glskrigeidwcv(model = model, longlat = longlat, trainxy = gravel,
    y = log(gravel[, 7] + 1), transformation = "none", formula.krige = res1 ~ 1,
    vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, hybrid.parameter = 3,
    validation = "CV", corr.args = corSpher(c(range1, nugget1), form = ~ lat + long,
    nugget = TRUE), predacc = "VEcv")
  VEcv[i] <- glskrigeidwcv1
}
# VEcv per iteration (black), running mean (red) and overall mean (blue line)
plot(VEcv ~ c(1:n), xlab = "Iteration for GLSGLSOKGLSIDW", ylab = "VEcv (%)")
points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(VEcv), col = 'blue', lwd = 2)