glmkrigeidwcv {spm2}    R Documentation

Cross validation, n-fold and leave-one-out, for the hybrid methods of generalised linear models ('glm'), 'kriging' and inverse distance weighted ('IDW').

Description

This function performs cross validation for 38 hybrid methods of 'glm', 'kriging' and 'IDW', including the average of 'glmkrige' and 'glmidw' ('glmkrigeglmidw') and the average of 'glm', 'glmkrige' and 'glmidw' ('glmglmkrigeglmidw'). The 'kriging' methods include ordinary kriging ('OK'), simple kriging ('SK'), block 'OK' ('BOK') and block 'SK' ('BSK'); 'IDW' also covers 'NN' and 'KNN' (for details, see reference #1). This function can also be used for 38 hybrid methods of 'lm', 'kriging' and 'IDW'.
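
The idea behind one of these hybrids, 'glmidw', can be sketched in base R: fit a 'glm', interpolate its residuals by inverse distance weighting, and add the interpolated residual to the 'glm' prediction. This is only an illustration under stated assumptions. The data are synthetic, and the helper 'idw_point' is hypothetical and is not part of 'spm2' or 'gstat'.

```r
set.seed(1)
# Synthetic, hypothetical data: 50 training points and one new location
train <- data.frame(long = runif(50), lat = runif(50))
train$y <- 2 + 3 * train$long + rnorm(50, sd = 0.2)
new <- data.frame(long = 0.5, lat = 0.5)

# Step 1: fit the 'glm' and keep its residuals on the response scale
fit <- glm(y ~ long, data = train, family = gaussian)
res <- residuals(fit, type = "response")

# Step 2: hypothetical IDW helper using the 'nmax' nearest points and power 'idp'
idw_point <- function(x, y, z, x0, y0, idp = 2, nmax = 12) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  if (any(d == 0)) return(mean(z[d == 0]))  # exact hit: return the observed value
  nn <- order(d)[seq_len(min(nmax, length(d)))]
  w <- 1 / d[nn]^idp
  sum(w * z[nn]) / sum(w)
}
res_new <- idw_point(train$long, train$lat, res, new$long, new$lat)

# Step 3: hybrid prediction = 'glm' prediction + interpolated residual
glmidw_pred <- unname(predict(fit, newdata = new, type = "response")) + res_new
glmidw_pred
```

Replacing the IDW step with kriging of the residuals would give the 'glmkrige' counterpart.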

Usage

glmkrigeidwcv(
  formula.glm = NULL,
  longlat,
  trainxy,
  y,
  family = "gaussian",
  transformation = "none",
  delta = 1,
  formula.krige = res1 ~ 1,
  vgm.args = c("Sph"),
  anis = c(0, 1),
  alpha = 0,
  block = 0,
  beta,
  nmaxkrige = 12,
  idp = 2,
  nmaxidw = 12,
  hybrid.parameter = 2,
  lambda = 1,
  validation = "CV",
  cv.fold = 10,
  predacc = "VEcv",
  ...
)

Arguments

formula.glm

a formula defining the response variable and predictive variables for 'glm'.

longlat

a dataframe containing the longitude and latitude of point samples.

trainxy

a dataframe containing the longitude (long), latitude (lat), predictive variables and the response variable of point samples.

y

a vector of the response variable in the formula, that is, the left part of the formula.

family

a description of the error distribution and link function to be used in the model. See '?glm' for details.

transformation

transformation applied to the residuals of 'glm' to normalise the data for 'krige'; can be "sqrt" for square root, "arcsine" for arcsine, "log" for logarithm, or "none" for no transformation. By default, "none" is used.

delta

numeric; to avoid log(0) in the log transformation. The default is 1.

formula.krige

formula defining the response vector and (possible) regressors; an object (i.e., 'variogram.formula') for 'variogram' or a formula for 'krige'. See 'variogram' and 'krige' in 'gstat' for details.

vgm.args

arguments for 'vgm', e.g. the variogram model of the response variable and anisotropy parameters. See 'vgm' in 'gstat' for details. By default, "Sph" is used.

anis

anisotropy parameters; see the notes for 'vgm' in 'gstat' for details.

alpha

direction in the plane (x, y). See 'variogram' in 'gstat' for details.

block

block size. See 'krige' in 'gstat' for details.

beta

for simple kriging. See 'krige' in 'gstat' for details.

nmaxkrige

for local prediction in 'krige': the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used.

idp

a numeric value specifying the inverse distance weighting power.

nmaxidw

for local prediction in 'idw': the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used.

hybrid.parameter

the default is 2, which is for 'glmkrigeglmidw'; for 'glmglmkrigeglmidw', it needs to be 3.

lambda

numeric, ranging from 0 to 2; the default is 1 for 'glmkrigeglmidw' and 'glmglmkrigeglmidw'. If it is < 1, more weight is placed on 'krige'; if it is > 1, more weight is placed on 'idw'. If it is 0, 'idw' is not considered and the resultant method is 'glmkrige' when the default 'hybrid.parameter' is used; if it is 2, the resultant method is 'glmidw' when the default 'hybrid.parameter' is used.
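
The behaviour described for 'lambda' and 'hybrid.parameter' is consistent with a linear weighting in which the kriged residuals receive weight (2 - lambda) and the IDW residuals receive weight lambda. The sketch below illustrates that weighting. The helper 'combine_hybrid' and its argument names are hypothetical, and the weights are an assumption inferred from the description above, not necessarily the internal code of 'glmkrigeidwcv'.

```r
# Hypothetical helper: combine a 'glm' prediction with kriged and IDW
# residual predictions. The weights (2 - lambda) and lambda are an
# assumption chosen to reproduce the behaviour described for 'lambda'.
combine_hybrid <- function(glm_pred, krige_res, idw_res,
                           lambda = 1, hybrid.parameter = 2) {
  stopifnot(lambda >= 0, lambda <= 2, hybrid.parameter %in% c(2, 3))
  glmkrige <- glm_pred + krige_res * (2 - lambda)
  glmidw <- glm_pred + idw_res * lambda
  if (hybrid.parameter == 2) {
    (glmkrige + glmidw) / 2             # 'glmkrigeglmidw'
  } else {
    (glm_pred + glmkrige + glmidw) / 3  # 'glmglmkrigeglmidw'
  }
}

combine_hybrid(10, krige_res = 2, idw_res = 4, lambda = 0)  # 12, i.e. 'glmkrige'
combine_hybrid(10, krige_res = 2, idw_res = 4, lambda = 2)  # 14, i.e. 'glmidw'
combine_hybrid(10, krige_res = 2, idw_res = 4, lambda = 1)  # 13, the plain average
```

Under this assumption, lambda = 1 gives the unweighted averages named in the Description, and the two end points recover the single-method hybrids.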

validation

validation method: 'LOO' for leave-one-out or 'CV' for n-fold cross-validation. The default is 'CV'.

cv.fold

integer; number of folds in the cross-validation. If > 1, n-fold cross validation is applied; the default is 10, i.e., the recommended 10-fold cross validation.
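
As a rough illustration of what 'cv.fold' controls, the sketch below assigns each sample to one of 'cv.fold' folds at random. The helper 'assign_folds' and the sample size of 237 are hypothetical. Plain random assignment is an assumption made for simplicity, and the fold assignment used inside the function may differ (for example, it may be stratified).

```r
# Hypothetical helper: randomly assign n samples to cv.fold folds of
# near-equal size (an assumption; not the package's internal code).
assign_folds <- function(n, cv.fold = 10) {
  sample(rep(seq_len(cv.fold), length.out = n))
}

set.seed(1234)
idx <- assign_folds(237, cv.fold = 10)
table(idx)  # fold sizes differ by at most one

# For fold i, samples with idx == i are held out for validation
# and the remaining samples are used for model fitting.
```

Setting 'cv.fold' equal to the number of samples in this sketch puts one sample in each fold, which corresponds to leave-one-out.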

predacc

either "VEcv" to return vecv only, or "ALL" to return all measures in function 'pred.acc'.
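
For orientation, VEcv (variance explained by predictive models based on cross-validation, in percent) is commonly defined as one minus the ratio of the sum of squared prediction errors to the total sum of squares of the observations, multiplied by 100. The sketch below uses that definition. The function name 'vecv_sketch' is hypothetical and is not the package's own function.

```r
# Hypothetical helper: VEcv (%) from observed and cross-validated
# predicted values, using the common definition given above.
vecv_sketch <- function(obs, pred) {
  (1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)) * 100
}

obs <- c(1, 2, 3)
vecv_sketch(obs, obs)                    # 100: perfect predictions
vecv_sketch(obs, rep(mean(obs), 3))      # 0: no better than the mean
```

Negative values are possible under this definition when the predictions are worse than using the mean of the observations.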

...

other arguments passed on to 'glm', 'krige' and 'gstat'.

Value

A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1 (when predacc = "ALL"); or vecv only (when predacc = "VEcv").

Note

This function is largely based on 'rfcv' in 'randomForest', 'krigecv' in 'spm2' and 'glm' in 'stats'.

Author(s)

Jin Li

References

Li, J. (2022). Spatial Predictive Modeling with R. Boca Raton, Chapman and Hall/CRC.

Li, J., Alvarez, B., Siwabessy, J., Tran, M., Huang, Z., Przeslawski, R., Radke, L., Howard, F. and Nichol, S. (2017). "Application of random forest, generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness." Environmental Modelling & Software 97: 112-129.

Liaw, A. and Wiener, M. (2002). "Classification and regression by randomForest." R News 2(3): 18-22.

Pebesma, E.J. (2004). "Multivariable geostatistics in S: the gstat package." Computers & Geosciences 30: 683-691.

Examples


library(spm)   # provides the 'petrel' data
library(spm2)  # provides glmkrigeidwcv()
# glmokglmidw
data(petrel)
gravel <- petrel[, c(1, 2, 6:9, 5)]
longlat <- petrel[, c(1, 2)]
model <- log(gravel + 1) ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3)
y <- log(gravel[, 7] + 1)
set.seed(1234)
glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat,
  trainxy = gravel, y = y, transformation = "none", formula.krige = res1 ~ 1,
  vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV",
  predacc = "ALL")
# Since the default 'family' ("gaussian") is used, this is effectively an 'lm' model.
glmkrigeglmidwcv1

# glmokglmidw
data(spongelonglat)
longlat <- spongelonglat[, 7:8]
model <- sponge ~ long + I(long^2)
y <- spongelonglat[, 1]
set.seed(1234)
glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat,
  trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine",
  formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2,
  nmaxidw = 12, validation = "CV", predacc = "ALL")
glmkrigeglmidwcv1

# glmglmokglmidw
data(spongelonglat)
longlat <- spongelonglat[, 7:8]
model <- sponge ~ long + I(long^2)
y <- spongelonglat[, 1]
set.seed(1234)
glmglmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat,
  trainxy = spongelonglat, y = y, family = poisson, transformation = "arcsine",
  formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2,
  nmaxidw = 12, hybrid.parameter = 3, validation = "CV", predacc = "ALL")
glmglmkrigeglmidwcv1

# glmokglmidw for count data
data(spongelonglat)
longlat <- spongelonglat[, 7:8]
model <- sponge ~ . # use all predictive variables in the dataset
y <- spongelonglat[, 1]
set.seed(1234)
n <- 20 # number of iterations; 60 to 100 is recommended.
VEcv <- numeric(n) # VEcv of each iteration
for (i in 1:n) {
  glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat,
    trainxy = spongelonglat, y = y, family = poisson, formula.krige = res1 ~ 1,
    vgm.args = "Sph", nmaxkrige = 12, idp = 2, nmaxidw = 12, validation = "CV",
    predacc = "VEcv")
  VEcv[i] <- glmkrigeglmidwcv1
}
# VEcv per iteration, its running mean (red) and its overall mean (blue)
plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)")
points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(VEcv), col = 'blue', lwd = 2)

# glmokglmidw for percentage data
# ('petrel' and 'gravel' are as created in the first example)
longlat <- petrel[, c(1, 2)]
model <- gravel / 100 ~ lat + bathy + I(long^3) + I(lat^2) + I(lat^3)
set.seed(1234)
n <- 20 # number of iterations; 60 to 100 is recommended.
VEcv <- numeric(n) # VEcv of each iteration
for (i in 1:n) {
  glmkrigeglmidwcv1 <- glmkrigeidwcv(formula.glm = model, longlat = longlat,
    trainxy = gravel, y = gravel[, 7] / 100, family = binomial(link = logit),
    formula.krige = res1 ~ 1, vgm.args = "Sph", nmaxkrige = 12, idp = 2,
    nmaxidw = 12, validation = "CV", predacc = "VEcv")
  VEcv[i] <- glmkrigeglmidwcv1
}
# VEcv per iteration, its running mean (red) and its overall mean (blue)
plot(VEcv ~ c(1:n), xlab = "Iteration for GLM", ylab = "VEcv (%)")
points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(VEcv), col = 'blue', lwd = 2)



[Package spm2 version 1.1.3 Index]