
glsidwcv {spm2}

R Documentation

Cross validation, n-fold and leave-one-out for the hybrid method of generalized least squares ('gls') and inverse distance weighted ('idw') (glsidw)

Description

This function performs cross-validation for the hybrid method of 'gls' and 'idw'. The data are split using a stratified random sampling method (see the 'datasplit' function for details).
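As an illustration of stratified random splitting, the following is a minimal base-R sketch. It is not the 'datasplit' implementation: the helper name `stratified_folds`, the choice of five quantile-based strata, and the simulated response are all assumptions made for this example.

```r
# Hypothetical helper (not part of spm2): assign observations to k folds,
# stratifying on quantile classes of the response.
stratified_folds <- function(y, k = 10, n_strata = 5) {
  # Quantile-based break points; unique() guards against tied quantiles.
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = n_strata + 1)))
  strata <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  folds <- integer(length(y))
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # Recycle fold labels 1..k over the stratum, then shuffle them.
    labels <- rep(seq_len(k), length.out = length(idx))
    folds[idx] <- labels[sample.int(length(idx))]
  }
  folds
}

set.seed(1)
y <- rexp(95)                      # simulated response, for illustration only
folds <- stratified_folds(y, k = 10)
table(folds)                       # number of observations per fold
```

Each fold then serves once as the validation set while the remaining folds are used for model fitting.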

Usage

glsidwcv(
  model = var1 ~ 1,
  longlat,
  trainxy,
  y,
  corr.args = NULL,
  weights = NULL,
  idp = 2,
  nmaxidw = 12,
  validation = "CV",
  cv.fold = 10,
  predacc = "VEcv",
  ...
)

Arguments

`model`	a formula defining the response variable and predictive variables.
`longlat`	a data frame containing the longitude and latitude of point samples.
`trainxy`	a data frame containing the longitude (long), latitude (lat), predictive variables and the response variable of point samples. The location columns must be named 'long' and 'lat'.
`y`	a vector of the response variable in the formula, that is, the left part of the formula.
`corr.args`	arguments for 'correlation' in 'gls'. See '?corClasses' in 'nlme' for details. By default, "NULL" is used, in which case 'gls' effectively fits the same model as 'lm'.
`weights`	a description of the within-group heteroscedasticity structure. Defaults to "NULL", corresponding to homoscedastic errors. See '?gls' in 'nlme' for details.
`idp`	a numeric value specifying the inverse distance weighting power.
`nmaxidw`	for local prediction: the number of nearest observations that should be used for a prediction or simulation, where nearest is defined in terms of the space of the spatial locations. By default, 12 observations are used.
`validation`	the validation method: 'LOO' for leave-one-out or 'CV' for cross-validation.
`cv.fold`	integer; number of folds in the cross-validation. If > 1, n-fold cross-validation is applied. The default is 10, i.e., the recommended 10-fold cross-validation.
`predacc`	either "VEcv" for vecv only, or "ALL" for all measures in the function 'pred.acc'.
`...`	other arguments passed on to 'gls' and 'gstat'.
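The roles of 'idp' and 'nmaxidw' can be illustrated with a minimal base-R sketch of the hybrid idea. It assumes the hybrid prediction is the trend-model prediction plus the IDW-interpolated residuals of that model; 'lm' stands in for 'gls' with corr.args = NULL, and the helper `idw_predict` and the simulated data are hypothetical. It is not the package's implementation, which relies on 'gls' and 'gstat'.

```r
# Hypothetical helper: inverse distance weighted interpolation of values z
# observed at coordinates xy to new coordinates xy_new, using only the
# nmax nearest observations and weights 1 / distance^idp.
idw_predict <- function(xy, z, xy_new, idp = 2, nmax = 12) {
  apply(xy_new, 1, function(p) {
    d <- sqrt((xy[, 1] - p[1])^2 + (xy[, 2] - p[2])^2)
    if (any(d == 0)) return(mean(z[d == 0]))  # exact hit: use observed value
    nn <- order(d)[seq_len(min(nmax, length(d)))]
    w <- 1 / d[nn]^idp
    sum(w * z[nn]) / sum(w)
  })
}

# Simulated data, for illustration only.
set.seed(42)
n <- 60
dat <- data.frame(long = runif(n), lat = runif(n))
dat$x1 <- rnorm(n)
dat$y <- 1 + 2 * dat$x1 + sin(6 * dat$long) + rnorm(n, sd = 0.1)
train <- dat[1:50, ]
test <- dat[51:60, ]

# Step 1: fit the trend model (lm stands in for gls without a correlation structure).
fit <- lm(y ~ x1, data = train)
# Step 2: interpolate the training residuals to the test locations.
train_xy <- as.matrix(train[, c("long", "lat")])
test_xy <- as.matrix(test[, c("long", "lat")])
res_idw <- idw_predict(train_xy, residuals(fit), test_xy, idp = 2, nmax = 12)
# Step 3: hybrid prediction = trend prediction + interpolated residual.
pred <- predict(fit, newdata = test) + res_idw
```

A larger 'idp' concentrates the weight on the closest observations, while a smaller 'nmaxidw' makes the residual correction more local.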

Value

A list with the following components: me, rme, mae, rmae, mse, rmse, rrmse, vecv and e1 (when predacc = "ALL"); or vecv only (when predacc = "VEcv").
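For orientation, a few of these measures can be sketched in base R as follows. This is a minimal sketch under the usual definitions, with VEcv taken as the variance explained by cross-validated predictions, in percent, and E1 expressed as a percentage; the helper `accuracy_sketch` is hypothetical and is not the 'pred.acc' code, whose exact conventions should be checked in its own help page.

```r
# Hypothetical helper: a subset of accuracy measures for observed values
# 'obs' and cross-validated predictions 'pred'.
accuracy_sketch <- function(obs, pred) {
  err <- obs - pred
  list(
    mae  = mean(abs(err)),                    # mean absolute error
    mse  = mean(err^2),                       # mean squared error
    rmse = sqrt(mean(err^2)),                 # root mean squared error
    # Variance explained by the predictions, in percent.
    vecv = (1 - sum(err^2) / sum((obs - mean(obs))^2)) * 100,
    # Legates and McCabe's E1, in percent (assumed scaling).
    e1   = (1 - sum(abs(err)) / sum(abs(obs - mean(obs)))) * 100
  )
}

obs <- c(1, 2, 3, 4, 5)
pred <- c(1.1, 1.9, 3.2, 3.8, 5.0)
str(accuracy_sketch(obs, pred))
```

Under this definition, perfect predictions give a vecv of 100, and predicting the observed mean everywhere gives 0.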

Note

This function is largely based on 'rfcv' in 'randomForest' and 'gls' in 'nlme'.

Author(s)

Jin Li

References

Pinheiro, J. C. and D. M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York, Springer.

Pebesma, E. J. (2004). Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30: 683-691.

Examples


library(spm2)  # provides glsidwcv
library(spm)   # provides the 'petrel' data set
library(nlme)  # provides corSpher

data(petrel)
gravel <- petrel[, c(1, 2, 6:9, 5)]
longlat <- petrel[, c(1, 2)]
range1 <- 0.8
nugget1 <- 0.5
model <- log(gravel + 1) ~ long + lat + bathy + dist + I(long^2) + I(lat^2) +
  I(lat^3) + I(bathy^2) + I(bathy^3) + I(dist^2) + I(dist^3) + I(relief^2) +
  I(relief^3)

glsidwcv1 <- glsidwcv(model = model, longlat = longlat, trainxy = gravel,
  y = log(gravel[, 7] + 1), idp = 2, nmaxidw = 12, validation = "CV",
  corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE),
  predacc = "ALL")
glsidwcv1

# For glsidw: repeat the cross-validation to check how stable VEcv is
set.seed(1234)
n <- 20 # number of iterations; 60 to 100 is recommended.
VEcv <- numeric(n)
for (i in 1:n) {
  glsidwcv1 <- glsidwcv(model = model, longlat = longlat, trainxy = gravel,
    y = log(gravel[, 7] + 1), idp = 2, nmaxidw = 12, validation = "CV",
    corr.args = corSpher(c(range1, nugget1), form = ~ lat + long, nugget = TRUE),
    predacc = "VEcv")
  VEcv[i] <- glsidwcv1
}
# VEcv per iteration, with the running mean in red and the overall mean in blue
plot(VEcv ~ c(1:n), xlab = "Iteration for GLSIDW", ylab = "VEcv (%)")
points(cumsum(VEcv) / c(1:n) ~ c(1:n), col = 2)
abline(h = mean(VEcv), col = 'blue', lwd = 2)

[Package spm2 version 1.1.3 Index]