sk_cmean {snapKrig}    R Documentation

Compute kriging predictor (or variance) for an sk grid

Description

Evaluates the kriging prediction equations to find the expected value (mean) of the spatial process for g at all grid points, including unobserved ones.

Usage

sk_cmean(
  g,
  pars,
  X = NA,
  what = "p",
  out = "s",
  fac_method = "chol",
  fac = NULL,
  quiet = FALSE
)

Arguments

g

an sk grid or list accepted by sk (with entries 'gdim', 'gres', 'gval')

pars

list of form returned by sk_pars (with entries 'y', 'x', 'eps', 'psill')

X

sk grid, numeric scalar, vector, matrix, or NA: the mean, or its linear predictors (covariates)

what

character, what to compute: one of 'p' (predictor), 'v' (variance), or 'm' (more)

out

character, the return object, either 's' (sk grid) or 'v' (vector)

fac_method

character, either 'chol' or 'eigen'

fac

(optional) pre-computed factorization of covariance matrix scaled by partial sill

quiet

logical, whether to suppress console output

Details

This function predicts a noiseless version of the random process from which grid g was sampled, conditional on the observed data and, possibly, a set of covariates. The prediction is optimal in the sense of minimizing mean squared prediction error under the covariance model specified by pars, among all predictors that are linear combinations of the observed data.
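In standard kriging notation (symbols not defined on this page; a sketch consistent with the "hard way" verification code in the Examples below), the simple kriging predictor and its variance at a point s_0 are:

```latex
% z       : vector of observed data
% V       : covariance matrix of z under pars
% v_0     : cross-covariance between z and the signal at prediction point s_0
% \mu     : known mean (0 after de-trending)
% \sigma_0^2 : the partial sill 'psill'
\hat{z}(s_0) = \mu + v_0^\top V^{-1} (z - \mu)
\qquad
\operatorname{Var}\!\left[ z(s_0) - \hat{z}(s_0) \right]
  = \sigma_0^2 - v_0^\top V^{-1} v_0
```

Ordinary and universal kriging replace the known mean \mu with an estimate, which adds a correction term to both the predictor and the variance.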

The estimation method is determined by X. Set it to 0 and supply a de-trended g for simple kriging. Set it to NA to estimate a spatially uniform mean (ordinary kriging). Or pass covariates in X, either as a multi-layer sk grid or a matrix, for universal kriging. See sk_GLS for details on specifying X in this case.
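The three modes can be sketched as follows (a minimal illustration, not a recommended analysis; the covariate layer g_cov is arbitrary and exists only to show the universal kriging call):

```r
library(snapKrig)

# simulate a complete grid, then censor most of it
g = sk_sim(10)
pars = sk_pars(g)
is_miss = seq(length(g)) %in% sample.int(length(g), 80)
g_obs = g
g_obs[is_miss] = NA

# the three estimation modes differ only in X
g_simple = sk_cmean(g_obs, pars, X = 0)   # simple kriging (mean known, here 0)
g_ok = sk_cmean(g_obs, pars, X = NA)      # ordinary kriging (constant unknown mean)
g_cov = sk_sim(g, pars)                   # an arbitrary covariate layer, for illustration
g_uk = sk_cmean(g_obs, pars, X = g_cov)   # universal kriging
```

All three calls return a complete grid, filling in the censored points.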

Set what='v' to return the point-wise kriging variance. This usually takes much longer to evaluate than the predictor, though the memory demands are similar. A progress bar is printed to the console in this case unless quiet=TRUE.

Technical notes

All calculations returned by sk_cmean are exact. Our implementation is based on the variance decomposition suggested in section 3.4 (pp. 153-155) of Cressie (1993), and uses a loop over eigenvectors (for observed locations) to compute the variance iteratively.

In technical terms, sk_cmean estimates the mean of the signal process behind the data. The nugget effect (eps) is therefore added to the diagonal of the covariance matrix for the observed points, but NOT to the corresponding entries of the cross-covariance matrix. This has the effect of smoothing (de-noising) predictions at observed locations, which means sk_cmean is not an exact interpolator (except in the limit eps -> 0). Rather, it corrects the observed data to be consistent with the surrounding signal.
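This smoothing can be seen directly by comparing predictions at observed locations against the data (a sketch; the nugget value 0.5 is chosen arbitrarily to make the effect visible):

```r
library(snapKrig)

# simulate a grid, then predict under a model with a substantial nugget
g = sk_sim(10)
pars = sk_pars(g)
pars[['eps']] = 0.5  # hypothetical nugget, for illustration only
is_miss = seq(length(g)) %in% sample.int(length(g), 50)
g_obs = g
g_obs[is_miss] = NA

# at observed points the predictor returns a de-noised value, not the datum
g_pred = sk_cmean(g_obs, pars, X = 0)
max(abs(g_pred[!is_miss] - g_obs[!is_miss])) # nonzero when eps > 0
```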

This is usually desirable: real spatial datasets are almost always noisy, and we are typically interested in the signal, not some distorted version of it. For more on this, see section 3 of Cressie (1993), and in particular the discussion of the nugget effect in section 3.2.1.

The covariance factorization fac can be pre-computed using sk_var with arguments scaled=TRUE (and, if computing variance, fac_method='eigen'). This will speed up subsequent calls where only the observed data values have changed (same covariance structure pars, and same NA structure in the data). The kriging variance does not change in this case and only needs to be computed once.
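A sketch of this reuse pattern (same toy setup as above; fac_method='eigen' is required here because the variance is also computed):

```r
library(snapKrig)

# example data with missing values
g = sk_sim(10)
pars = sk_pars(g)
g_obs = g
g_obs[seq(length(g)) %in% sample.int(length(g), 50)] = NA

# factorize the scaled covariance once
fac = sk_var(g_obs, pars, scaled = TRUE, fac_method = 'eigen')

# reuse it for repeated calls with the same pars and NA pattern
g_p = sk_cmean(g_obs, pars, X = 0, fac = fac)
g_v = sk_cmean(g_obs, pars, X = 0, what = 'v', fac = fac, quiet = TRUE)
```

Only the prediction call needs to be repeated when the observed values change; g_v stays the same.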

Reference: Cressie, N. (1993) "Statistics for Spatial Data", Wiley.

Value

an sk grid (when out='s') or numeric vector (when out='v'), containing the predicted values (or their variance)

See Also

sk sk_pars

Other estimators: sk_GLS()

Other variance-related functions: sk_GLS(), sk_LL(), sk_nLL(), sk_sim(), sk_var()

Examples


## set up very simple example problem

# make example grid and covariance parameters
g_all = sk_sim(10)
pars = sk_pars(g_all)
plot(g_all)

# remove most of the data
n = length(g_all)
p_miss = 0.90
is_miss = seq(n) %in% sample.int(n, round(p_miss*n))
is_obs = !is_miss
g_miss = g_all
g_miss[is_miss] = NA
plot(g_miss)


## simple kriging

# estimate the missing data conditional on what's left over
g_simple = sk_cmean(g_miss, pars, X=0)
plot(g_simple)

# variance of the estimator
g_simple_v = sk_cmean(g_miss, pars, X=0, what='v', quiet=TRUE)
plot(g_simple_v)

# get the same results with pre-computed variance
var_pc = sk_var(g_miss, pars, scaled=TRUE, fac_method='eigen')
g_simple_v_compare = sk_cmean(g_miss, pars, X=0, what='v', fac=var_pc, quiet=TRUE)
max(abs(g_simple_v_compare - g_simple_v))

## ordinary kriging

# estimate spatially uniform mean - true value is 0
sk_GLS(g_miss, pars, out='b')

# ordinary kriging automatically adjusts for the trend
g_ok = sk_cmean(g_miss, pars, X=NA)

# additional uncertainty in estimation means variance goes up a bit
g_ok_v = sk_cmean(g_miss, pars, X=NA, what='v', quiet=TRUE)
range(g_ok_v - g_simple_v)


## universal kriging

# introduce some covariates
n_betas = 3
betas = stats::rnorm(n_betas, 0, 10)
g_X = sk_sim(g_all, pars, n_layer=n_betas-1L)
g_lm_all = g_all + as.vector(cbind(1, g_X[]) %*% betas)
g_lm_miss = g_lm_all
g_lm_miss[is_miss] = NA

# prediction
g_uk = sk_cmean(g_lm_miss, pars, g_X)
g_uk_v = sk_cmean(g_lm_miss, pars, g_X, what='v', quiet=TRUE)


## repeat with special subgrid case (faster!)

# make g_all a subgrid of a larger example
g_super = sk_rescale(g_all, down=2)

# re-generate the covariates for the larger extent
g_X_super = sk_sim(g_super, pars, n_layer=n_betas-1L)
g_lm_super = g_super + as.vector(cbind(1, g_X_super[]) %*% betas)

# prediction
g_super_uk = sk_cmean(g_lm_super, pars, g_X_super)
g_super_uk_v = sk_cmean(g_lm_super, pars, g_X_super, what='v', quiet=TRUE)


## verification

# get observed variance and its inverse
V_full = sk_var(g_all, pars)
is_obs = !is.na(g_miss)
Vo = V_full[is_obs, is_obs]
Vo_inv = solve(Vo)

# get cross covariance
is_diag = as.logical(diag(nrow=length(g_all))[is_obs,])
Vc = V_full[is_obs,]

# nugget adjustment to diagonals (corrects the output at observed locations)
Vc[is_diag] = Vc[is_diag] - pars[['eps']]

# get covariates matrix and append intercept column
X = g_X[]
X_all = cbind(1, X)
X_obs = X_all[is_obs,]

# simple kriging the hard way
z = g_miss[is_obs]
z_trans = Vo_inv %*% z
z_pred_simple = c(t(Vc) %*% z_trans)
z_var_simple = pars[['psill']] - diag( t(Vc) %*% Vo_inv %*% Vc )

# ordinary kriging the hard way
x_trans = ( 1 - colSums( Vo_inv %*% Vc ) )
m_ok = x_trans / sum(Vo_inv)
z_pred_ok = c( (t(Vc) + m_ok) %*% z_trans )
z_var_ok = z_var_simple + diag( x_trans %*% t(m_ok) )

# universal kriging the hard way
z_lm = g_lm_miss[is_obs]
z_lm_trans = Vo_inv %*% z_lm
x_lm_trans = X_all - t( t(X_obs) %*% Vo_inv %*% Vc )
m_uk = x_lm_trans %*% solve(t(X_obs) %*% Vo_inv %*% X_obs)
z_pred_uk = c( (t(Vc) + t(X_obs %*% t(m_uk)) ) %*% z_lm_trans )
z_var_uk = z_var_simple + diag( x_lm_trans %*% t(m_uk) )

# check that these results agree with sk_cmean
max(abs(z_pred_simple - g_simple), na.rm=TRUE)
max(abs(z_var_simple - g_simple_v))
max(abs(z_pred_ok - g_ok), na.rm=TRUE)
max(abs(z_var_ok - g_ok_v))
max(abs(z_pred_uk - g_uk), na.rm=TRUE)
max(abs(z_var_uk - g_uk_v))


## repeat verification for sub-grid case

# rebuild matrices
V_full = sk_var(g_super_uk, pars)
is_obs = !is.na(g_super)
Vo = V_full[is_obs, is_obs]
Vo_inv = solve(Vo)
is_diag = as.logical(diag(nrow=length(g_super))[is_obs,])
Vc = V_full[is_obs,]
Vc[is_diag] = Vc[is_diag] - pars[['eps']]
X = g_X_super[]
X_all = cbind(1, X)
X_obs = X_all[is_obs,]
z = g_super[is_obs]

# universal kriging the hard way
z_var_simple = pars[['psill']] - diag( t(Vc) %*% Vo_inv %*% Vc )
z_lm = g_lm_super[is_obs]
z_lm_trans = Vo_inv %*% z_lm
x_lm_trans = X_all - t( t(X_obs) %*% Vo_inv %*% Vc )
m_uk = x_lm_trans %*% solve(t(X_obs) %*% Vo_inv %*% X_obs)
z_pred_uk = c( (t(Vc) + t(X_obs %*% t(m_uk)) ) %*% z_lm_trans )
z_var_uk = z_var_simple + diag( x_lm_trans %*% t(m_uk) )

# verification
max(abs(z_pred_uk - g_super_uk), na.rm=TRUE)
max(abs(z_var_uk - g_super_uk_v))


[Package snapKrig version 0.0.2 Index]