sk_sample_vg {snapKrig}

R Documentation

Sample point pair absolute differences for use in semi-variogram estimation

Description

Compute the absolute differences for point pairs in g, along with their separation distances. If no sample point index is supplied (in idx), the function samples points at random using sk_sample_pt.

Usage

sk_sample_vg(
  g,
  n_pp = 10000,
  idx = NULL,
  n_bin = 25,
  n_layer_max = NA,
  quiet = FALSE
)

Arguments

`g`	any grid object accepted or returned by `sk`
`n_pp`	integer maximum number of point pairs to sample
`idx`	optional integer vector indexing the points to sample
`n_bin`	integer number of distance bins to assign (passed to `sk_add_bins`)
`n_layer_max`	integer, maximum number of layers to sample (for multi-layer `g`)
`quiet`	logical, suppresses console output

Details

In a set of n points there are n_pp(n) = (n^2-n)/2 possible point pairs. This expression is inverted to determine the maximum number of sample points in g that can be used without exceeding the argument n_pp, the maximum number of point pairs to sample. A random sub-sample of idx is taken as needed. By default n_pp=1e4, which results in n=141 (9870 point pairs).
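The inversion described above can be sketched in base R as follows. This is an illustration of the stated formula only, not the package's internal code; the helper names `n_pairs` and `n_max_points` are hypothetical.

```r
# number of distinct point pairs among n points
n_pairs = function(n) ( n^2 - n ) / 2

# largest n with n_pairs(n) <= n_pp, taken from the positive root of
# n^2 - n - 2*n_pp = 0 (hypothetical helper, not part of snapKrig)
n_max_points = function(n_pp) floor( ( 1 + sqrt(1 + 8 * n_pp) ) / 2 )

n = n_max_points(1e4)
n              # 141
n_pairs(n)     # 9870, within the limit of 1e4
n_pairs(n + 1) # 10011, exceeds the limit
```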

The classical estimator of the variogram for a given distance interval is computed from the point pair absolute differences ('dabs') in that interval (in Matheron's definition, half the mean of their squares). This estimator and two robust alternatives are implemented in sk_plot_semi. These statistics are sensitive to the choice of distance bins. Bins are assigned automatically by a call to sk_add_bins (with n_bin), but users can also set up bins manually by adjusting the 'bin' column of the output.
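As a rough illustration of how a binned statistic can be computed from the 'dabs' and 'bin' columns, the sketch below applies Matheron's classical estimator (half the mean squared difference per bin) to a toy data frame in base R. The data frame `vg_toy` and its values are made up for the example, and this is not the code used by sk_plot_semi.

```r
# toy stand-in for sk_sample_vg output, with only 'dabs', 'd' and 'bin'
vg_toy = data.frame(
  dabs = c(1, 3, 2, 4, 2, 6),
  d    = c(1.0, 1.2, 2.1, 2.4, 3.3, 3.5),
  bin  = c(1, 1, 2, 2, 3, 3)
)

# Matheron's classical estimator: half the mean squared difference in each bin
semi_classical = 0.5 * tapply(vg_toy$dabs^2, vg_toy$bin, mean)

# mean separation distance in each bin, to plot against the estimate
d_mean = tapply(vg_toy$d, vg_toy$bin, mean)

# one row per distance bin
data.frame(
  bin = names(semi_classical),
  d = as.numeric(d_mean),
  semivariance = as.numeric(semi_classical)
)
```

Changing the bin assignments changes which pairs are averaged together, which is why the resulting estimates are sensitive to the choice of bins.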

For multi-layer g, the function samples observed point locations once and re-uses this selection in all layers. At most n_layer_max layers are sampled in this way (by default, the square root of the number of layers, rounded up).

Value

A data frame with a row for each sampled point pair. Fields include 'dabs' and 'd', the absolute difference in point values and the separation distance, along with the vector index, row and column numbers, and component (x, y) distances for each point pair. 'bin' indicates membership in one of n_bin categories.
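Because 'bin' is an ordinary column of the output, it can be overwritten with a custom assignment. The sketch below does this with base R `cut` on a toy data frame. The break points are arbitrary, `vg_toy` is made up for the example, and it is an assumption here that integer bin indices are what downstream functions expect.

```r
# toy stand-in for the output of sk_sample_vg (the real output has more columns)
vg_toy = data.frame(
  dabs = c(0.2, 0.5, 0.4, 0.9, 1.1, 0.8),
  d    = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)
)

# hypothetical break points: three equal-width distance intervals on (0, 6]
breaks = c(0, 2, 4, 6)

# assign each point pair to an interval, stored as an integer bin index
vg_toy$bin = cut(vg_toy$d, breaks=breaks, labels=FALSE)

# number of point pairs per bin
table(vg_toy$bin)
```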

Examples


# make example grid and reference covariance model
gdim = c(22, 15)
n = prod(gdim)
g_empty = sk(gdim)
pars = sk_pars(g_empty, 'mat')

# generate sample data and sample semi-variogram
g_obs = sk_sim(g_empty, pars)
vg = sk_sample_vg(g_obs)
str(vg)

# pass to plotter and overlay the model that generated the data
sk_plot_semi(vg, pars)

# repeat with fewer point pairs (n_pp is the second argument)
sk_plot_semi(sk_sample_vg(g_obs, n_pp=1e2), pars)
sk_plot_semi(sk_sample_vg(g_obs, n_pp=1e3), pars)

# use a set of specific points
n_sp = 10
( n_sp^2 - n_sp ) / 2 # the number of point pairs
vg = sk_sample_vg(g_obs, idx=sample.int(n, n_sp))
sk_plot_semi(vg, pars)

# non-essential examples skipped to stay below 5s exec time on slower machines


# repeat with all point pairs sampled (not recommended for big data sets)
vg = sk_sample_vg(g_obs, n_pp=Inf)
sk_plot_semi(vg, pars)
( n^2 - n ) / 2 # the number of point pairs

## example with multiple layers

# generate five layers
g_obs_multi = sk_sim(g_empty, pars, n_layer=5)

# by default, a sub-sample of ceiling(sqrt(n_layer)) layers is selected (here 3 of 5)
vg = sk_sample_vg(g_obs_multi)
sk_plot_semi(vg, pars)

# change this behaviour with n_layer_max
vg = sk_sample_vg(g_obs_multi, n_layer_max=5)
sk_plot_semi(vg, pars)

[Package snapKrig version 0.0.2 Index]