aSECF {ZVCV} | R Documentation |
Approximate semi-exact control functionals (aSECF)
Description
This function performs approximate semi-exact control functionals as described in South et al (2020). It uses a nystrom approximation and conjugate gradient to speed up SECF.
This is faster than SECF
for large N
. If you would like to choose
between different kernels using cross-validation, then you can use aSECF_crossval
.
Usage
aSECF(
integrands,
samples,
derivatives,
polyorder = NULL,
steinOrder = NULL,
kernel_function = NULL,
sigma = NULL,
K0 = NULL,
nystrom_inds = NULL,
est_inds = NULL,
apriori = NULL,
conjugate_gradient = TRUE,
reltol = 0.01,
diagnostics = FALSE
)
Arguments
integrands |
An |
samples |
An |
derivatives |
An |
polyorder |
(optional) The order of the polynomial to be used in the parametric component, with a default of |
steinOrder |
(optional) This is the order of the Stein operator. The default is |
kernel_function |
(optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details. |
sigma |
(optional) The tuning parameters of the specified kernel. This involves a single length-scale parameter in "gaussian" and "RQ", a length-scale and a smoothness parameter in "matern" and two parameters in "product" and "prodsim". See below for further details. |
K0 |
(optional) The kernel matrix. One can specify either this or all of |
nystrom_inds |
(optional) The sample indices to be used in the Nystrom approximation. |
est_inds |
(optional) A vector of indices for the estimation-only samples. The default when |
apriori |
(optional) A vector containing the subset of parameter indices to use in the polynomial. Typically this argument would only be used if the dimension of the problem is very large or if prior information about parameter dependencies is known. The default is to use all parameters |
conjugate_gradient |
(optional) A flag for whether to perform conjugate gradient to further speed up the nystrom approximation (the default is true). |
reltol |
(optional) The relative tolerance for choosing when the stop conjugate gradient iterations (the default is 1e-02).
using |
diagnostics |
(optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is |
Value
A list with the following elements:
-
expectation
: The estimate(s) of the (k
) expectations(s). -
cond_no
: (Only ifconjugate_gradient
=TRUE
) The condition number of the matrix being solved using conjugate gradient. -
iter
: (Only ifconjugate_gradient
=TRUE
) The number of conjugate gradient iterations -
f_true
: (Only ifest_inds
is notNULL
) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),]. -
f_hat
: (Only ifest_inds
is notNULL
) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model. -
a
: (Only ifdiagnostics
=TRUE
) The value ofa
as described in South et al (2020), where predictions are of the formf_hat = K0*a + Phi*b
for heldout K0 and Phi matrices and estimators using heldout samples are of the formmean(f - f_hat) + b[1]
. -
b
: (Only ifdiagnostics
=TRUE
) The value ofb
as described in South et al (2020), where predictions are of the formf_hat = K0*a + Phi*b
for heldout K0 and Phi matrices and estimators using heldout samples are of the formmean(f - f_hat) + b[1]
. -
ny_inds
: (Only ifdiagnostics
=TRUE
) The indices of the samples used in the nystrom approximation (this will match nystrom_inds if this argument was notNULL
).
On the choice of \sigma
, the kernel and the Stein order
The kernel in Stein-based kernel methods is L_x L_y k(x,y)
where L_x
is a first or second order Stein operator in x
and k(x,y)
is some generic kernel to be specified.
The Stein operators for distribution p(x)
are defined as:
-
steinOrder=1
:L_x g(x) = \nabla_x^T g(x) + \nabla_x \log p(x)^T g(x)
(see e.g. Oates el al (2017)) -
steinOrder=2
:L_x g(x) = \Delta_x g(x) + \nabla_x log p(x)^T \nabla_x g(x)
(see e.g. South el al (2020))
Here \nabla_x
is the first order derivative wrt x
and \Delta_x = \nabla_x^T \nabla_x
is the Laplacian operator.
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma
defines the kernel parameters \sigma
.
-
"gaussian"
: A Gaussian kernel,k(x,y) = exp(-z(x,y)/\sigma^2)
-
"matern"
: A Matern kernel with\sigma = (\lambda,\nu)
,k(x,y) = bc^{\nu}z(x,y)^{\nu/2}K_{\nu}(c z(x,y)^{0.5})
where
b=2^{1-\nu}(\Gamma(\nu))^{-1}
,c=(2\nu)^{0.5}\lambda^{-1}
andK_{\nu}(x)
is the modified Bessel function of the second kind. Note that\lambda
is the length-scale parameter and\nu
is the smoothness parameter (which defaults to 2.5 forsteinOrder=1
and 4.5 forsteinOrder=2
). -
"RQ"
: A rational quadratic kernel,k(x,y) = (1+\sigma^{-2}z(x,y))^{-1}
-
"product"
: The product kernel that appears in Oates et al (2017) with\sigma = (a,b)
k(x,y) = (1+a z(x) + a z(y))^{-1} exp(-0.5 b^{-2} z(x,y))
-
"prodsim"
: A slightly different product kernel with\sigma = (a,b)
(see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),k(x,y) = (1+a z(x))^{-1}(1 + a z(y))^{-1} exp(-0.5 b^{-2} z(x,y))
In the above equations, z(x) = \sum_j x[j]^2
and z(x,y) = \sum_j (x[j] - y[j])^2
. For the last two kernels, the code only has implementations for steinOrder
=1
. Each combination of steinOrder
and kernel_function
above is currently hard-coded but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Author(s)
Leah F. South
References
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See Also
See ZVCV for examples and related functions. See aSECF_crossval
for a function to choose between different kernels for this estimator.