accuracy {gmGeostats}    R Documentation
Compute accuracy and precision
Description
Computes goodness-of-fit measures (accuracy, precision and joint goodness) adapted or extended from the definition of Deutsch (1997).
Usage
accuracy(object, ...)
## S3 method for class 'data.frame'
accuracy(
object,
observed = object$observed,
prob = seq(from = 0, to = 1, by = 0.05),
method = "kriging",
outMahalanobis = FALSE,
ivar,
...
)
## S3 method for class 'DataFrameStack'
accuracy(
object,
observed,
ivars = intersect(colnames(observed), dimnames(object)[[noStackDim(object)]]),
prob = seq(from = 0, to = 1, by = 0.05),
method = ifelse(length(ivars) == 1, "simulation", "Mahalanobis"),
outMahalanobis = FALSE,
...
)
Arguments
object: data container for the predictions (plus cokriging error variances/covariances) or simulations (and, optionally, for the true values in univariate problems)
...: generic functionality, currently ignored
observed: either a vector- or matrix-like object of the true values
prob: sequence of cutoff probabilities to use for the calculations
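method: which method was used for generating the predictions/simulations? One of c("kriging", "cokriging", "simulation") for the data.frame method, and one of c("simulation", "Mahalanobis", "flow") for the DataFrameStack method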
outMahalanobis: if TRUE, skip the final accuracy calculations and return the Mahalanobis norms of the residuals; if FALSE, do the accuracy calculations
ivar: if
ivars: in multivariate cases, a vector of names of the variables to analyse (or one single variable name)
Details
For method "kriging", object must contain columns whose names include the string "pred" for the predictions and "var" for the kriging variance; the observed values can either be included as an extra column named "observed" or be provided via argument observed. For method "cokriging", object must contain the predictions, cokriging variances and cokriging covariances in columns whose names include the strings "pred", "var" and "cov" respectively, and the observed values can only be provided via argument observed. Note that these are the natural formats when using gstat::predict.gstat() and other (co)kriging functions of that package.
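As an illustration of the cokriging column convention, the following base-R sketch rebuilds a prediction matrix and a per-location covariance array from a data.frame. It assumes gstat-style column names such as "A.pred", "A.var" and "cov.A.B"; the helper split_cokriging() is hypothetical and is not part of gmGeostats, which may parse the columns differently.

```r
# Hypothetical helper (not part of gmGeostats): rebuild predictions and
# cokriging covariance matrices from gstat-style column names, assumed to be
# "<var>.pred", "<var>.var" and "cov.<var1>.<var2>"
split_cokriging <- function(df) {
  pred_cols <- grep("\\.pred$", names(df), value = TRUE)
  vars <- sub("\\.pred$", "", pred_cols)
  nvar <- length(vars)
  pred <- as.matrix(df[, pred_cols, drop = FALSE])
  colnames(pred) <- vars
  # one nvar x nvar covariance matrix per location (row of df)
  covs <- array(0, dim = c(nrow(df), nvar, nvar),
                dimnames = list(NULL, vars, vars))
  for (j in seq_len(nvar)) {
    covs[, j, j] <- df[[paste0(vars[j], ".var")]]
    if (j < nvar) {
      for (k in (j + 1):nvar) {
        cc <- df[[paste("cov", vars[j], vars[k], sep = ".")]]
        covs[, j, k] <- cc   # the covariance matrix is symmetric
        covs[, k, j] <- cc
      }
    }
  }
  list(pred = pred, covs = covs)
}

df <- data.frame(A.pred = c(1, 2, 3), A.var = c(0.5, 0.4, 0.3),
                 B.pred = c(4, 5, 6), B.var = c(1.0, 1.1, 1.2),
                 cov.A.B = c(0.1, 0.2, 0.3))
parts <- split_cokriging(df)
parts$covs[2, , ]   # covariance matrix at the second location
```

Here the second location gets variances 0.4 and 1.1 on the diagonal and covariance 0.2 off the diagonal.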
For univariate and multivariate cokriging results (methods "kriging" and "cokriging"), the coverage values are computed from the Mahalanobis square error, i.e. the squared distance between prediction and true value, $D^2 = (z - \hat{z})^T \Sigma^{-1} (z - \hat{z})$, which uses the inverse of the cokriging variance-covariance matrix $\Sigma$ as the positive definite bilinear form of the distance. The rationale is that, under the assumption that the random field is Gaussian, this Mahalanobis square error should follow a $\chi^2$ distribution with degrees of freedom equal to the number of variables. This reference distribution allows us to compute confidence intervals for the Mahalanobis square error, and then count how many of the actually observed errors fall within each of the intervals (the coverage).
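The coverage computation just described can be sketched in base R as follows. This is a minimal illustration, not the gmGeostats implementation: the helper chisq_coverage() is hypothetical, predictions and true values are assumed to be stored as location-by-variable matrices, and the cokriging covariance matrices as a location-by-variable-by-variable array. The simulated data are calibrated by construction, so the coverage should come out close to the nominal probabilities.

```r
# Hypothetical helper (not part of gmGeostats): coverage of chi-square
# intervals for the Mahalanobis square errors of (co)kriging residuals
chisq_coverage <- function(pred, observed, covs, prob = seq(0, 1, by = 0.05)) {
  # pred, observed: matrices [location, variable]
  # covs: array [location, variable, variable] of cokriging covariance matrices
  nvar <- ncol(pred)
  d2 <- vapply(seq_len(nrow(pred)), function(i) {
    r <- observed[i, ] - pred[i, ]
    # Mahalanobis square error r' Sigma^{-1} r
    drop(crossprod(r, solve(covs[i, , ], r)))
  }, numeric(1))
  # fraction of errors inside the chi-square interval [0, q_p]
  sapply(prob, function(p) mean(d2 <= qchisq(p, df = nvar)))
}

set.seed(1)
nloc <- 2000; nvar <- 2
base <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
pred <- matrix(rnorm(nloc * nvar, sd = 3), nloc, nvar)
covs <- array(0, dim = c(nloc, nvar, nvar))
observed <- pred
for (i in seq_len(nloc)) {
  covs[i, , ] <- runif(1, 0.5, 2) * base   # location-dependent cokriging covariance
  # true value = prediction + Gaussian error with exactly that covariance
  observed[i, ] <- pred[i, ] + drop(rnorm(nvar) %*% chol(covs[i, , ]))
}
coverage <- chisq_coverage(pred, observed, covs)
round(coverage, 2)   # should be close to seq(0, 1, by = 0.05)
```

Since $\chi^2$ quantiles are 0 at p = 0 and infinite at p = 1, the first and last coverage values are always 0 and 1.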
For a perfect adjustment to the distribution, the plot of coverage vs. nominal confidence (see plot.accuracy) should fall on the $y = x$ line. NOTE: the original definition of Deutsch (1997) for the univariate case did not make use of the $\chi^2$ distribution, but instead derived the desired (symmetric!) intervals from the standard normal distribution obtained by standardizing the residual with the kriging standard deviation; the result is the same.
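The equivalence with Deutsch's symmetric normal intervals can be checked numerically: the squared half-width of the central standard normal interval of probability p equals the p-quantile of the $\chi^2$ distribution with one degree of freedom, so both criteria accept exactly the same standardized residuals. The sketch below uses only base R; the variable names are illustrative.

```r
# Deutsch-style symmetric normal interval vs. chi-square (1 df) interval
prob <- seq(0.05, 0.95, by = 0.05)
half_width <- qnorm((1 + prob) / 2)     # central interval [-h, h] of N(0, 1)
chisq_q <- qchisq(prob, df = 1)         # upper limit for the squared residual
all.equal(half_width^2, chisq_q)        # TRUE: same intervals

# both criteria are applied to the same standardized residuals
set.seed(3)
z <- rnorm(1000)                        # residual / kriging standard deviation
cov_normal <- sapply(half_width, function(h) mean(abs(z) <= h))
cov_chisq  <- sapply(chisq_q, function(q) mean(z^2 <= q))
identical(cov_normal, cov_chisq)        # compare the two coverage vectors
```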
For method "simulation", if object is a data.frame, the names of the variables containing the realisations must contain the string "sim", and observed must be a vector with as many elements as object has rows. If object is a DataFrameStack(), then the stacking dimension is assumed to run through the realisations; the true values must still be given in observed.
In both cases, the method is based on ranks: from them we can calculate how frequently the simulations are more extreme than the observed value. This calculation considers bilateral (two-sided) intervals around the median of the set (realisations, observed value), separately for each location.
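A minimal sketch of the rank-based idea for a data.frame of univariate realisations is given below. The helper rank_coverage() is hypothetical; it assumes no ties, places the observed value at the centred position (rank - 0.5)/(nsim + 1) within (realisations, observed value), and counts it as covered when it lies in the central interval of probability p. The exact rank conventions used by gmGeostats may differ.

```r
# Hypothetical helper (not part of gmGeostats): rank-based coverage for
# univariate realisations stored in columns whose names contain "sim"
rank_coverage <- function(object, observed, prob = seq(0, 1, by = 0.05)) {
  sims <- as.matrix(object[, grep("sim", names(object)), drop = FALSE])
  stopifnot(length(observed) == nrow(sims))
  nsim <- ncol(sims)
  # rank of the observed value within (realisations, observed value),
  # computed per location; assumes there are no ties
  r <- 1 + rowSums(sims < observed)
  # centred position in (0, 1); 0.5 corresponds to the median
  u <- (r - 0.5) / (nsim + 1)
  # covered if inside the bilateral interval of probability p around the median
  sapply(prob, function(p) mean(abs(u - 0.5) <= p / 2))
}

set.seed(42)
nloc <- 2000; nsim <- 199
mu <- rnorm(nloc, sd = 3)                        # location-specific means
sims <- mu + matrix(rnorm(nloc * nsim), nloc, nsim)
colnames(sims) <- paste0("sim", seq_len(nsim))
df <- data.frame(x = runif(nloc), sims)          # "x" is ignored (no "sim")
observed <- mu + rnorm(nloc)                     # calibrated by construction
coverage <- rank_coverage(df, observed)
round(coverage, 2)
```

With an odd number of realisations the observed value can never sit exactly on the median, so the coverage at p = 0 is 0, while at p = 1 every observed value is covered.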
Method "mahalanobis" ("Mahalanobis" also works) is the analogue for multivariate simulations. It only works for object of class DataFrameStack(), and requires the stacking dimension to run through the realisations and the other two dimensions to coincide with the dimensions of observed, i.e. locations by rows and variables by columns. In this case, a covariance matrix is computed and used to calculate the Mahalanobis square error defined above for method "cokriging": this Mahalanobis square error is computed for each simulation and for the true value. The simulated Mahalanobis square errors are then used to build the reference distribution from which the confidence intervals are derived.
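The following base-R sketch illustrates the Mahalanobis variant. It assumes, as an illustration rather than as documented gmGeostats behaviour, that the centre and the covariance matrix are estimated at each location from the realisations, and it stores the realisations in a plain array [location, variable, realisation] instead of a DataFrameStack. The helper mahalanobis_coverage() is hypothetical.

```r
# Hypothetical helper (not part of gmGeostats): Mahalanobis-based coverage for
# multivariate realisations stored in an array [location, variable, realisation]
mahalanobis_coverage <- function(sims, observed, prob = seq(0, 1, by = 0.05)) {
  nloc <- dim(sims)[1]
  u <- numeric(nloc)
  for (i in seq_len(nloc)) {
    z <- t(sims[i, , ])                 # realisations by rows, variables by columns
    mu <- colMeans(z)                   # assumed centre: mean of the realisations
    S <- cov(z)                         # covariance estimated from the realisations
    d2_sim <- mahalanobis(z, center = mu, cov = S)
    d2_obs <- mahalanobis(observed[i, ], center = mu, cov = S)
    # position of the true value's error in the simulated reference distribution
    u[i] <- mean(d2_sim <= d2_obs)
  }
  # fraction of true values inside the interval [0, p-quantile]
  sapply(prob, function(p) mean(u <= p))
}

set.seed(7)
nloc <- 2000; nvar <- 2; nsim <- 200
R <- chol(matrix(c(1, 0.7, 0.7, 1), 2, 2))     # upper Cholesky factor
sims <- array(0, dim = c(nloc, nvar, nsim))
observed <- matrix(0, nloc, nvar)
for (i in seq_len(nloc)) {
  mu <- rnorm(nvar, sd = 3)
  sims[i, , ] <- t(matrix(rnorm(nsim * nvar), nsim, nvar) %*% R) + mu
  observed[i, ] <- drop(rnorm(nvar) %*% R) + mu  # calibrated by construction
}
coverage <- mahalanobis_coverage(sims, observed)
round(coverage, 2)
```

Using the simulated errors as the reference distribution avoids assuming Gaussianity of the realisations, at the price of a coverage curve that is only resolved to steps of about 1/nsim.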
Finally, the highly experimental "flow" method requires the input to have the same shape as for method "mahalanobis". The method is mostly the same, except that before the Mahalanobis square errors are computed, a location-wise flow anamorphosis (ana()) is applied to transform the realisations (including the true value as one of them) to joint normality. The rest of the calculations are done as with method "mahalanobis".
Value
If outMahalanobis=FALSE (the default and primary use), this function returns a two-column dataset of class c("accuracy", "data.frame"), whose first column gives the nominal probability cutoffs used and whose second column gives the actual coverage of the intervals for each of these probabilities. If outMahalanobis=TRUE, the output is a vector (for prediction) or a matrix (for simulation) of Mahalanobis error norms.
Methods (by class)
- data.frame: Compute accuracy and precision
- DataFrameStack: Compute accuracy and precision
References
Mueller, Selia and Tolosana-Delgado (2023) Multivariate cross-validation and measures of accuracy and precision. Mathematical Geosciences (under review).
See Also
Other accuracy functions: mean.accuracy(), plot.accuracy(), precision(), validate(), xvErrorMeasures.default(), xvErrorMeasures()