central_region {GET} | R Documentation |
Central region / Global envelope
Description
Provides central regions or global envelopes or confidence bands
Usage
central_region(
curve_sets,
type = "erl",
coverage = 0.5,
alternative = c("two.sided", "less", "greater"),
probs = c(0.25, 0.75),
quantile.type = 7,
central = "median",
nstep = 2,
...
)
Arguments
curve_sets |
A |
type |
The type of the global envelope with current options for 'rank', 'erl', 'cont', 'area', 'qdir', 'st' and 'unscaled'. See details. |
coverage |
A number between 0 and 1. The 100*coverage% central region will be calculated. A vector of values can also be provided, leading to the corresponding number of central regions. |
alternative |
A character string specifying the alternative hypothesis.
Must be one of the following: "two.sided" (default), "less" or "greater".
The last two options only available for types |
probs |
A two-element vector containing the lower and upper quantiles for the measure 'q' or 'qdir', in that order and on the interval [0, 1]. The default values are 0.025 and 0.975, suggested by Myllymäki et al. (2015, 2017). |
quantile.type |
As type argument of |
central |
Either "mean" or "median". If the curve sets do not contain the component
|
nstep |
1 or 2 for how to contruct a combined global envelope if list of curve sets is provided. 2 (default) for a two-step combining procedure, 1 for one-step. |
... |
Ignored. |
Details
Given a curve_set
object,
or an envelope
object of spatstat or fdata
object of fda.usc,
the function central_region
constructs a central region, i.e. a global envelope,
from the given set of functions (or vectors).
Generally an envelope is a band bounded by the vectors (or functions)
T_{low}
and T_{hi}
.
A 100(1-\alpha)
% or 100*coverage% global envelope is a set
(T_{low}, T_{hi})
of envelope vectors
such that the probability that T_i
falls outside this envelope
in any of the d points of the vector T_i
is less or equal to \alpha
.
The global envelopes can be constructed based on different measures
that order the functions from the most extreme one to the least extreme one.
We use such orderings of the functions for which we are able to construct global envelopes
with intrinsic graphical interpretation.
The type of the global envelope can be chosen with the argument type
and
the options are given in the following.
Further information about the measures, on which the global envelopes are based,
can be found in Myllymäki and Mrkvička (2020, Section 2.).
-
'rank'
: The global rank envelope proposed by Myllymäki et al. (2017) based on the extreme rank defined as the minimum of pointwise ranks. -
'erl'
: The global rank envelope based on the extreme rank length (Myllymäki et al.,2017, Mrkvička et al., 2018). This envelope is constructed as the convex hull of the functions which have extreme rank length measure that is larger or equal to the critical\alpha
level of the extreme rank length measure. -
'cont'
: The global rank envelope based on the continuous rank (Hahn, 2015; Mrkvička et al., 2019) based on minimum of continuous pointwise ranks. It is contructed as the convex hull in a similar way as the'erl'
envelope. -
'area'
: The global rank envelope based on the area rank (Mrkvička et al., 2019) which is based on area between continuous pointwise ranks and minimum pointwise ranks for those argument (r) values for which pointwise ranks achieve the minimum (it is a combination of erl and cont). It is contructed as the convex hull in a similar way as the'erl'
and'area'
envelopes. -
'qdir'
: The directional quantile envelope based on the directional quantile maximum absolute deviation (MAD) test (Myllymäki et al., 2017, 2015), which takes into account the unequal variances of the test function T(r) for different distances r and is also protected against asymmetry of distribution of T(r). -
'st'
: The studentised envelope based on the studentised MAD measure (Myllymäki et al., 2017, 2015), which takes into account the unequal variances of the test function T(r) for different distances r. -
'unscaled'
: The unscaled envelope (Ripley, 1981), which leads to envelopes with constant width. It corresponds to the classical maximum deviation test without scaling. This test suffers from unequal variance of T(r) over the distances r and from the asymmetry of distribution of T(r). We recommend to use the other alternatives instead. This unscaled global envelope is provided for reference.
The values of the chosen measure M are determined for each curve in the curve_set
, and
based on the chosen measure, the central region, i.e. the global envelope, is constructed
for the given curves.
If a list of (suitable) objects are provided in the argument curve_sets
,
then by default (nstep = 2
) the two-step combining procedure is used to
construct the combined global envelope as described in Myllymäki and Mrkvička (2020, Section 2.2.).
If nstep = 1
and the lengths of the multivariate vectors in each component
of the list are equal, then the one-step combining procedure is used where the
functions are concatenated together into a one long vector (see again Myllymäki and Mrkvička, 2020, Section 2.2.).
Value
Either an object of class global_envelope
and or an combined_global_envelope
object.
The former class is obtained when a set of curves is provided, while the latter in the case
that curve_sets
is a list of objects. The print and plot function are defined for the
returned objects (see examples).
The global_envelope
object is essentially a data frame containing columns
r = the vector of values of the argument r at which the test was made
lo = the lower envelope based on the simulated functions; in case of a vector of coverage values, several 'lo' exist with names paste0("lo.", 100*coverage)
hi = the upper envelope based on the simulated functions; in case of a vector of coverage values, several 'lo' exist with names paste0("hi.", 100*coverage)
central = If the
curve_set
(orenvelope
object) contains a theoretical curve, then this function is used as the central curve and returned in this component. Otherwise, the central curve is the mean or median (according to the argumentcentral
) of the test functions T_i(r), i=2, ..., s+1. Used for visualization only.
and potentially additionally
obs = the data function, if there is only one data function in the given
curve_sets
. Otherwise not existing.
(Most often central_region
is directly applied to functional data where all curves are observed.)
Additionally, the returned object has some attributes, where
M = A vector of the values of the chosen measure for all the function. If there is only one observed function, then M[1] gives the value of the measure for this.
M_alpha = The critical value of M corresponding to the 100(1-alpha)% global envelope (see Myllymäki and Mrkvička, 2020, Definition 1.1. IGI).
Further the object has some attributes for printing and plotting purposes, where
alternative
, type
, ties
, alpha
correspond to those in the function call
and method
gives a name for the method.
Attributes of an object res
can be obtained using the function
attr
, e.g. attr(res, "M")
for the values of the ordering measure.
If the given set of curves had the class envelope
of spatstat, then the returned
global_envelope
object has also the class fv
of spatstat, whereby one can utilize
also the plotting functions of spatstat, see example in plot.global_envelope
.
However, the envelope
objects are most often used with global_envelope_test
and not with central_region
.
For an fv
object, also some further attributes exists as required by fv
of spatstat.
The combined_global_envelope
is a list of global_envelope
objects, where
the components correspond to the components of curve_sets
.
The combined_global_envelope
object constructed with nstep = 2
contains,
in addition to some conventional ones (method
, alternative
, type
, alpha
,
M
, M_alpha
, see above), the second level envelope information as the attributes
level2_ge = The second level envelope on which the envelope construction is based
level2_curve_set = The second level
curve_set
from whichlevel2_ge
is constructed
In the case that the given curve sets are two-dimensional, i.e., their arguments values are two-dimensional,
then the returned objects have in addition to the class global_envelope
or combined_global_envelope
,
the class global_envelope2d
or combined_global_envelope2d
, respectively. This class is assigned
for plotting purposes: For the 2d envelopes, also the default plots are 2d.
Otherwise the 1d and 2d objects are similar.
References
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan. D. (2015). Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017). Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2020). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Ripley, B.D. (1981). Spatial statistics. Wiley, New Jersey.
See Also
Examples
## A central region of a set of functions
#----------------------------------------
if(requireNamespace("fda", quietly=TRUE)) {
cset <- curve_set(r=as.numeric(row.names(fda::growth$hgtf)),
obs=fda::growth$hgtf)
plot(cset) + ggplot2::ylab("height")
cr <- central_region(cset, coverage=0.50, type="erl")
plot(cr)
}
## Confidence bands for linear or polynomial regression
#------------------------------------------------------
# Simulate regression data according to the cubic model
# f(x) = 0.8x - 1.8x^2 + 1.05x^3 for x in [0,1]
par <- c(0,0.8,-1.8,1.05) # Parameters of the true polynomial model
res <- 100 # Resolution
x <- seq(0, 1, by=1/res); x2=x^2; x3=x^3;
f <- par[1] + par[2]*x + par[3]*x^2 + par[4]*x^3 # The true function
d <- f + rnorm(length(x), 0, 0.04) # Data
# Plot the true function and data
plot(f, type="l", ylim=range(d))
points(d)
# Estimate polynomial regression model
reg <- lm(d ~ x + x2 + x3)
ftheta <- reg$fitted.values
resid0 <- reg$residuals
s0 <- sd(resid0)
# Bootstrap regression
B <- 2000 # Number of bootstrap samples
ftheta1 <- array(0, c(B,length(x)))
s1 <- array(0,B)
for(i in 1:B) {
u <- sample(resid0, size=length(resid0), replace=TRUE)
reg1 <- lm((ftheta+u) ~ x + x2 + x3)
ftheta1[i,] <- reg1$fitted.values
s1[i] <- sd(reg1$residuals)
}
# Centering and scaling
meanftheta <- apply(ftheta1, 2, mean)
m <- array(0, c(B,length(x)))
for(i in 1:B) { m[i,] <- (ftheta1[i,]-meanftheta)/s1[i] }
# Central region computation
boot.cset <- curve_set(r=1:length(x), obs=ftheta+s0*t(m))
cr <- central_region(boot.cset, coverage=c(0.50, 0.80, 0.95), type="erl")
# Plotting the result
plot(cr) + ggplot2::labs(x=expression(italic(x)), y=expression(italic(f(x)))) +
ggplot2::geom_point(data=data.frame(id=1:length(d), points=d),
ggplot2::aes(x=id, y=points)) + # data points
ggplot2::geom_line(data=data.frame(id=1:length(d), points=f),
ggplot2::aes(x=id, y=points)) # true function