GET-package {GET} | R Documentation |
Global Envelopes
Description
The GET package provides implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot), for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional ANOVA, functional GLM, n-sample test of correspondence of distribution functions), and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction).
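The coverage property described above can be written compactly as follows. The symbols T^{low} and T^{upp} for the lower and upper bounding vectors of the envelope are notation assumed here for illustration; they do not appear elsewhere on this page.

```latex
\Pr\Bigl( T_k \notin \bigl[ T^{\mathrm{low}}_k,\, T^{\mathrm{upp}}_k \bigr]
          \ \text{for some } k \in \{1, \dots, d\} \Bigr) = \alpha
```

Equivalently, T lies inside the band at all d points simultaneously with probability 1 - \alpha.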
Details
The GET package provides central regions (i.e. global envelopes) and global envelope tests with intrinsic graphical interpretation. The central regions can be constructed from (functional) data. The tests are Monte Carlo or permutation tests, which demand simulations from the tested null model. The methods are applicable for any multivariate vector data and functional data (after discretization).
To get an overview of the package, start R and type library("GET") and vignette("GET").
To get examples of point pattern analysis, start R and type library("GET") and vignette("pointpatterns").
To get examples of the false discovery rate envelopes of Mrkvička and Myllymäki (2023), start R and type library("GET") and vignette("FDRenvelopes").
Key functions in GET
- Central regions or global envelopes or confidence bands: central_region. E.g. 50% central region of growth curves of girls (the growth data set).
  First create a curve_set of the growth curves, e.g.
    cset <- curve_set(r = as.numeric(row.names(growth$hgtf)), obs = growth$hgtf)
  Then calculate 50% central region (see central_region for further arguments)
    cr <- central_region(cset, coverage = 0.5)
  Plot the result (see plot.global_envelope for plotting options)
    plot(cr)
  It is also possible to do combined central regions for several sets of curves provided in a list for the function, see examples in central_region.
- Global envelope tests: global_envelope_test is the main function. E.g. a test of complete spatial randomness (CSR) for a point pattern X:
    X <- spruces # an example pattern from spatstat
  Use the function envelope of spatstat to create nsim simulations under CSR and to calculate the functions you want (below K-functions by Kest). Important: use the option 'savefuns=TRUE' and specify the number of simulations nsim.
    env <- envelope(X, nsim=999, savefuns = TRUE, fun = Kest, simulate = expression(runifpoint(ex = X)))
  Perform the test (see global_envelope_test for further arguments)
    res <- global_envelope_test(env)
  Plot the result (see plot.global_envelope for plotting options)
    plot(res)
  It is also possible to do combined global envelope tests for several sets of curves provided in a list for the function, see examples in global_envelope_test. To obtain false discovery rate envelopes of Mrkvička and Myllymäki (2023) use the argument typeone = "fdr".
- Functional ordering: central_region and global_envelope_test are based on different measures for ordering the functions (or vectors) from the most extreme to the least extreme ones. The core functionality of calculating the measures is in the function forder, which can be used to obtain different measures for sets of curves. Usually there is no need to call forder directly.
- Functional boxplots: fBoxplot
- Adjusted global envelope tests for composite null hypotheses:
  - GET.composite, see a detailed example in saplings
- One-way functional ANOVA:
  - Graphical functional ANOVA tests: graph.fanova
  - Global rank envelope based on F-values: frank.fanova
- Functional general linear model (GLM):
  - Graphical functional GLM: graph.flm
  - Global rank envelope based on F-values: frank.flm
  - For large data (not fitting comfortably in memory): partial_forder
- Functional clustering: fclustering
- Global quantile regression: global_rq
- Functions for performing global envelopes for other specific purposes:
  - Graphical n sample test of correspondence of distribution functions: GET.distrequal
  - Permutation-based tests of independence to samples from any bivariate distribution: GET.distrindep
  - Testing global and local dependence of point patterns on covariates: GET.spatialF
  - Testing local correlations: GET.localcor
  - Variogram and residual variogram with global envelopes: GET.variogram
- Deviation tests (for simple hypothesis): deviation_test (no graphical interpretation)
- Most functions accept the curves provided in a curve_set object. Use curve_set to create a curve_set object from the functions. Other formats to provide the curves to the above functions are also accepted, see the information on the help pages.
See the help files of the functions for examples.
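As a rough illustration of the functional ordering mentioned in the list above, the following sketch calls forder directly on simulated curves. The data are made up for this example (19 noisy sine curves plus one deliberately shifted curve); only curve_set and forder come from the package, and the "erl" measure is assumed to assign smaller values to more extreme curves, as described on the forder help page.

```r
library(GET)

set.seed(1)
r <- seq(0, 1, length.out = 50)

# 19 ordinary curves: a common sine shape plus independent pointwise noise
ordinary <- sapply(1:19, function(i) sin(2 * pi * r) + rnorm(length(r), sd = 0.1))
# One hypothetical outlying curve, shifted clearly above all the others
outlier <- sin(2 * pi * r) + 3
# Rows correspond to argument values, columns to curves
funcs <- unname(cbind(ordinary, outlier))

cset <- curve_set(r = r, obs = funcs)

# Extreme rank length measure for each curve (smaller = more extreme)
erl <- forder(cset, measure = "erl")
most_extreme <- which.min(erl)
```

Here most_extreme should point at the last column, i.e. the shifted curve. In ordinary use central_region and global_envelope_test compute such measures internally.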
Workflow for (single hypothesis) tests based on single functions
To perform a test you always first need to obtain the test function T(r) for your data (T_1(r)) and for each simulation (T_2(r), \dots, T_{s+1}(r)) in one way or another. Given the set of the functions T_i(r), i=1, \dots, s+1, you can perform a test by global_envelope_test.
1) The workflow when using your own programs for simulations:
- (Fit the model and) Create s simulations from the (fitted) null model.
- Calculate the functions T_1(r), T_2(r), \dots, T_{s+1}(r).
- Use curve_set to create a curve_set object from the functions T_i(r), i=1, \dots, s+1.
- Perform the test
    res <- global_envelope_test(curve_set)
  where curve_set is the 'curve_set'-object you created, and plot the result
    plot(res)
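The steps of workflow 1 can be sketched with base R simulations only. Everything about the model here is an assumption chosen for illustration: the null model is a standard normal sample of size n, the test function T(r) is the empirical distribution function evaluated on a grid r, and the "observed" data are themselves simulated. The sim argument name of curve_set is the one used in recent versions of the package.

```r
library(GET)

set.seed(2)
n <- 50                              # sample size (illustrative)
s <- 999                             # number of simulations from the null model
r <- seq(-2, 2, length.out = 41)     # argument values of T(r)

x <- rnorm(n)                        # stand-in for the observed data
T_obs <- ecdf(x)(r)                  # T_1(r): test function for the data

# T_2(r), ..., T_{s+1}(r): one column per simulation from the null model
T_sim <- sapply(seq_len(s), function(i) ecdf(rnorm(n))(r))

# Collect the functions into a curve_set and perform the test
cset <- curve_set(r = r, obs = T_obs, sim = T_sim)
res <- global_envelope_test(cset, type = "erl")
p_value <- attr(res, "p")
```

Calling plot(res) afterwards shows the observed function together with the global envelope.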
2) The workflow utilizing spatstat: start R, type library("GET") and vignette("pointpatterns"), which explains the workflow and gives many examples of point pattern analysis.
Functions for modifying sets of functions
It is possible to modify the curve set T_1(r), T_2(r), \dots, T_{s+1}(r) for the test.
- You can choose the interval of distances [r_{\min}, r_{\max}] by crop_curves.
- For better visualisation, you can take T(r)-T_0(r) by residual. Here T_0(r) is the expectation of T(r) under the null hypothesis.
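A minimal sketch of both modifications on a made-up curve set follows. The curves are simulated for illustration only, and because no theoretical function is attached to this curve set, use_theo = FALSE is passed so that T_0(r) is taken as the mean of the functions rather than a theoretical value.

```r
library(GET)

set.seed(3)
r <- seq(0, 1, length.out = 101)

# Hypothetical observed and simulated functions around r^2
T_obs <- r^2 + rnorm(length(r), sd = 0.02)
T_sim <- sapply(1:99, function(i) r^2 + rnorm(length(r), sd = 0.02))
cset <- curve_set(r = r, obs = T_obs, sim = T_sim)

# Restrict the argument values to the interval [0.2, 0.8]
cset_cropped <- crop_curves(cset, r_min = 0.2, r_max = 0.8)

# Subtract T_0(r), here the mean of the functions
cset_resid <- residual(cset_cropped, use_theo = FALSE)
```

The resulting cset_resid can be passed to global_envelope_test in the same way as the original curve set.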
Example data (see references on the help pages of each data set)
- abide_9002_23: see help page
- adult_trees: a point pattern of adult trees
- cgec: centred government expenditure centralization (GEC) ratios (see graph.fanova)
- fallen_trees: a point pattern of fallen trees
- GDPtax: GDP per capita with country groups and other covariates
- imageset3: a simulated set of images
- rimov: water temperature curves over the 365 days of each of 36 years
- saplings: a point pattern of saplings (see GET.composite)
The data sets are used to show examples of the functions of the library.
Number of functions
If the number of functions is low, the choice of the measure (or type or depth) plays a role, as explained in vignette("GET") (Section 2.4).
Note that the recommended minimum number of simulations for the rank envelope test (Myllymäki et al., 2017) based on a single function in spatial statistics is nsim=2499. When the number of argument values is large, a larger number of simulations is also needed in order to have a narrow p-interval. The "erl", "cont", "area", "qdir" and "st" global envelope tests and deviation tests can be used with a lower number of simulations, although the Monte Carlo error is obviously larger with a lower number of simulations. As the number of simulations increases, all the global rank envelopes approach the same curves.
Mrkvička et al. (2017) discussed the number of simulations for tests based on many functions.
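One simple way to see why fewer simulations give a coarser test: a Monte Carlo p-value computed from s simulations is a multiple of 1/(s+1). The small calculation below only illustrates this resolution for a few values of nsim; it says nothing about the width of the p-interval of the rank envelope test, which also depends on ties among the ranks.

```r
# Resolution (smallest attainable value) of a Monte Carlo p-value
nsim <- c(99, 999, 2499)
resolution <- 1 / (nsim + 1)
data.frame(nsim = nsim, p_resolution = resolution)
```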
Documentation
Myllymäki and Mrkvička (2023) provide a description of the package.
The material can also be found in the corresponding vignette, which is available by
starting R and typing library("GET") and vignette("GET").
In the special case of spatial processes (spatial point processes, random sets),
the functions are typically estimators of summary functions. The package supports
the use of the R package spatstat for generating simulations and calculating
estimators of the chosen summary function, but alternatively these can be done in
any other way, thus allowing for any user-specified models/functions.
To see examples of global envelopes for analysing point pattern data,
start R, type library("GET") and vignette("pointpatterns").
Mrkvička and Myllymäki (2023) developed false discovery rate (FDR) envelopes.
Examples can be found in the associated vignette: start R, and type
library("GET") and vignette("FDRenvelopes").
Mrkvička et al. (2023a) proposed global quantile regression. An example of
global quantile regression is given in the vignette vignette("QuantileRegression").
The vignette vignette("HotSpots") illustrates the methodology proposed by
Mrkvička et al. (2023b) for detecting hotspots on a linear network.
Type citation("GET") to get a full list of references.
Acknowledgements
Mikko Kuronen has made substantial contributions of code. Additional contributions and suggestions came from Jiří Dvořák, Pavel Grabarnik, Ute Hahn, Michael Rost and Henri Seijo.
Author(s)
Mari Myllymäki (mari.myllymaki@luke.fi, mari.j.myllymaki@gmail.com) and Tomáš Mrkvička (mrkvicka.toma@gmail.com)
References
Dai, W., Athanasiadis, S., Mrkvička, T. (2021) A new functional clustering method with combined dissimilarity sources and graphical interpretation. Intech open, London, UK. DOI: 10.5772/intechopen.100124
Dvořák, J. and Mrkvička, T. (2022). Graphical tests of independence for general distributions. Computational Statistics 37, 671–699.
Mrkvička, T., Konstantinou, K., Kuronen, M. and Myllymäki, M. (2023a) Global quantile regression. arXiv:2309.04746 [stat.ME]. https://doi.org/10.48550/arXiv.2309.04746
Mrkvička, T., Kraft, S., Blažek, V. and Myllymäki, M. (2023b) Hotspots detection on a linear network with presence of covariates: a case study on road crash data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4598454
Mrkvička, T., Myllymäki, M. and Hahn, U. (2017) Multiple Monte Carlo testing, with applications in spatial point processes. Statistics and Computing 27(5), 1239-1255. doi: 10.1007/s11222-016-9683-9
Mrkvička, T., Myllymäki, M., Jilek, M. and Hahn, U. (2020) A one-way ANOVA test for functional data with graphical interpretation. Kybernetika 56(3), 432-458. doi: 10.14736/kyb-2020-3-0432
Mrkvička, T., Myllymäki, M., Kuronen, M. and Narisetty, N. N. (2022) New methods for multiple testing in permutation inference for the general linear model. Statistics in Medicine 41(2), 276-297. doi: 10.1002/sim.9236
Mrkvička, T., Myllymäki, M. (2023) False discovery rate envelopes. Statistics and Computing 33, 109. https://doi.org/10.1007/s11222-023-10275-7
Mrkvička, T., Roskovec, T. and Rost, M. (2021) A nonparametric graphical tests of significance in functional GLM. Methodology and Computing in Applied Probability 23, 593-612. doi: 10.1007/s11009-019-09756-y
Mrkvička, T., Soubeyrand, S., Myllymäki, M., Grabarnik, P., and Hahn, U. (2016) Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Statistics 18, Part A, 40-53. doi: 10.1016/j.spasta.2016.04.005
Myllymäki, M., Grabarnik, P., Seijo, H. and Stoyan, D. (2015) Deviation test construction and power comparison for marked spatial point patterns. Spatial Statistics 11, 19-34. doi: 10.1016/j.spasta.2014.11.004
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017) Global envelope tests for spatial point patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 381-404. doi: 10.1111/rssb.12172
Myllymäki, M. and Mrkvička, T. (2023). GET: Global envelopes in R. arXiv:1911.06583 [stat.ME]. https://doi.org/10.48550/arXiv.1911.06583
Myllymäki, M., Kuronen, M. and Mrkvička, T. (2020). Testing global and local dependence of point patterns on covariates in parametric models. Spatial Statistics 42, 100436. doi: 10.1016/j.spasta.2020.100436