gmtFD {gmtFD} | R Documentation |
General multiple tests for univariate and multivariate functional data
Description
The function gmtFD()
calculates the statistical tests based on globalizing and supremum
pointwise Hotelling's T^2
-test statistics (GPH and SPH) for the global null hypothesis and
multiple local null hypotheses. Respective p
-values are obtained by a parametric bootstrap strategy.
Usage
gmtFD(
x,
h,
blocks_contrasts,
n_boot = 1000,
alpha = 0.05,
parallel = FALSE,
n_cores = NULL,
multi_gen = FALSE
)
Arguments
x |
a list of |
h |
contrast matrix. For contrast matrices based on Dunnett’s and Tukey’s contrasts,
it can be created by the |
blocks_contrasts |
a vector with blocks of contrasts labels. The integer labels (from 1 to a number
of blocks of contrasts) should be used in non-decreasing order. For univariate case ( |
n_boot |
number of bootstrap samples. |
alpha |
significance level. |
parallel |
a logical indicating whether to use parallelization. |
n_cores |
if |
multi_gen |
a logical indicating of whether to use separate multiple generations of Gaussian processes
for the parametric bootstrap tests. The default is FALSE, which means that the processes will be
generated once in a big matrix. This method is much faster, but for larger |
Details
The function gmtFD()
concerns the tests for the heteroscedastic
contrast testing problem for univariate and multivariate functional data. The details are
presented in Munko et al. (2023, 2024), but here we present some summary of the problem
and its solutions implemented in the package.
Suppose we have k
independent functional samples \mathbf{x}_{i1},\dots,\mathbf{x}_{in_i}
,
which consist of independent and identically distributed p
-dimensional stochastic processes defined
on interval \mathcal{T}
with mean function vector \boldsymbol{\eta}_i
and covariance function
\boldsymbol{\Gamma}_i
for each i\in\{1,\dots,k\}
. Note that the covariance functions
of the different groups may differ from each other, i.e., heteroscedasticity is explicitly allowed.
We consider the null and alternative hypothesis
\mathcal H_0: \mathbf{H}\boldsymbol{\eta}(t) =\mathbf{0}_r \text{ for all }
t\in \mathcal{T} \quad \text{vs.} \quad \mathcal H_1: \mathbf{H}\boldsymbol{\eta}(t) \neq
\mathbf{0}_r \text{ for some } t\in \mathcal{T},
where \mathbf{H} \in \mathbb{R}^{r \times pk}
denotes a known contrast matrix, i.e., \mathbf{H}\mathbf{1}_{pk} = \mathbf{0}_r
, and
\boldsymbol{\eta} := (\boldsymbol{\eta}_1^{\top},\dots,\boldsymbol{\eta}_k^{\top})^{\top}
is the vector of the mean functions. The formulation of this testing framework is very general
and contains many special cases like the analysis of variance for univariate and multivariate functional data
(FANOVA and FMANOVA) problems.
For univariate functional data (p=1
), we may choose \mathbf{H} = \mathbf{P}_k
for the one-way FANOVA problem to test the null hypothesis of no main effect, where
\mathbf{P}_k:=\mathbf{I}_k-\mathbf{J}_k/k
with \mathbf{I}_k \in\mathbb{R}^{k\times k}
denoting the unit matrix and \mathbf{J}_k := \mathbf{1}_k\mathbf{1}_k^{\top} \in\mathbb{R}^{k\times k}
denoting the matrix of ones. However, there are different possible choices of the
contrast matrix \mathbf{H}
which lead to this global null hypothesis.
Many-to-one comparisons can be considered by choosing Dunnett's contrast matrix
\mathbf{H} = [-\mathbf{1}_{k-1}, \mathbf{I}_{k-1}]
, where the mean
functions \eta_2,\dots,\eta_k
are compared to the mean function \eta_1
of the first group regarding the different contrasts. To compare all pairs
of mean functions \eta_{i_1},\eta_{i_2}, i_1,i_2 \in\{1,\dots,k\}
with
i_1 \neq i_2
, the Tukey's contrast matrix:
\mathbf{H} = \begin{bmatrix}
-1 & 1 & 0 & 0 & \cdots & \cdots & 0 \\
-1 & 0 & 1 & 0 &\cdots & \cdots & 0 \\
\vdots & \vdots &\vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & 0 & 0 & 0& \cdots & \cdots & 1\\
0 & -1 & 1 & 0& \cdots & \cdots & 0 \\
0 & -1 & 0 & 1& \cdots & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix} \in \mathbb{R}^{k(k-1)/2 \times k}
can be used.
For multivariate functional data (p>1
), we extend the matrices for p=1
given above
as follows: \mathbf{H}=\mathbf{H}_1\otimes \mathbf{I}_p
, where \mathbf{H}_1
is the
contrast matrix for the univariate case. For more examples of contrast matrices in different
settings, see Merle et al. (2023, 2024).
For this testing problem, we consider the pointwise Hotelling's T^2
-test statistic
\mathrm{PH}_{n,\mathbf{H}}(t):= n(\mathbf{H}\boldsymbol{\widehat\eta}(t))^{\top}
(\mathbf{H}\mathbf{\widehat \Gamma}(t,t) \mathbf{H}^{\top})^+
\mathbf{H} \boldsymbol{\widehat\eta}(t)
for all t \in\mathcal{T}
, where
\boldsymbol{\widehat\eta} := (\boldsymbol{\widehat\eta}_1^{\top},\dots,\boldsymbol{\widehat\eta}_k^{\top})^{\top}
denotes the vector of all mean function estimators, \mathbf{A}^+
denotes the
Moore-Penrose inverse of the matrix \mathbf{A}
, and
\boldsymbol{\widehat{\Gamma}}(t,s):= \mathrm{diag}\left( \frac{n}{n_1}\widehat{\boldsymbol{\Gamma}}_1(t,s),
\ldots,\frac{n}{n_k}\widehat{\boldsymbol{\Gamma}}_k(t,s)\right),
n=n_1+\dots+n_k
,
\widehat{\boldsymbol{\Gamma}}_i(t,s)
is the sample covariance function for the i
-th group,
i\in\{1,\dots,k\}
. Based on this pointwise Hotelling's T^2
-test statistic, we construct the globalizing
and supremum of pointwise Hotelling's T^2
-test (GPH and SPH) statistics by integrating and supremum over
the pointwise Hotelling's T^2
-test statistic, that is
I_{n}(\mathbf{H}) := \int_{\mathcal{T}} \mathrm{PH}_{n,\mathbf{H}}(t) \,\mathrm{ d }t,\ \
T_{n}(\mathbf{H}) := \mathrm{sup}_{t\in\mathcal{T}} \mathrm{PH}_{n,\mathbf{H}}(t).
We consider the parametric bootstrap test based on these test statistics. However, for better post hoc
analysis, we also consider the multiple contrast testing procedures. The main idea of multiple contrast
tests is to split up the global null hypothesis with matrix
\mathbf{H}= [\mathbf{H}_1^{\top}, \dots, \mathbf{H}_R^{\top}]^{\top}
into R
matrices
\mathbf{H}_{\ell}\in\mathbb{R}^{r_{\ell}\times k}
with \mathrm{rank}(\mathbf{H}_{\ell})\geq 1
,
\ell\in\{1,\dots,R\}
, where R,r_{\ell}\in\{1,\dots,r\}
, and \sum_{\ell=1}^Rr_{\ell}=r
.
This leads to the multiple testing problem with null hypotheses
\mathcal H_{0,{\ell}} : \; \mathbf{H}_{\ell} \boldsymbol{\eta}(t) =
\mathbf{0}_{r_{\ell}} \;\text{ for all }t\in\mathcal{T}, \text{for }\ell\in \{1,\ldots,R\}.
To verify this family of null hypotheses, we adopt two approaches. First, we simply apply
the above test to each hypothesis \mathcal H_{0,{\ell}}
, and the resulting p
-values
are then corrected by the Bonferroni's method. However, this approach, denoted in the package as
GPH and SPH, may give conservative test and loss of power. Thus, we also consider the test adopting the
idea for the construction of simultaneous confidence bands proposed by Buhlmann (1998).
This test is denoted by mGPH and mSPH in the package and is a more powerful solution than the GPH and SPH
procedures, which was shown in Munko et al. (2023, 2024).
Note that the value of the test statistics for the mGPH and mSPH tests for global hypotheses are equal to
\max_{\ell\in\{1,\ldots,R\}}\frac{I_{n}(\mathbf{H}_{\ell})}{q_{GPH,\ell,\widetilde{\beta}}^{\mathcal{P}}}\text{ and }
\max_{\ell\in\{1,\ldots,R\}}\frac{T_{n}(\mathbf{H}_{\ell})}{q_{SPH,\ell,\widetilde{\beta}}^{\mathcal{P}}},
respectively, where q_{GPH,\ell,\widetilde{\beta}}^{\mathcal{P}}
and q_{SPH,\ell,\widetilde{\beta}}^{\mathcal{P}}
are the quantiles calculated using the adaptation of the method by Buhlmann (1998).
The critical value for it is always 1.
Please have a look at a summary function designed for this package. It can be used
to simplify the output of gmtFD()
function.
Value
A list of class multifmanova
containing the following components:
res_global |
a data frame containing the results for testing the global null hypothesis,
i.e., test statistics and |
res_multi |
all results of multiple contrasts tests for particular hypothesis in
a contrast matrix |
k |
a number of groups. |
p |
a number of variables. |
ntp |
a number of design time points. |
n |
a vector of sample sizes. |
h |
an argument |
n_boot |
an argument |
alpha |
an argument |
multi_gen |
an argument |
References
Buhlmann P. (1998) Sieve bootstrap for smoothing in nonstationary time series. Annals of Statistics 26, 48-83.
Dunnett C. (1955) A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 50, 1096-1121.
Munko M., Ditzhaus M., Pauly M., Smaga L., Zhang J.T. (2023) General multiple tests for functional data. Preprint https://arxiv.org/abs/2306.15259
Munko M., Ditzhaus M., Pauly M., Smaga L. (2024) Multiple comparison procedures for simultaneous inference in functional MANOVA. Preprint https://arxiv.org/abs/2406.01242
Tukey J.W. (1953) The problem of multiple comparisons. Princeton University.
Examples
# Some of the examples may run some time.
# Canadian weather data set
# There are three samples of mean temperature and precipitations for
# fifteen weather stations in Western Canada, another fifteen in Eastern Canada,
# and the remaining five in Northern Canada.
# one functional variable - temperature
library(fda)
data_set_t <- t(CanadianWeather$dailyAv[,, "Temperature.C"])
# number of samples
k <- 3
# number of variables
p <- 1
# preparing data set
gr_label <- rep(c(1, 2, 3), c(15, 15, 5))
data_set <- list(list(data_set_t[gr_label == 1, ]),
list(data_set_t[gr_label == 2, ]),
list(data_set_t[gr_label == 3, ]))
# trajectories of mean temperature and precipitation
oldpar <- par(mar = c(4, 4, 2, 0.1))
matplot(t(data_set_t), type = "l", col = gr_label, lty = 1,
xlab = "Day", ylab = "Temperature (C)",
main = "Canadian weather data set")
legend("bottom", legend = c("Western Canada", "Eastern Canada", "Northern Canada"),
col = 1:3, lty = 1)
par(oldpar)
# Tukey's contrast matrix
h_tukey <- GFDmcv::contr_mat(k, type = "Tukey")
h_tukey_m <- kronecker(h_tukey, diag(p))
# vector of blocks of contrasts labels
blocks_contrasts <- rep(1:(nrow(h_tukey_m) / p), each = p)
# testing without parallel computing
res <- gmtFD(data_set, h_tukey_m, blocks_contrasts)
summary(res, digits = 3)
# plots for pointwise Hotelling's T^2-test statistics
oldpar <- par(mfrow = c(2, 2), mar = c(4, 2, 2, 0.1))
plot(ph_test_statistic(data_set, h_tukey_m), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Global hypothesis", xlab = "Day")
plot(ph_test_statistic(data_set, matrix(h_tukey_m[1, ], 1)), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Contrast 1", xlab = "Day")
plot(ph_test_statistic(data_set, matrix(h_tukey_m[2, ], 1)), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Contrast 2", xlab = "Day")
plot(ph_test_statistic(data_set, matrix(h_tukey_m[3, ], 1)), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Contrast 3", xlab = "Day")
par(oldpar)
# Dunnett's contrast matrix
h_dunnett <- GFDmcv::contr_mat(k, type = "Dunnett")
h_dunnett_m <- kronecker(h_dunnett, diag(p))
# vector of blocks of contrasts labels
blocks_contrasts <- rep(1:(nrow(h_dunnett_m) / p), each = p)
# testing without parallel computing
res <- gmtFD(data_set, h_dunnett_m, blocks_contrasts)
summary(res, digits = 3)
# plots for pointwise Hotelling's T^2-test statistics
oldpar <- par(mfrow = c(3, 1), mar = c(4, 2, 2, 0.1))
plot(ph_test_statistic(data_set, h_dunnett_m), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_dunnett_m))),
main = "Global hypothesis", xlab = "Day")
plot(ph_test_statistic(data_set, matrix(h_dunnett_m[1, ], 1)), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_dunnett_m))),
main = "Contrast 1", xlab = "Day")
plot(ph_test_statistic(data_set, matrix(h_dunnett_m[2, ], 1)), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_dunnett_m))),
main = "Contrast 2", xlab = "Day")
par(oldpar)
# two functional variables - temperature and precipitation
library(fda)
data_set_t <- t(CanadianWeather$dailyAv[,, "Temperature.C"])
data_set_p <- t(CanadianWeather$dailyAv[,, "Precipitation.mm"])
# number of samples
k <- 3
# number of variables
p <- 2
# preparing data set
gr_label <- rep(c(1, 2, 3), c(15, 15, 5))
data_set <- list(list(data_set_t[gr_label == 1, ], data_set_p[gr_label == 1, ]),
list(data_set_t[gr_label == 2, ], data_set_p[gr_label == 2, ]),
list(data_set_t[gr_label == 3, ], data_set_p[gr_label == 3, ]))
# trajectories of mean temperature and precipitation
oldpar <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 0.1))
matplot(t(data_set_t), type = "l", col = gr_label, lty = 1,
xlab = "Day", ylab = "Temperature (C)",
main = "Canadian weather data set")
legend("bottom", legend = c("Western Canada", "Eastern Canada", "Northern Canada"),
col = 1:3, lty = 1)
matplot(t(data_set_p), type = "l", col = gr_label, lty = 1,
xlab = "Day", ylab = "Precipitation (mm)",
main = "Canadian weather data set")
legend("topleft", legend = c("Western Canada", "Eastern Canada", "Northern Canada"),
col = 1:3, lty = 1)
par(oldpar)
# Tukey's contrast matrix
h_tukey <- GFDmcv::contr_mat(k, type = "Tukey")
h_tukey_m <- kronecker(h_tukey, diag(p))
# vector of blocks of contrasts labels
blocks_contrasts <- rep(1:(nrow(h_tukey_m) / p), each = p)
# testing without parallel computing
res <- gmtFD(data_set, h_tukey_m, blocks_contrasts)
summary(res, digits = 3)
# plots for pointwise Hotelling's T^2-test statistics
oldpar <- par(mfrow = c(2, 2), mar = c(4, 2, 2, 0.1))
plot(ph_test_statistic(data_set, h_tukey_m), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Global hypothesis", xlab = "Day")
plot(ph_test_statistic(data_set, h_tukey_m[1:2, ]), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Contrast 1", xlab = "Day")
plot(ph_test_statistic(data_set, h_tukey_m[3:4, ]), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Contrast 2", xlab = "Day")
plot(ph_test_statistic(data_set, h_tukey_m[5:6, ]), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_tukey_m))),
main = "Contrast 3", xlab = "Day")
par(oldpar)
# Dunnett's contrast matrix
h_dunnett <- GFDmcv::contr_mat(k, type = "Dunnett")
h_dunnett_m <- kronecker(h_dunnett, diag(p))
# vector of blocks of contrasts labels
blocks_contrasts <- rep(1:(nrow(h_dunnett_m) / p), each = p)
# testing without parallel computing
res <- gmtFD(data_set, h_dunnett_m, blocks_contrasts)
summary(res, digits = 3)
# plots for pointwise Hotelling's T^2-test statistics
oldpar <- par(mfrow = c(3, 1), mar = c(4, 2, 2, 0.1))
plot(ph_test_statistic(data_set, h_dunnett_m), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_dunnett_m))),
main = "Global hypothesis", xlab = "Day")
plot(ph_test_statistic(data_set, matrix(h_dunnett_m[1, ], 1)), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_dunnett_m))),
main = "Contrast 1", xlab = "Day")
plot(ph_test_statistic(data_set, matrix(h_dunnett_m[2, ], 1)), type = "l",
ylim = c(0, max(ph_test_statistic(data_set, h_dunnett_m))),
main = "Contrast 2", xlab = "Day")
par(oldpar)
# testing with parallel computing
library(doParallel)
blocks_contrasts <- rep(1:(nrow(h_tukey_m) / p), each = p)
res <- gmtFD(data_set, h_tukey_m, blocks_contrasts, parallel = TRUE, n_cores = 2)
summary(res, digits = 3)