rmfanova {rmfanova} | R Documentation |
Repeated measures functional analysis of variance
Description
The function rmfanova()
calculates the tests based on three test statistics
\mathcal{C}_n
, \mathcal{D}_n
, and \mathcal{E}_n
for the problem of
comparing \ell
-samples of repeated measures for functional data. The tests are based on
five resampling methods, i.e., two permutation and three bootstrap ones. The overall and local
hypotheses are considered.
Usage
rmfanova(
x,
method = "bonferroni",
n_perm = 1000,
n_boot = 1000,
parallel = FALSE,
n_cores = NULL,
multi_gen = FALSE
)
Arguments
x |
a list of length |
method |
the correction method to be used for pairwise comparisons. Options are |
n_perm |
a number of permutation replicates. The default is 1000. |
n_boot |
a number of bootstrap replicates. The default is 1000. |
parallel |
a logical indicating of whether to use parallel computing. The default is |
n_cores |
if |
multi_gen |
a logical indicating of whether to use separate multiple generations of Gaussian processes
for the parametric bootstrap tests. The default is FALSE, which means that the processes will be
generated once in a big matrix. This method is much faster, but for larger |
Details
The function rmfanova()
concerns the tests for the functional repeated measures analysis problem.
The details are presented in Kurylo and Smaga (2023), where in particular, some recommendations for using tests are given.
Here we present only some summary of the problem and its solutions implemented in the package.
We have n
subjects subjected to \ell\geq 2
(possibly) different conditions.
The results of the experiments are functional observations. Let the subjects be represented
by a functional sample consisting of independent stochastic processes Y_1,\dots,Y_n
defined on the
interval [0,\ell]
, which satisfy the following model proposed by Martinez-Camblor and Corral (2011):
Y_j(t)=\mu(t)+e_j(t),\ j=1,\dots,n,\ t\in[0,\ell],
where \mu
is a fixed mean function, and e_j
is a random process with zero mean function.
In this notation, t\in[0,1]
corresponds to the first experimental condition, t\in[1,2]
to the second, and so on. Thus, in this model, we ignore the possible time periods between repetitions
of the experiment, but this does not mean that they do not exist. We are interested in testing the equality
of \ell
mean functions corresponding to experimental conditions; namely, the global null hypothesis is as follows:
\mathcal{H}_0:\mu(t)=\mu(t+1)=\dots=\mu(t+(\ell-1))\ \ \forall t\in[0,1].
For the global null hypothesis \mathcal{H}_0
, the tests given by Martinez-Camblor and Corral (2011)
used the pointwise sum of squares due to the hypothesis:
\mathrm{SSA}_{point}(t)=n\sum_{i=1}^\ell(\bar{Y}_{i\cdot}(t)-\bar{Y}(t))^2,\ t\in[0,1],
where
\bar{Y}_{i\cdot}(t)=n^{-1}\sum_{j=1}^nY_j(t+(i-1)),\ \bar{Y}(t)=N^{-1}\sum_{i=1}^\ell\sum_{j=1}^nY_j(t+(i-1)),
i=1,\dots,\ell
. In the package, it is calculated and drawn by the pointwise_ssa_test_statistic()
function.
The other option is the following pointwise F-type test statistic proposed in Kurylo and Smaga (2023):
F_{point}(t)=\frac{\mathrm{SSA}_{point}(t)/(\ell-1)}{\mathrm{SSR}_{point}(t)/((\ell-1)(n-1))},\ t\in[0,1],
where
\mathrm{SSR}_{point}(t)=\sum_{i=1}^\ell\sum_{j=1}^n(Y_j(t+(i-1))-\bar{Y}_{i\cdot}(t)-\bar{Y}_{\cdot j}(t)+\bar{Y}(t))^2
is the pointwise sum of squares due to residuals, and
\bar{Y}_{\cdot j}(t)=\ell^{-1}\sum_{i=1}^\ell Y_j(t+(i-1)),\ j=1,\dots,n.
F_{point}
is calculated and drawn by the pointwise_f_test_statistic()
function.
To obtain global test statistics for \mathcal{H}_0
, Martinez-Camblor and Corral (2011) proposed the
following test statistic:
\mathcal{C}_n(\ell)=\int_0^1\mathrm{SSA}_{point}(t)dt.
On the other hand, Kurylo and Smaga (2023) proposed the following two test statistics:
\mathcal{D}_n(\ell)=\int_0^1F_{point}(t)dt,\quad\mathcal{E}_n(\ell)=\sup\limits_{t\in[0,1]}F_{point}(t).
To construct the tests, five resampling strategies are proposed by Kurylo and Smaga (2023). For details, we refer
to this paper. Here we just note the two permutation tests and three bootstrap tests are denoted by P1, P2, B1, B2,
and B3 in the output of the summary.rmfanova()
function.
When \ell>2
, by rejecting the global null hypothesis \mathcal{H}_0
, we determine the presence of significant differences
in the mean functions corresponding to the experimental conditions. However, we do not know which conditions are
significantly different and which are not. To solve this problem, one needs to perform a post hoc analysis.
More precisely, we would like to test the family of hypotheses:
\left\{\begin{array}{l}
\mathcal{H}_0^{rs}:\mu(t+(r-1))=\mu(t+(s-1))\ \forall t\in[0,1],\\
\mathcal{H}_1^{rs}:\mu(t+(r-1))\neq\mu(t+(s-1))\ \text{for some}\ t\in[0,1],\\
\end{array}\right.
for r,s=1,\dots,\ell
, r\neq s
. These hypotheses are also named pairwise comparisons.
To test this family of local hypotheses, we propose the following procedure:
1. Test each of the hypotheses \mathcal{H}_0^{rs}
using the data for the r
-th and s
-th objects,
i.e., Y_1(t),\dots,Y_n(t)
for t\in[r-1,r]
and t\in[s-1,s]
respectively, and the chosen test
from those presented above. Let p_{rs}
denote the p
-values obtained.
2. Make a final decision using the Bonferroni method, i.e., reject \mathcal{H}_0^{rs}
if
p_{rs}^{Bonf}\leq \alpha
, where p_{rs}^{Bonf}=m\cdot p_{rs}
are the corrected p
-values,
\alpha
is the significance level and m
is the number of null hypotheses considered.
In the paper Kurylo and Smaga (2023), the Bonferroni method was used only. However, in the package,
there is a possibility to use other correction methods, which are available in the vector p.adjust.methods
.
The results of testing the global and local hypotheses are given separately in the output of the
summary.rmfanova()
function for the convenience of the user.
Value
A list of class rmfanova
containing the following 7 components:
n |
a number |
p |
a number |
l |
a number |
method |
an argument |
test_stat |
values of the test statistics |
p_values |
p-values for the global null hypothesis. |
p_values_pc |
p-values of the pairwise comparisons. |
References
Martinez-Camblor P., Corral N. (2011) Repeated Measures Analysis for Functional Data. Computational Statistics & Data Analysis 55, 3244–3256.
Kurylo K., Smaga L. (2023) Functional repeated measures analysis of variance and its application. Preprint https://arxiv.org/abs/2306.03883
Ramsay J.O., Silverman B.W. (2005) Functional Data Analysis, 2nd Edition. New York: Springer.
Zhang J.T. (2013) Analysis of Variance for Functional Data. London: Chapman & Hall.
Examples
# Some of the examples may run some time.
# preparation of the DTI data set, for details see Kurylo and Smaga (2023)
library(refund)
data(DTI)
# MS patients
DTI_ms <- DTI[DTI$case == 1, ]
miss_data <- c()
for (i in 1:340) if (any(is.na(DTI_ms$cca[i, ]))) miss_data <- c(miss_data, i)
DTI_ms <- DTI_ms[-miss_data, ]
DTI_ms_2 <- DTI_ms[DTI_ms$Nscans == 4, ]
xx <- vector("list", 4)
for (i in 1:4) {
xx[[i]] <- DTI_ms_2$cca[DTI_ms_2$visit == i, ]
}
xx[[1]] <- xx[[1]][-14, ]
xx[[3]] <- xx[[3]][-14, ]
yy <- xx
for (i in seq_len(4)) yy[[i]] <- yy[[i]][1:17, ]
# data trajectories for four visits
oldpar <- par(mfrow = c(1, 4), mar = c(4, 4, 4, 0.1))
matplot(t(yy[[1]]), type = "l", col = 1, lty = 1, xlab = "t", ylab = "FA",
main = "Visit 1", xaxt = "n", ylim = c(0.29, 0.73))
axis(1, c(1, 15, 30, 45, 60, 75, 93), labels = c(1, 15, 30, 45, 60, 75, 93))
matplot(t(yy[[2]]), type = "l", col = 1, lty = 1, xlab = "t", ylab = "FA",
main = "Visit 2", xaxt = "n", ylim = c(0.29, 0.73))
axis(1, c(1, 15, 30, 45, 60, 75, 93), labels = c(1, 15, 30, 45, 60, 75, 93))
matplot(t(yy[[3]]), type = "l", col = 1, lty = 1, xlab = "t", ylab = "FA",
main = "Visit 3", xaxt = "n", ylim = c(0.29, 0.73))
axis(1, c(1, 15, 30, 45, 60, 75, 93), labels = c(1, 15, 30, 45, 60, 75, 93))
matplot(t(yy[[4]]), type = "l", col = 1, lty = 1, xlab = "t", ylab = "FA",
main = "Visit 4", xaxt = "n", ylim = c(0.29, 0.73))
axis(1, c(1, 15, 30, 45, 60, 75, 93), labels = c(1, 15, 30, 45, 60, 75, 93))
par(oldpar)
# sample mean functions
oldpar <- par(mfrow = c(1, 1), mar = c(4, 4, 2, 0.1))
pointwise_sample_mean_fun(yy, values = FALSE,
col = 1:4, xlab = "t", ylab = "FA", xaxt = "n")
axis(1, c(1, 15, 30, 45, 60, 75, 93), labels = c(1, 15, 30, 45, 60, 75, 93))
legend(x = 36, y = 0.64, legend = 1:4, lty = 1, col = 1:4, title = "Visit")
par(oldpar)
# pointwise SSA and F-type test statistics
oldpar <- par(mfrow = c(1, 2), mar = c(4, 2, 2, 0.1))
pointwise_ssa_test_statistic(yy, xlab = "t", xaxt = "n")
axis(1, c(1, 15, 30, 45, 60, 75, 93), labels = c(1, 15, 30, 45, 60, 75, 93))
pointwise_f_test_statistic(yy, xlab = "t", xaxt = "n")
axis(1, c(1, 15, 30, 45, 60, 75, 93), labels = c(1, 15, 30, 45, 60, 75, 93))
par(oldpar)
# testing without parallel computing and multiple generation of Gaussian processes
res <- rmfanova(yy)
summary(res, digits = 3)
# testing without parallel computing and with multiple generation of Gaussian processes
res <- rmfanova(yy, multi_gen = TRUE)
summary(res, digits = 3)
# testing with parallel computing and without multiple generation of Gaussian processes
res <- rmfanova(yy, parallel = TRUE, n_cores = 2)
summary(res, digits = 3)
# testing with parallel computing and with multiple generation of Gaussian processes
res <- rmfanova(yy, parallel = TRUE, multi_gen = TRUE, n_cores = 2)
summary(res, digits = 3)