generate.ff.data {robflreg} | R Documentation |
Generate functional data for the function-on-function regression model
Description
This function provides a unified simulation structure for the function-on-function regression model
Y(t) = \sum_{m=1}^M \int X_m(s) \beta_m(s,t) ds + \epsilon(t),
where Y(t) denotes the functional response, X_m(s) denotes the m-th functional predictor, \beta_m(s,t) denotes the m-th bivariate regression coefficient function, and \epsilon(t) is the error function.
Usage
generate.ff.data(n.pred, n.curve, n.gp, out.p = 0)
Arguments
n.pred | An integer, denoting the number of functional predictors to be generated.
n.curve | An integer, specifying the number of observations (curves) to be generated for each functional variable.
n.gp | An integer, denoting the number of equally spaced grid points on the interval [0, 1] at which the functions are evaluated.
out.p | A number between 0 and 1, denoting the proportion of outliers in the generated data. The default, 0, generates no outliers.
Details
In the data generation process, the functional predictors are first simulated based on the following process:
X_m(s) = \sum_{j=1}^5 \kappa_j v_j(s),
where \kappa_j is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-1/2}, a is a uniformly generated random number between 1 and 4, and
v_j(s) = \sin(j \pi s) - \cos(j \pi s).
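The predictor process above can be sketched in base R as follows. This is only an illustration of the formulas, not the package's internal code: it assumes that \kappa_j holds one score per curve (length n.curve), that a is drawn once per predictor, and the object names (s, V, K, X) are chosen here for illustration.

```r
# Sketch: simulate n.curve realisations of one functional predictor X_m(s)
set.seed(1)
n.curve <- 50                        # number of curves (illustrative value)
n.gp    <- 101                       # number of grid points (illustrative value)
s       <- seq(0, 1, length.out = n.gp)

a <- runif(1, min = 1, max = 4)      # a ~ U(1, 4); assumed to be drawn once

# Basis functions v_j(s) = sin(j*pi*s) - cos(j*pi*s), one column per j
V <- sapply(1:5, function(j) sin(j * pi * s) - cos(j * pi * s))  # n.gp x 5

# Scores kappa_j ~ N(1, sqrt(a) * j^(-1/2)); the text gives a variance,
# while rnorm() expects a standard deviation, hence the outer sqrt()
K <- sapply(1:5, function(j) {
  rnorm(n.curve, mean = 1, sd = sqrt(sqrt(a) * j^(-1/2)))
})                                                               # n.curve x 5

# X[i, ] = sum_j K[i, j] * v_j(s): one simulated curve per row
X <- K %*% t(V)                                                  # n.curve x n.gp
```

Each row of X is one curve evaluated on the grid s, so matplot(s, t(X), type = "l") gives a quick visual check.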
The bivariate regression coefficient functions are generated from a coefficient space that includes ten different functions such as:
b \sin(2 \pi s) \sin(\pi t)
and
b e^{-3 (s - 0.5)^2} e^{-4 (t - 1)^2},
where b is generated from a uniform distribution between 1 and 3.
The error function \epsilon(t), on the other hand, is generated from the Ornstein-Uhlenbeck process:
\epsilon(t) = l + [\epsilon_0(t) - l] e^{-\theta t} + \sigma \int_0^t e^{-\theta (t-u)} d W_u,
where l, \theta > 0, \sigma > 0 are constants, \epsilon_0(t) is the initial value of \epsilon(t) taken from W_u, and W_u is the Wiener process.
If outliers are allowed in the generated data, i.e., out.p > 0, then a randomly selected n.curve \times out.p of the observations are generated differently from the process described above. In more detail, the outlying observations are generated using bivariate regression coefficient functions drawn from the same coefficient space (possibly different from the previously generated coefficient functions) with b^* instead of b, where b^* is generated from a uniform distribution between 1 and 2. In addition, in this case, the following process is used to generate the functional predictors:
X_m^*(s) = \sum_{j=1}^5 \kappa_j^* v_j^*(s),
where \kappa_j^* is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-3/2} and
v_j^*(s) = 2 \sin(j \pi s) - \cos(j \pi s).
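For the error term, an Ornstein-Uhlenbeck path can be simulated on a grid with the exact one-step transition of the process, as sketched below. The values of l, theta, sigma and the starting value eps0 are placeholders chosen for illustration; this page does not state the values used by generate.ff.data, and the package may discretise the process differently.

```r
# Sketch: exact grid simulation of Ornstein-Uhlenbeck error curves
set.seed(2)
n.curve <- 50                        # number of error curves (illustrative)
n.gp    <- 101                       # number of grid points (illustrative)
tt      <- seq(0, 1, length.out = n.gp)
dt      <- tt[2] - tt[1]

l     <- 0                           # long-run mean      (placeholder value)
theta <- 1                           # mean-reversion rate (placeholder value)
sigma <- 0.5                         # volatility          (placeholder value)
eps0  <- 0                           # starting value      (placeholder value)

# Over a step of length dt the OU process satisfies
#   eps(t + dt) = l + (eps(t) - l) * exp(-theta * dt) + sd.step * Z,  Z ~ N(0, 1),
# with sd.step^2 = sigma^2 * (1 - exp(-2 * theta * dt)) / (2 * theta)
phi     <- exp(-theta * dt)
sd.step <- sigma * sqrt((1 - phi^2) / (2 * theta))

eps <- matrix(NA_real_, nrow = n.curve, ncol = n.gp)
eps[, 1] <- eps0
for (k in 2:n.gp) {
  eps[, k] <- l + (eps[, k - 1] - l) * phi + sd.step * rnorm(n.curve)
}
```

Each row of eps is one error curve on the grid tt. Because the transition is exact, the grid spacing affects only the resolution of the path, not its distribution at the grid points.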
All the functions are generated at equally spaced points in the interval [0, 1].
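On such a grid, the integral in the model can be approximated numerically. The sketch below evaluates the two example coefficient functions on the grid and forms the noise-free part of the response with a simple Riemann sum. The quadrature rule, the helper sim.pred and the choice of two predictors are assumptions made for illustration; the package may use a different rule.

```r
# Sketch: noise-free response sum_m int X_m(s) beta_m(s, t) ds on a grid
set.seed(3)
n.curve <- 20                        # number of curves (illustrative)
n.gp    <- 101                       # number of grid points (illustrative)
grid    <- seq(0, 1, length.out = n.gp)
ds      <- grid[2] - grid[1]

# Hypothetical helper: one predictor simulated as described in Details
sim.pred <- function(n, s) {
  a <- runif(1, 1, 4)
  V <- sapply(1:5, function(j) sin(j * pi * s) - cos(j * pi * s))
  K <- sapply(1:5, function(j) rnorm(n, 1, sqrt(sqrt(a) * j^(-1/2))))
  K %*% t(V)                         # n x length(s)
}
X <- list(sim.pred(n.curve, grid), sim.pred(n.curve, grid))

# Coefficient surfaces: rows index s, columns index t; b ~ U(1, 3)
b <- runif(2, min = 1, max = 3)
beta <- list(
  b[1] * outer(sin(2 * pi * grid), sin(pi * grid)),
  b[2] * outer(exp(-3 * (grid - 0.5)^2), exp(-4 * (grid - 1)^2))
)

# Riemann sum: int X_m(s) beta_m(s, t) ds ~ (X_m %*% beta_m) * ds, summed over m
signal <- Reduce(`+`, Map(function(x, bm) (x %*% bm) * ds, X, beta))
```

Here signal is an n.curve x n.gp matrix; adding an error matrix of the same dimensions gives simulated responses in the same layout as the Y component described under Value.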
Value
A list object with the following components:
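Y | An n.curve \times n.gp matrix containing the observations of the simulated functional response.
X | A list with length n.pred. The elements are the n.curve \times n.gp matrices containing the observations of the simulated functional predictors.
f.coef | A list with length n.pred. Each element is a matrix and contains the generated bivariate regression coefficient function.
out.indx | A vector with length n.curve \times out.p containing the indices of the outlying observations.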
Author(s)
Ufuk Beyaztas and Han Lin Shang
References
E. Garcia-Portugues, J. Alvarez-Liebana, G. Alvarez-Perez and W. Gonzalez-Manteiga (2021), "A goodness-of-fit test for the functional linear model with functional response", Scandinavian Journal of Statistics, 48(2), 502-528.
Examples
library(robflreg)
library(fda)
library(fda.usc)
set.seed(2022)
# Simulate 200 curves for 5 predictors on 101 grid points, with 10% outliers
sim.data <- generate.ff.data(n.pred = 5, n.curve = 200, n.gp = 101, out.p = 0.1)
Y <- sim.data$Y
X <- sim.data$X
coeffs <- sim.data$f.coef
out.indx <- sim.data$out.indx
# Convert the response to an fdata object on the grid [0, 1]
fY <- fdata(Y, argvals = seq(0, 1, length.out = 101))
# Plot the regular curves first, then overlay the outlying curves
plot(fY[-out.indx,], lty = 1, ylab = "", xlab = "Grid point",
     main = "Response", mgp = c(2, 0.5, 0), ylim = range(fY))
lines(fY[out.indx,], lty = 1, col = "black") # Outlying functions