generate.sf.data {robflreg} | R Documentation |
Generate functional data for the scalar-on-function regression model
Description
This function is used to simulate data for the scalar-on-function regression model
Y = \sum_{m=1}^M \int X_m(s) \beta_m(s) ds + \epsilon,
where Y denotes the scalar response, X_m(s) denotes the m-th functional predictor, \beta_m(s) denotes the m-th regression coefficient function, and \epsilon is the error process.
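To make the model concrete, the integral can be approximated on a discrete grid. The following is a minimal base-R sketch, not the package's internal code: the helper name `approx.response`, the choice of the trapezoidal rule, and the toy inputs are all assumptions made for illustration.

```r
# Hypothetical helper (not part of robflreg): approximate
# Y = sum_m int X_m(s) beta_m(s) ds + epsilon on a common grid.
approx.response <- function(X, f.coef, grid, eps = 0) {
  # Trapezoidal quadrature weights for the grid
  h <- diff(grid)
  w <- c(h, 0) / 2 + c(0, h) / 2
  Y <- 0
  for (m in seq_along(X)) {
    # X[[m]] is an n x n.gp matrix; f.coef[[m]] is a vector of length n.gp
    Y <- Y + X[[m]] %*% (w * f.coef[[m]])
  }
  Y + eps
}

grid <- seq(0, 1, length.out = 101)
# Toy input: one predictor with X(s) = 1 and beta(s) = s, so the integral is 1/2
X <- list(matrix(1, nrow = 3, ncol = 101))
f.coef <- list(grid)
Y <- approx.response(X, f.coef, grid)
```

With these toy inputs every entry of `Y` is 0.5 up to floating-point error, because the trapezoidal rule is exact for a linear integrand.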
Usage
generate.sf.data(n, n.pred, n.gp, out.p = 0)
Arguments
n |
An integer, specifying the number of observations for each variable to be generated. |
n.pred |
An integer, denoting the number of functional predictors to be generated. |
n.gp |
An integer, denoting the number of grid points, i.e., the size of the fine grid of equally spaced points on the interval [0, 1] at which the functional predictors are evaluated. |
out.p |
A numeric value between 0 and 1, denoting the proportion of outliers in the generated data. |
Details
In the data generation process, first, the functional predictors are simulated based on the following process:
X_m(s) = \sum_{j=1}^5 \kappa_j v_j(s),
where \kappa_j is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-3/2}, a is a uniformly generated random number between 1 and 4, and
v_j(s) = \sin(j \pi s) - \cos(j \pi s).
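The predictor-generating process above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: the function name `sim.predictor` is hypothetical, and drawing a single value of a per predictor is an assumption, since the text does not say how often a is redrawn.

```r
# Hypothetical sketch (not robflreg's internal code) of
# X_m(s) = sum_{j=1}^5 kappa_j v_j(s) for one functional predictor.
sim.predictor <- function(n, n.gp) {
  s <- seq(0, 1, length.out = n.gp)
  a <- runif(1, min = 1, max = 4)   # assumption: one draw of a per predictor
  X <- matrix(0, nrow = n, ncol = n.gp)
  for (j in 1:5) {
    # kappa_j: n values with mean 1 and variance sqrt(a) * j^(-3/2)
    kappa <- rnorm(n, mean = 1, sd = sqrt(sqrt(a) * j^(-3/2)))
    v <- sin(j * pi * s) - cos(j * pi * s)
    X <- X + outer(kappa, v)        # adds kappa_j[i] * v_j(s) to row i
  }
  X
}

set.seed(1)
X1 <- sim.predictor(n = 50, n.gp = 101)
dim(X1)  # 50 rows (curves) by 101 columns (grid points)
```

Note that `rnorm` takes a standard deviation, so the stated variance is passed through `sqrt()`.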
The regression coefficient functions are generated from a coefficient space that includes ten different functions such as:
b \sin(2 \pi s)
and
b \cos(2 \pi s),
where b is generated from a uniform distribution between 1 and 3. The error process is generated from the standard normal distribution. If outliers are allowed in the generated data, i.e., out.p > 0, then a randomly selected n \times out.p of the observations are generated differently from the process described above. In more detail, if out.p > 0, the outlying observations are generated using regression coefficient functions (possibly different from the previously generated coefficient functions) drawn from the same coefficient space with b^* in place of b, where b^* is generated from a uniform distribution between 3 and 5. In addition, in this case, the following process is used to generate the functional predictors:
X_m^*(s) = \sum_{j=1}^5 \kappa_j^* v_j^*(s),
where \kappa_j^* is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-1/2} and
v_j^*(s) = 2 \sin(j \pi s) - \cos(j \pi s).
Moreover, the error process is generated from a normal distribution with mean 1 and variance 1. All the functional predictors are generated at equally spaced points in the interval [0, 1].
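As a rough illustration of the contamination step described above, the sketch below selects the outlying observations and contrasts the regular and outlying settings for one coefficient function and for the errors. It is not the package's code: the variable names, the use of `round()` to obtain a whole number of outliers, and the choice of b \sin(2 \pi s) as the example coefficient function are assumptions.

```r
# Hypothetical sketch (not robflreg's internal code) of the contamination step.
set.seed(1)
n <- 400
out.p <- 0.1
s <- seq(0, 1, length.out = 101)

# Randomly select n * out.p observations to be generated as outliers
n.out <- round(n * out.p)               # assumption: round to a whole number
out.indx <- sample(seq_len(n), n.out)

# Regular versus outlying coefficient function, using b * sin(2 * pi * s)
b      <- runif(1, min = 1, max = 3)    # regular scale
b.star <- runif(1, min = 3, max = 5)    # outlying scale
beta      <- b * sin(2 * pi * s)
beta.star <- b.star * sin(2 * pi * s)

# Regular errors are N(0, 1); errors of the outlying observations are N(1, 1)
eps <- rnorm(n, mean = 0, sd = 1)
eps[out.indx] <- rnorm(n.out, mean = 1, sd = 1)
```

Because b^* is drawn from a range above that of b, the outlying coefficient function has the same shape but a larger amplitude than the regular one.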
Value
A list object with the following components:
Y |
An n \times 1-dimensional matrix containing the values of the response variable. |
X |
A list with length n.pred. The elements are the n \times n.gp-dimensional matrices containing the simulated functional predictors. |
f.coef |
A list with length n.pred. Each element is a vector and contains the generated regression coefficient function. |
out.indx |
A vector with length n \times out.p, containing the indices of the outlying observations. |
Author(s)
Ufuk Beyaztas and Han Lin Shang
Examples
library(robflreg)
library(fda.usc)
library(fda)
set.seed(2022)
# Simulate 400 observations of 5 functional predictors on 101 grid points, 10% outliers
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101, out.p = 0.1)
Y <- sim.data$Y
X <- sim.data$X
coeffs <- sim.data$f.coef
out.indx <- sim.data$out.indx
# Plot the regular responses, then overlay the outlying responses in blue
plot(Y[-out.indx,], type = "p", pch = 16, xlab = "Index", ylab = "",
     main = "Response", ylim = range(Y))
points(out.indx, Y[out.indx,], type = "p", pch = 16, col = "blue") # Outliers
# Convert the first functional predictor to an fdata object on the same grid
fX1 <- fdata(X[[1]], argvals = seq(0, 1, length.out = 101))
# Plot the regular curves, then overlay the outlying curves
plot(fX1[-out.indx,], lty = 1, ylab = "", xlab = "Grid point",
     main = expression(X[1](s)), mgp = c(2, 0.5, 0), ylim = range(fX1))
lines(fX1[out.indx,], lty = 1, col = "black") # Leverage points