
generate.sf.data {robflreg}

R Documentation

Generate functional data for the scalar-on-function regression model

Description

This function is used to simulate data for the scalar-on-function regression model

Y = \sum_{m=1}^M \int X_m(s) \beta_m(s) ds + \epsilon,

where Y denotes the scalar response, X_m(s) denotes the m-th functional predictor, \beta_m(s) denotes the m-th regression coefficient function, and \epsilon is the error process.
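
On a discrete grid, the integral in the model above has to be approximated numerically. The following is a minimal base-R sketch for a single predictor (M = 1), assuming a simple Riemann sum on an equally spaced grid; the quadrature rule used inside the package is not stated here and may differ. The placeholder curves in `X` and the choice of `beta` are illustrative assumptions only.

```r
# Sketch: discretised scalar-on-function model with one predictor (M = 1).
# Assumption: the integral is approximated by a Riemann sum.
set.seed(1)
n    <- 5                                  # number of observations
n.gp <- 101                                # number of grid points
s    <- seq(0, 1, length.out = n.gp)       # equally spaced grid on [0, 1]
ds   <- s[2] - s[1]                        # grid spacing

X    <- matrix(rnorm(n * n.gp), n, n.gp)   # placeholder curves, one per row
beta <- sin(2 * pi * s)                    # illustrative coefficient function
eps  <- rnorm(n)                           # standard normal errors

# Riemann-sum approximation of the integral of X(s) * beta(s) over [0, 1]
Y <- as.numeric(X %*% beta) * ds + eps
```

With several predictors, the same product would be computed for each predictor and the results summed before adding the error term.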

Usage

generate.sf.data(n, n.pred, n.gp, out.p = 0)

Arguments

`n`	An integer, specifying the number of observations for each variable to be generated.
`n.pred`	An integer, denoting the number of functional predictors to be generated.
`n.gp`	An integer, denoting the number of equally spaced grid points on the interval [0, 1] at which the functional predictors are evaluated.
`out.p`	A numeric value between 0 and 1, denoting the proportion of outliers in the generated data.

Details

In the data generation process, first, the functional predictors are simulated based on the following process:

X_m(s) = \sum_{j=1}^5 \kappa_j v_j(s),

where \kappa_j is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-3/2}, a is a uniformly generated random number between 1 and 4, and

v_j(s) = \sin(j \pi s) - \cos(j \pi s).
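
The expansion above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: `a` is drawn once and shared by all scores, and each \kappa_j is drawn independently across observations. Note that `rnorm()` takes a standard deviation, so the stated variance is passed through `sqrt()`.

```r
# Sketch: simulate n curves X(s) = sum_{j=1}^5 kappa_j * v_j(s) on a grid.
# Assumption: a single value of `a` is shared by all five terms.
set.seed(1)
n    <- 10                                 # number of curves
n.gp <- 101                                # number of grid points
s    <- seq(0, 1, length.out = n.gp)       # equally spaced grid on [0, 1]
a    <- runif(1, min = 1, max = 4)         # a ~ Uniform(1, 4)

X <- matrix(0, nrow = n, ncol = n.gp)      # one curve per row
for (j in 1:5) {
  # kappa_j ~ Normal(mean = 1, variance = sqrt(a) * j^(-3/2))
  kappa_j <- rnorm(n, mean = 1, sd = sqrt(sqrt(a) * j^(-3/2)))
  v_j     <- sin(j * pi * s) - cos(j * pi * s)
  X       <- X + outer(kappa_j, v_j)       # add the j-th term to every curve
}
```

The resulting `X` has the same shape as one element of the `X` component returned by the function: one row per observation and one column per grid point.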

The regression coefficient functions are generated from a coefficient space that includes ten different functions such as:

b \sin(2 \pi s)

and

b \cos(2 \pi s),

where b is generated from a uniform distribution between 1 and 3. The error process is generated from the standard normal distribution.

If outliers are allowed in the generated data, i.e., out.p > 0, then a randomly selected n \times out.p of the observations are generated by a different process. Specifically, the outlying observations use regression coefficient functions (possibly different from those generated previously) drawn from the same coefficient space with b^* in place of b, where b^* is generated from a uniform distribution between 3 and 5. In addition, the functional predictors of the outlying observations are generated by the following process:

X_m^*(s) = \sum_{j=1}^5 \kappa_j^* v_j^*(s),

where \kappa_j^* is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-1/2} and

v_j^*(s) = 2 \sin(j \pi s) - \cos(j \pi s).

Moreover, the error process for the outlying observations is generated from a normal distribution with mean 1 and variance 1. All functional predictors are evaluated at equally spaced points in the interval [0, 1].
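
The contamination scheme described in this section can be sketched for a single predictor as follows. This is a hedged base-R illustration, not the package's implementation: the helper `sim_curves()` is a hypothetical name, the coefficient function is fixed to the sine form, the integral is approximated by a Riemann sum, the number of outliers is rounded, and `a` is drawn once.

```r
# Sketch: clean and outlying observations for one functional predictor.
set.seed(1)
n <- 20; n.gp <- 101; out.p <- 0.1
s  <- seq(0, 1, length.out = n.gp)         # equally spaced grid on [0, 1]
ds <- s[2] - s[1]                          # grid spacing
a  <- runif(1, 1, 4)                       # a ~ Uniform(1, 4)

# Hypothetical helper: curves built from five basis terms.
# var_power sets the decay of the score variance, sin_scale the sine weight.
sim_curves <- function(n_obs, var_power, sin_scale) {
  X <- matrix(0, n_obs, n.gp)
  for (j in 1:5) {
    kappa <- rnorm(n_obs, mean = 1, sd = sqrt(sqrt(a) * j^var_power))
    X <- X + outer(kappa, sin_scale * sin(j * pi * s) - cos(j * pi * s))
  }
  X
}

n.out    <- round(n * out.p)               # number of outlying observations
out.indx <- sort(sample(n, n.out))         # randomly selected indices

# Clean data: variance decay j^(-3/2), b ~ Uniform(1, 3), N(0, 1) errors
X    <- sim_curves(n, -3/2, 1)
beta <- runif(1, 1, 3) * sin(2 * pi * s)
Y    <- as.numeric(X %*% beta) * ds + rnorm(n)

# Outliers: variance decay j^(-1/2), 2*sin basis, b* ~ Uniform(3, 5),
# and N(1, 1) errors
X[out.indx, ] <- sim_curves(n.out, -1/2, 2)
beta.out      <- runif(1, 3, 5) * sin(2 * pi * s)
Y[out.indx]   <- as.numeric(X[out.indx, , drop = FALSE] %*% beta.out) * ds +
  rnorm(n.out, mean = 1, sd = 1)
```

The rows indexed by `out.indx` then play the role of the outlying observations, both in the predictor (leverage points) and in the response.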

Value

A list object with the following components:

`Y`	An `n \times 1`-dimensional matrix containing the observations of the simulated scalar response variable.
`X`	A list with length n.pred. Each element is an `n \times n.gp`-dimensional matrix containing the observations of one simulated functional predictor variable.
`f.coef`	A list with length n.pred. Each element is a vector and contains the generated regression coefficient function.
`out.indx`	A vector with length `n \times out.p` denoting the indices of outlying observations.

Author(s)

Ufuk Beyaztas and Han Lin Shang

Examples

library(robflreg)  # provides generate.sf.data()
library(fda.usc)   # provides fdata() and its plot method
library(fda)
set.seed(2022)
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101, out.p = 0.1)
Y <- sim.data$Y
X <- sim.data$X
coeffs <- sim.data$f.coef
out.indx <- sim.data$out.indx
plot(Y[-out.indx,], type = "p", pch = 16, xlab = "Index", ylab = "",
     main = "Response", ylim = range(Y))
points(out.indx, Y[out.indx,], type = "p", pch = 16, col = "blue") # Outliers
fX1 <- fdata(X[[1]], argvals = seq(0, 1, length.out = 101))
plot(fX1[-out.indx,], lty = 1, ylab = "", xlab = "Grid point",
     main = expression(X[1](s)), mgp = c(2, 0.5, 0), ylim = range(fX1))
lines(fX1[out.indx,], lty = 1, col = "black") # Leverage points

[Package robflreg version 1.2 Index]