generate.sf.data {robflreg}    R Documentation

Generate functional data for the scalar-on-function regression model

Description

This function simulates data for the scalar-on-function regression model

Y = \sum_{m=1}^M \int X_m(s) \beta_m(s) ds + \epsilon,

where Y denotes the scalar response, X_m(s) denotes the m-th functional predictor, \beta_m(s) denotes the m-th regression coefficient function, and \epsilon is the error process.
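
As a rough illustration of how the model above can be evaluated on a discrete grid, the following sketch approximates each integral by a Riemann sum. The predictors, the coefficient functions, and the integration rule are assumptions chosen for illustration; they are not taken from the package source.

```r
# Sketch only: approximate Y = sum_m int X_m(s) beta_m(s) ds + eps on a grid.
set.seed(1)
n <- 5       # number of observations
n.gp <- 101  # number of grid points
M <- 2       # number of functional predictors

s <- seq(0, 1, length.out = n.gp)  # equally spaced grid on [0, 1]
ds <- s[2] - s[1]                  # grid spacing

# Hypothetical predictors (n x n.gp matrices) and coefficient functions
X <- lapply(seq_len(M), function(m) matrix(rnorm(n * n.gp), n, n.gp))
beta <- list(sin(2 * pi * s), cos(2 * pi * s))

# Riemann-sum approximation of each integral, summed over the M predictors
signal <- Reduce(`+`, lapply(seq_len(M), function(m) (X[[m]] %*% beta[[m]]) * ds))

# Add standard normal errors to obtain the n x 1 response
Y <- signal + rnorm(n)
dim(Y)
```

Each predictor contributes an n \times 1 column, so the response has the same shape as the Y component described under Value.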

Usage

generate.sf.data(n, n.pred, n.gp, out.p = 0)

Arguments

n

An integer, specifying the number of observations for each variable to be generated.

n.pred

An integer, denoting the number of functional predictors to be generated.

n.gp

An integer, denoting the number of equally spaced grid points on the interval [0, 1] at which the functional predictors are evaluated.

out.p

A numeric value between 0 and 1, denoting the proportion of outliers in the generated data. The default, 0, generates no outliers.

Details

In the data generation process, first, the functional predictors are simulated based on the following process:

X_m(s) = \sum_{j=1}^5 \kappa_j v_j(s),

where \kappa_j is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-3/2}, a is a uniformly generated random number between 1 and 4, and

v_j(s) = \sin(j \pi s) - \cos(j \pi s).
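
The predictor process above can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes that a is drawn once and shared across all j, and it converts the stated variance to the standard deviation that rnorm() expects.

```r
# Sketch only: simulate one functional predictor X_m(s) on a grid.
set.seed(1)
n <- 10
n.gp <- 101
s <- seq(0, 1, length.out = n.gp)  # equally spaced grid on [0, 1]
a <- runif(1, min = 1, max = 4)    # assumption: a single draw of a

# Basis functions v_j(s) = sin(j*pi*s) - cos(j*pi*s), one row per j (5 x n.gp)
v <- t(sapply(1:5, function(j) sin(j * pi * s) - cos(j * pi * s)))

# Scores kappa_j with mean 1 and variance sqrt(a) * j^(-3/2), one column per j (n x 5)
kappa <- sapply(1:5, function(j) {
  rnorm(n, mean = 1, sd = sqrt(sqrt(a) * j^(-3/2)))
})

# Each row of X1 is one simulated curve evaluated on the grid (n x n.gp)
X1 <- kappa %*% v
dim(X1)
```

The resulting matrix has the same layout as each element of the X component described under Value.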

The regression coefficient functions are generated from a coefficient space that includes ten different functions such as:

b \sin(2 \pi s)

and

b \cos(2 \pi s),

where b is generated from a uniform distribution between 1 and 3. The error process is generated from the standard normal distribution.

If outliers are allowed in the generated data, i.e., out.p > 0, then a randomly selected n \times out.p of the observations are generated by a different process. In this case, the regression coefficient functions (possibly different from the previously generated coefficient functions) are drawn from the coefficient space with b^* instead of b, where b^* is generated from a uniform distribution between 3 and 5, and are used to generate the outlying observations. In addition, the following process is used to generate the functional predictors of the outlying observations:

X_m^*(s) = \sum_{j=1}^5 \kappa_j^* v_j^*(s),

where \kappa_j^* is a vector generated from a Normal distribution with mean one and variance \sqrt{a} j^{-1/2} and

v_j^*(s) = 2 \sin(j \pi s) - \cos(j \pi s).

Moreover, the error process for the outlying observations is generated from a normal distribution with mean 1 and variance 1. All functional predictors are evaluated at equally spaced points in the interval [0, 1].
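
The contamination step can be sketched in the same way. The selection rule (sample() with rounding of n \times out.p) and the single shared draw of a are assumptions made for illustration; the package may select and generate the outlying observations differently.

```r
# Sketch only: generate the outlying part of one functional predictor.
set.seed(1)
n <- 400
out.p <- 0.1
n.gp <- 101
s <- seq(0, 1, length.out = n.gp)
a <- runif(1, min = 1, max = 4)      # assumption: a single draw of a

# Hypothetical selection of the outlying observations
n.out <- round(n * out.p)
out.indx <- sort(sample(n, n.out))

# Contaminated basis v_j^*(s) = 2*sin(j*pi*s) - cos(j*pi*s) (5 x n.gp)
v.star <- t(sapply(1:5, function(j) 2 * sin(j * pi * s) - cos(j * pi * s)))

# Scores kappa_j^* with mean 1 and variance sqrt(a) * j^(-1/2) (n.out x 5)
kappa.star <- sapply(1:5, function(j) {
  rnorm(n.out, mean = 1, sd = sqrt(sqrt(a) * j^(-1/2)))
})

X.out <- kappa.star %*% v.star             # outlying curves (n.out x n.gp)
eps.out <- rnorm(n.out, mean = 1, sd = 1)  # shifted errors for the outliers
b.star <- runif(1, min = 3, max = 5)       # scale for the outlying coefficients
```

Compared with the clean process, the scores decay more slowly in j and the sine term is doubled, so the outlying curves tend to have larger amplitude.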

Value

A list object with the following components:

Y

An n \times 1-dimensional matrix containing the observations of the simulated scalar response variable.

X

A list with length n.pred. Each element is an n \times n.gp-dimensional matrix containing the observations of one simulated functional predictor variable.

f.coef

A list with length n.pred. Each element is a vector and contains the generated regression coefficient function.

out.indx

A vector with length n \times out.p denoting the indices of outlying observations.

Author(s)

Ufuk Beyaztas and Han Lin Shang

Examples

library(robflreg)
library(fda.usc)  # provides fdata()
library(fda)
set.seed(2022)

# Simulate 400 observations of 5 functional predictors on 101 grid points,
# with 10% of the observations generated as outliers
sim.data <- generate.sf.data(n = 400, n.pred = 5, n.gp = 101, out.p = 0.1)
Y <- sim.data$Y
X <- sim.data$X
coeffs <- sim.data$f.coef
out.indx <- sim.data$out.indx

# Scalar response: regular observations first, then the outliers in blue
plot(Y[-out.indx,], type = "p", pch = 16, xlab = "Index", ylab = "",
     main = "Response", ylim = range(Y))
points(out.indx, Y[out.indx,], type = "p", pch = 16, col = "blue") # Outliers

# First functional predictor: regular curves, then the outlying curves in black
fX1 <- fdata(X[[1]], argvals = seq(0, 1, length.out = 101))
plot(fX1[-out.indx,], lty = 1, ylab = "", xlab = "Grid point",
     main = expression(X[1](s)), mgp = c(2, 0.5, 0), ylim = range(fX1))
lines(fX1[out.indx,], lty = 1, col = "black") # Leverage points

[Package robflreg version 1.2 Index]