flm_test {goffda}    R Documentation

Goodness-of-fit test for functional linear models

Description

Goodness-of-fit test of a functional linear model with functional response Y \in L^2([c, d]) and functional predictor X \in L^2([a, b]), where L^2([a, b]) is the Hilbert space of square-integrable functions in [a, b].

The goodness-of-fit test checks the linearity of the regression model m:L^2([a, b])\rightarrow L^2([c, d]) that relates Y and X by

Y(t) = m(X) + \varepsilon(t),

where \varepsilon is a random variable in L^2([c, d]) and t \in [c, d]. The check is formalized as the test of the composite hypothesis

H_0: m \in \{m_\beta : \beta \in L^2([a, b]) \otimes L^2([c, d])\},

where

m_\beta(X(s))(t) = \int_a^b \beta(s, t) X(s)\,\mathrm{d}s

is the linear, Hilbert–Schmidt, integral operator parametrized by the bivariate kernel \beta. Its estimation is done by the truncated expansion of \beta in the tensor product of the data-driven bases of Functional Principal Components (FPC) of X and Y. The FPC basis for X is truncated to p components, while the FPC basis for Y is truncated to q components.

The particular cases in which either X or Y is a constant function yield a scalar predictor or a scalar response, respectively. The simple linear model arises if both X and Y are scalar, for which \beta is a constant.
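
As an illustration of the integral operator m_\beta above, the following base R sketch approximates \int_a^b \beta(s, t) X(s)\,\mathrm{d}s on a grid with trapezoidal weights. The kernel \beta(s, t) = s t, the constant curve X(s) = 1, the grids, and the helper trapezoid_weights are assumptions made only for this example; they are not part of goffda, whose internal quadrature may differ.

```r
# Grids for s in [a, b] and t in [c, d] (both taken as [0, 1] here)
s <- seq(0, 1, length.out = 101)
t_grid <- seq(0, 1, length.out = 51)

# Hypothetical helper: trapezoidal quadrature weights on a grid
trapezoid_weights <- function(grid) {
  h <- diff(grid)
  (c(h, 0) + c(0, h)) / 2
}
w <- trapezoid_weights(s)

# Hypothetical kernel beta(s, t) = s * t, stored as a
# length(s) x length(t_grid) matrix (same layout as beta0)
beta <- outer(s, t_grid)

# One predictor curve, X(s) = 1, evaluated on s
X <- rep(1, length(s))

# m_beta(X)(t) approximated by sum_i w_i * beta(s_i, t) * X(s_i)
m_beta_X <- drop(crossprod(beta, w * X))

# For this kernel and curve the exact value is t / 2
max(abs(m_beta_X - t_grid / 2))
```

The matrix layout, with rows indexed by s and columns by t, mirrors the size c(length(X$argvals), length(Y$argvals)) required for beta0.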

Usage

flm_test(X, Y, beta0 = NULL, B = 500, est_method = "fpcr", p = NULL,
  q = NULL, thre_p = 0.99, thre_q = 0.99, lambda = NULL,
  boot_scores = TRUE, verbose = TRUE, plot_dens = TRUE,
  plot_proc = TRUE, plot_max_procs = 100, plot_max_p = 2,
  plot_max_q = 2, save_fit_flm = TRUE, save_boot_stats = TRUE,
  int_rule = "trapezoid", refit_lambda = FALSE, ...)

Arguments

X, Y

samples of functional/scalar predictors and functional/scalar response. Either fdata objects (for functional variables) or vectors of length n (for scalar variables).

beta0

if provided (defaults to NULL), the simple null hypothesis H_0: m = m_{\beta_0} is tested. beta0 must be a matrix of size c(length(X$argvals), length(Y$argvals)). If X or Y is scalar, beta0 can also be an fdata object with the same argvals as X or Y. It can also be a constant (understood as a shorthand for a matrix with all its entries equal to the constant).

B

number of bootstrap replicates. Defaults to 500.

est_method

either "fpcr" (Functional Principal Components Regression; FPCR), "fpcr_l2" (FPCR with ridge penalty), "fpcr_l1" (FPCR with lasso penalty) or "fpcr_l1s" (FPCR with lasso-selected FPC). If X is scalar, flm_est only considers "fpcr" as estimation method. See details below. Defaults to "fpcr".

p, q

index vectors indicating the specific FPC to be considered for the truncated bases expansions of X and Y, respectively. If a single number for p is provided, then p <- 1:max(p) internally (analogously for q) and the first max(p) FPC are considered. If NULL (default), then a data-driven selection of p and q is done. See details below.

thre_p, thre_q

thresholds for the proportion of variance that is explained, at least, by the first p and q FPC of X and Y, respectively. These thresholds are employed for an (initial) automatic selection of p and q. Default to 0.99. thre_p (thre_q) is ignored if p (q) is provided.

lambda

regularization parameter \lambda for the estimation methods "fpcr_l2", "fpcr_l1", and "fpcr_l1s". If NULL (default), it is chosen with cv_glmnet.

boot_scores

flag to indicate if the bootstrap shall be applied to the scores of the residuals, rather than to the functional residuals. This improves the computational expediency notably. Defaults to TRUE.

verbose

flag to show information about the testing progress. Defaults to TRUE.

plot_dens

flag to indicate if a kernel density estimation of the bootstrap statistics shall be plotted. Defaults to TRUE.

plot_proc

whether to display a graphical tool to identify the degree of departure from the null hypothesis. If TRUE (default), the residual marked empirical process, projected in several FPC directions of X and Y, is shown, together with bootstrap analogues. The FPC directions are the ones selected at the estimation stage.

plot_max_procs

maximum number of bootstrapped processes to plot in the graphical tool. Set as the minimum of plot_max_procs and B. Defaults to 100.

plot_max_p, plot_max_q

maximum number of FPC directions to be considered in the graphical tool. They limit the resulting plot to be at most of size c(plot_max_p, plot_max_q). Default to 2.

save_fit_flm, save_boot_stats

flags to return fit_flm and boot_*. If FALSE, these memory-expensive objects are set to NA. Default to TRUE.

int_rule

quadrature rule for approximating the definite unidimensional integral: trapezoidal rule (int_rule = "trapezoid") and extended Simpson rule (int_rule = "Simpson") are available. Defaults to "trapezoid".

refit_lambda

flag to reselect lambda in each bootstrap replicate, incorporating its variability in the bootstrap calibration. Much more time consuming. Defaults to FALSE.

...

further parameters to be passed to cv_glmnet (and then to cv.glmnet) such as cv_1se, cv_nlambda or cv_parallel, among others.

Details

The function implements the bootstrap-based goodness-of-fit test for the functional linear model with functional/scalar response and functional/scalar predictor, as described in Algorithm 1 in García-Portugués et al. (2021). The specifics are detailed there.
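
As a rough picture of how a bootstrap calibration yields a p-value, the base R sketch below compares an observed statistic with B bootstrap replicates. The observed value and the chi-squared draws are made up for illustration and stand in for the statistics that the resampling of residuals would produce; this is a generic Monte Carlo p-value and not a transcription of Algorithm 1.

```r
set.seed(1)

# Hypothetical observed statistic and B bootstrap statistics under H0
B <- 500
stat_obs <- 2.5
stat_boot <- rchisq(B, df = 1)

# Bootstrap p-value: proportion of bootstrap statistics that exceed
# the observed one (large statistics are evidence against H0)
p_value <- mean(stat_boot > stat_obs)
p_value
```

A larger B reduces the Monte Carlo error of the p-value at the cost of computing time, which is the trade-off behind the B argument.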

By default cv_1se = TRUE for cv_glmnet is considered, unless it is changed via .... This is the recommended choice for conducting the goodness-of-fit test based on regularized estimators, as the oversmoothed estimate of the regression model under the null hypothesis notably facilitates the calibration of the test (see García-Portugués et al., 2021).

The graphical tool obtained with plot_proc = TRUE is based on an extension of the tool described in García-Portugués et al. (2014).

Repeated observations on X are internally removed, as otherwise they would cause NaNs in Adot. Missing values on X and Y are also automatically removed.
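
The initial automatic selection of p and q through thre_p and thre_q can be pictured with the following base R sketch, which keeps the smallest number of principal components whose cumulative proportion of explained variance reaches the threshold. The simulated curves and the use of prcomp on the discretized data are assumptions made for the example; goffda computes the FPC with its own routines, so the numbers need not coincide.

```r
set.seed(1)

# Hypothetical sample: n curves on a grid, generated from three
# components with decreasing variance
n <- 100
s <- seq(0, 1, length.out = 101)
scores <- cbind(rnorm(n, sd = 3), rnorm(n, sd = 1), rnorm(n, sd = 0.1))
basis <- rbind(sin(pi * s), sin(2 * pi * s), sin(3 * pi * s))
X <- scores %*% basis  # n x length(s) matrix of discretized curves

# Principal components of the centered, discretized curves
pca <- prcomp(X, center = TRUE, scale. = FALSE)
explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching the variance threshold
thre_p <- 0.99
p <- which(explained >= thre_p)[1]

# Index vector of the retained components, as in p <- 1:max(p)
p_index <- 1:p
p_index
```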

Value

An object of the htest class with the following elements:

statistic

test statistic.

p.value

p-value of the test.

boot_statistics

the bootstrapped test statistics, a vector of length B.

method

information on the type of test performed.

parameter

a vector with the dimensions p and q considered in the test statistic. These are the lengths of the outputs p and q.

p

the index of the FPC considered for X.

q

the index of the FPC considered for Y.

fit_flm

the output resulting from calling flm_est.

boot_lambda

bootstrapped lambda.

boot_p

a list with the bootstrapped indexes of the FPC considered for X.

data.name

name of the value of data.
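
A minimal sketch of how the returned elements might be inspected is given next. It assumes that goffda is installed and only uses arguments and element names listed on this page; the sample size, B = 50, and the object name gof are arbitrary choices for a quick run.

```r
# Run only if goffda is available
if (requireNamespace("goffda", quietly = TRUE)) {
  library(goffda)

  # Data under H0 (no effect of X on Y), as in the examples
  set.seed(987654321)
  X_fdata <- r_ou(n = 50, t = seq(0, 1, length.out = 101), sigma = 2)
  Y_fdata <- r_ou(n = 50, t = seq(0, 1, length.out = 101), sigma = 0.5)

  # Test without progress messages or plots
  gof <- flm_test(X = X_fdata, Y = Y_fdata, B = 50, verbose = FALSE,
                  plot_dens = FALSE, plot_proc = FALSE)

  # Elements of the htest object
  print(gof$statistic)
  print(gof$p.value)
  print(gof$p)
  print(gof$q)
  print(length(gof$boot_statistics))
}
```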

Author(s)

Eduardo García-Portugués.

References

García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and González-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486

García-Portugués, E., González-Manteiga, W. and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3):761–778. doi:10.1080/10618600.2013.812519

Examples

## Quick example for functional response and predictor

# Generate data under H0
n <- 100
set.seed(987654321)
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 0.5)
Y_fdata <- epsilon

# Test the FLMFR
flm_test(X = X_fdata, Y = Y_fdata)

# Simple hypothesis
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0)

# Generate data under H1
n <- 100
set.seed(987654321)
sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = seq(0, 1, l = 101),
                          t = seq(0, 1, l = 101), nonlinear = "quadratic")
X_fdata <- sample_frm_fr[["X_fdata"]]
Y_fdata <- sample_frm_fr[["Y_fdata"]]

# Test the FLMFR
flm_test(X = X_fdata, Y = Y_fdata)

## Functional response and predictor

# Generate data under H0
n <- 50
B <- 100
set.seed(987654321)
t <- seq(0, 1, l = 201)
X_fdata <- r_ou(n = n, t = t, sigma = 2)
epsilon <- r_ou(n = n, t = t, sigma = 0.5)
Y_fdata <- epsilon

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

# Simple hypothesis
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 2, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr_l1s", B = B)

# Generate data under H1
n <- 50
B <- 100
set.seed(987654321)
sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = t, t = t,
                          nonlinear = "quadratic")
X_fdata <- sample_frm_fr$X_fdata
Y_fdata <- sample_frm_fr$Y_fdata

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

## Scalar response and functional predictor

# Generate data under H0
n <- 50
B <- 100
set.seed(987654321)
t <- seq(0, 1, l = 201)
X_fdata <- r_ou(n = n, t = t, sigma = 2)
beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 2)
epsilon <- rnorm(n = n)
Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta) + epsilon)

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

# Simple hypothesis
flm_test(X = X_fdata, Y = Y, beta0 = beta, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr_l1s", B = B)

# Generate data under H1
n <- 50
B <- 100
set.seed(987654321)
X_fdata <- r_ou(n = n, t = t, sigma = 2)
beta <- r_ou(n = 1, t = t, sigma = 0.5)
epsilon <- rnorm(n = n)
Y <- drop(exp(inprod_fdata(X_fdata1 = X_fdata^2, X_fdata2 = beta)) + epsilon)

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

## Functional response and scalar predictor

# Generate data under H0
n <- 50
B <- 100
set.seed(987654321)
X <- rnorm(n)
t <- seq(0, 1, l = 201)
beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3)
beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data),
                    byrow = TRUE)
epsilon <- r_ou(n = n, t = t, sigma = 0.5)
Y_fdata <- X * beta + epsilon

# With boot_scores = TRUE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B)

# With boot_scores = FALSE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B)

# Simple hypothesis
flm_test(X = X, Y = Y_fdata, beta0 = beta[1], est_method = "fpcr", B = B)
flm_test(X = X, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B)

# Generate data under H1
n <- 50
B <- 100
set.seed(987654321)
X <- rexp(n)
beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3)
beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data),
                    byrow = TRUE)
epsilon <- r_ou(n = n, t = t, sigma = 0.5)
Y_fdata <- log(X * beta) + epsilon

# With boot_scores = TRUE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B)

# With boot_scores = FALSE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B)


[Package goffda version 0.1.2 Index]