R: Goodness-of-fit test for functional linear models

flm_test {goffda}

R Documentation

Goodness-of-fit test for functional linear models

Description

Goodness-of-fit test of a functional linear model with functional response Y \in L^2([c, d]) and functional predictor X \in L^2([a, b]), where L^2([a, b]) is the Hilbert space of square-integrable functions in [a, b].

The goodness-of-fit test checks the linearity of the regression model m:L^2([a, b])\rightarrow L^2([c, d]) that relates Y and X by

Y(t) = m(X) + \varepsilon(t),

where \varepsilon is a random variable in L^2([c, d]) and t \in [c, d]. The check is formalized as the test of the composite hypothesis

H_0: m \in \{m_\beta : \beta \in L^2([a, b]) \otimes L^2([c, d])\},

where

m_\beta(X(s))(t) = \int_a^b \beta(s, t) X(s)\,\mathrm{d}s

is the linear, Hilbert–Schmidt, integral operator parametrized by the bivariate kernel \beta. Its estimation is done by the truncated expansion of \beta in the tensor product of the data-driven bases of Functional Principal Components (FPC) of X and Y. The FPC basis for X is truncated in p components, while the FPC basis for Y is truncated in q components.

The particular cases in which either X or Y are constant functions give either a scalar predictor or response. The simple linear model arises if both X and Y are scalar, for which \beta is a constant.

Usage

flm_test(X, Y, beta0 = NULL, B = 500, est_method = "fpcr", p = NULL,
  q = NULL, thre_p = 0.99, thre_q = 0.99, lambda = NULL,
  boot_scores = TRUE, verbose = TRUE, plot_dens = TRUE,
  plot_proc = TRUE, plot_max_procs = 100, plot_max_p = 2,
  plot_max_q = 2, save_fit_flm = TRUE, save_boot_stats = TRUE,
  int_rule = "trapezoid", refit_lambda = FALSE, ...)

Arguments

`X`, `Y`	samples of functional/scalar predictors and functional/scalar response. Either `fdata` objects (for functional variables) or vectors of length `n` (for scalar variables).
`beta0`	if provided (defaults to `NULL`), the simple null hypothesis `H_0: m = m_{\beta_0}` is tested. `beta0` must be a matrix of size `c(length(X$argvals), length(Y$argvals))`. If `X` or `Y` are scalar, `beta0` can be also an `fdata` object, with the same `argvals` as `X` or `Y`. Can also be a constant (understood as a shorthand for a matrix with all its entries equal to the constant).
`B`	number of bootstrap replicates. Defaults to `500`.
`est_method`	either `"fpcr"` (Functional Principal Components Regression; FPCR), `"fpcr_l2"` (FPCR with ridge penalty), `"fpcr_l1"` (FPCR with lasso penalty) or `"fpcr_l1s"` (FPCR with lasso-selected FPC). If `X` is scalar, `flm_est` only considers `"fpcr"` as estimation method. See details below. Defaults to `"fpcr_l1s"`.
`p`, `q`	either index vectors indicating the specific FPC to be considered for the truncated bases expansions of `X` and `Y`, respectively. If a single number for `p` is provided, then `p <- 1:max(p)` internally (analogously for `q`) and the first `max(p)` FPC are considered. If `NULL` (default), then a data-driven selection of `p` and `q` is done. See details below.
`thre_p`, `thre_q`	thresholds for the proportion of variance that is explained, at least, by the first `p` and `q` FPC of `X` and `Y`, respectively. These thresholds are employed for an (initial) automatic selection of `p` and `q`. Default to `0.99`. `thre_p` (`thre_q`) is ignored if `p` (`q`) is provided.
`lambda`	regularization parameter `\lambda` for the estimation methods `"fpcr_l2"`, `"fpcr_l1"`, and `"fpcr_l1s"`. If `NULL` (default), it is chosen with `cv_glmnet`.
`boot_scores`	flag to indicate if the bootstrap shall be applied to the scores of the residuals, rather than to the functional residuals. This improves the computational expediency notably. Defaults to `TRUE`.
`verbose`	flag to show information about the testing progress. Defaults to `TRUE`.
`plot_dens`	flag to indicate if a kernel density estimation of the bootstrap statistics shall be plotted. Defaults to `TRUE`.
`plot_proc`	whether to display a graphical tool to identify the degree of departure from the null hypothesis. If `TRUE` (default), the residual marked empirical process, projected in several FPC directions of `X` and `Y`, is shown, together with bootstrap analogues. The FPC directions are ones selected at the estimation stage.
`plot_max_procs`	maximum number of bootstrapped processes to plot in the graphical tool. Set as the minimum of `plot_max_procs` and `B`. Defaults to `100`.
`plot_max_p`, `plot_max_q`	maximum number of FPC directions to be considered in the graphical tool. They limit the resulting plot to be at most of size `c(plot_max_p, plot_max_q)`. Default to `2`.
`save_fit_flm`, `save_boot_stats`	flag to return `fit_flm` and `boot_*`. If `FALSE`, these memory-expensive objects are set to `NA`. Default to `TRUE`.
`int_rule`	quadrature rule for approximating the definite unidimensional integral: trapezoidal rule (`int_rule = "trapezoid"`) and extended Simpson rule (`int_rule = "Simpson"`) are available. Defaults to `"trapezoid"`.
`refit_lambda`	flag to reselect `lambda` in each bootstrap replicate, incorporating its variability in the bootstrap calibration. Much more time consumig. Defaults to `FALSE`.
`...`	further parameters to be passed to `cv_glmnet` (and then to `cv.glmnet`) such as `cv_1se`, `cv_nlambda` or `cv_parallel`, among others.

Details

The function implements the bootstrap-based goodness-of-fit test for the functional linear model with functional/scalar response and functional/scalar predictor, as described in Algorithm 1 in García-Portugués et al. (2021). The specifics are detailed there.

By default cv_1se = TRUE for cv_glmnet is considered, unless it is changed via .... This is the recommended choice for conducting the goodness-of-fit test based on regularized estimators, as the oversmoothed estimate of the regression model under the null hypothesis notably facilitates the calibration of the test (see García-Portugués et al., 2021).

The graphical tool obtained with plot_proc = TRUE is based on an extension of the tool described in García-Portugués et al. (2014).

Repeated observations on X are internally removed, as otherwise they would cause NaNs in Adot. Missing values on X and Y are also automatically removed.

Value

An object of the htest class with the following elements:

`statistic`	test statistic.
`p.value`	`p`-value of the test.
`boot_statistics`	the bootstrapped test statistics, a vector of length `B`.
`method`	information on the type of test performed.
`parameter`	a vector with the dimensions `p` and `q` considered in the test statistic. These are the lengths of the outputs `p` and `q`.
`p`	the index of the FPC considered for `X`.
`q`	the index of the FPC considered for `Y`.
`fit_flm`	the output resulted from calling `flm_est`.
`boot_lambda`	bootstrapped `lambda`.
`boot_p`	a list with the bootstrapped indexes of the FPC considered for `X`.
`data.name`	name of the value of `data`.

Author(s)

Eduardo García-Portugués.

References

García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486

García-Portugués, E., González-Manteiga, W. and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3):761–778. doi:10.1080/10618600.2013.812519

Examples

## Quick example for functional response and predictor

# Generate data under H0
n <- 100
set.seed(987654321)
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 0.5)
Y_fdata <- epsilon

# Test the FLMFR
flm_test(X = X_fdata, Y = Y_fdata)

# Simple hypothesis
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0)

# Generate data under H1
n <- 100
set.seed(987654321)
sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = seq(0, 1, l = 101),
                          t = seq(0, 1, l = 101), nonlinear = "quadratic")
X_fdata <- sample_frm_fr[["X_fdata"]]
Y_fdata <- sample_frm_fr[["Y_fdata"]]

# Test the FLMFR
flm_test(X = X_fdata, Y = Y_fdata)

## Functional response and predictor

# Generate data under H0
n <- 50
B <- 100
set.seed(987654321)
t <- seq(0, 1, l = 201)
X_fdata <- r_ou(n = n, t = t, sigma = 2)
epsilon <- r_ou(n = n, t = t, sigma = 0.5)
Y_fdata <- epsilon

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

# Simple hypothesis
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 2, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr_l1s", B = B)

# Generate data under H1
n <- 50
B <- 100
set.seed(987654321)
sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = t, t = t,
                          nonlinear = "quadratic")
X_fdata <- sample_frm_fr$X_fdata
Y_fdata <- sample_frm_fr$Y_fdata

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

## Scalar response and functional predictor

# Generate data under H0
n <- 50
B <- 100
set.seed(987654321)
t <- seq(0, 1, l = 201)
X_fdata <- r_ou(n = n, t = t, sigma = 2)
beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 2)
epsilon <- rnorm(n = n)
Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta) + epsilon)

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

# Simple hypothesis
flm_test(X = X_fdata, Y = Y, beta0 = beta, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr_l1s", B = B)

# Generate data under H1
n <- 50
B <- 100
set.seed(987654321)
X_fdata <- r_ou(n = n, t = t, sigma = 2)
beta <- r_ou(n = 1, t = t, sigma = 0.5)
epsilon <- rnorm(n = n)
Y <- drop(exp(inprod_fdata(X_fdata1 = X_fdata^2, X_fdata2 = beta)) + epsilon)

# With boot_scores = TRUE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B)

# With boot_scores = FALSE
flm_test(X = X_fdata, Y = Y, est_method = "fpcr",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1",
         boot_scores = FALSE, B = B)
flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s",
         boot_scores = FALSE, B = B)

## Functional response and scalar predictor

# Generate data under H0
n <- 50
B <- 100
set.seed(987654321)
X <- rnorm(n)
t <- seq(0, 1, l = 201)
beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3)
beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data),
                    byrow = TRUE)
epsilon <- r_ou(n = n, t = t, sigma = 0.5)
Y_fdata <- X * beta + epsilon

# With boot_scores = TRUE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B)

# With boot_scores = FALSE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B)

# Simple hypothesis
flm_test(X = X, Y = Y_fdata, beta0 = beta[1], est_method = "fpcr", B = B)
flm_test(X = X, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B)

# Generate data under H1
n <- 50
B <- 100
set.seed(987654321)
X <- rexp(n)
beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3)
beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data),
                    byrow = TRUE)
epsilon <- r_ou(n = n, t = t, sigma = 0.5)
Y_fdata <- log(X * beta) + epsilon

# With boot_scores = TRUE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B)

# With boot_scores = FALSE
flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B)

[Package goffda version 0.1.2 Index]