R: Estimation of functional linear models

flm_est {goffda}

R Documentation

Estimation of functional linear models

Description

Estimation of the linear operator relating a functional predictor X with a functional response Y in the linear model

Y(t) = \int_a^b \beta(s, t) X(s)\,\mathrm{d}s + \varepsilon(t),

where X is a random variable in the Hilbert space of square-integrable functions in [a, b], L^2([a, b]), Y and \varepsilon are random variables in L^2([c, d]), and s \in [a, b] and t \in [c, d].

The linear, Hilbert–Schmidt, integral operator is parametrized by the bivariate kernel \beta \in L^2([a, b]) \otimes L^2([c, d]). Its estimation is done through the truncated expansion of \beta in the tensor product of the data-driven bases of the Functional Principal Components (FPC) of X and Y, and through the fitting of the resulting multivariate linear model. The FPC basis for X is truncated in p components, while the FPC basis for Y is truncated in q components. Automatic selection of p and q is detailed below.

The particular cases in which either X or Y are constant functions give either a scalar predictor or response. The simple linear model arises if both X and Y are scalar, for which \beta is a constant.

Usage

flm_est(X, Y, est_method = "fpcr_l1s", p = NULL, q = NULL,
  thre_p = 0.99, thre_q = 0.99, lambda = NULL, X_fpc = NULL,
  Y_fpc = NULL, compute_residuals = TRUE, centered = FALSE,
  int_rule = "trapezoid", cv_verbose = FALSE, ...)

Arguments

`X`, `Y`	samples of functional/scalar predictors and functional/scalar response. Either `fdata` objects (for functional variables) or vectors of length `n` (for scalar variables).
`est_method`	either `"fpcr"` (Functional Principal Components Regression; FPCR), `"fpcr_l2"` (FPCR with ridge penalty), `"fpcr_l1"` (FPCR with lasso penalty) or `"fpcr_l1s"` (FPCR with lasso-selected FPC). If `X` is scalar, `flm_est` only considers `"fpcr"` as estimation method. See details below. Defaults to `"fpcr_l1s"`.
`p`, `q`	index vectors indicating the specific FPC to be considered for the truncated bases expansions of `X` and `Y`, respectively. If a single number for `p` is provided, then `p <- 1:max(p)` internally (analogously for `q`) and the first `max(p)` FPC are considered. If `NULL` (default), then a data-driven selection of `p` and `q` is done. See details below.
`thre_p`, `thre_q`	thresholds for the proportion of variance that is explained, at least, by the first `p` and `q` FPC of `X` and `Y`, respectively. These thresholds are employed for an (initial) automatic selection of `p` and `q`. Default to `0.99`. `thre_p` (`thre_q`) is ignored if `p` (`q`) is provided.
`lambda`	regularization parameter `\lambda` for the estimation methods `"fpcr_l2"`, `"fpcr_l1"`, and `"fpcr_l1s"`. If `NULL` (default), it is chosen with `cv_glmnet`.
`X_fpc`, `Y_fpc`	FPC decompositions of `X` and `Y`, as returned by `fpc`. Computed if not provided.
`compute_residuals`	whether to compute the fitted values `Y_hat` and its `Y_hat_scores`, and the `residuals` of the fit and its `residuals_scores`. Defaults to `TRUE`.
`centered`	flag to indicate if `X` and `Y` have been centered or not, in order to avoid their recentering. Defaults to `FALSE`.
`int_rule`	quadrature rule for approximating the definite unidimensional integral: trapezoidal rule (`int_rule = "trapezoid"`) and extended Simpson rule (`int_rule = "Simpson"`) are available. Defaults to `"trapezoid"`.
`cv_verbose`	flag to display information about the estimation procedure (passed to `cv_glmnet`). Defaults to `FALSE`.
`...`	further parameters to be passed to `cv_glmnet` (and then to `cv.glmnet`) such as `cv_1se`, `cv_nlambda` or `cv_parallel`, among others.

Details

flm_est deals seamlessly with either functional or scalar inputs for the predictor and response. In the case of scalar inputs, the corresponding dimension-related arguments (p, q, thre_p or thre_q) will be ignored as in these cases either p = 1 or q = 1.

The function translates the functional linear model into a multivariate model with multivariate response and then estimates the p \times q matrix of coefficients of \beta in the tensor basis of the FPC of X and Y. The following estimation methods are implemented:

"fpcr": Functional Principal Components Regression (FPCR); see details in Ramsay and Silverman (2005).
"fpcr_l2": FPCR, with ridge penalty on the associated multivariate linear model.
"fpcr_l1": FPCR, with lasso penalty on the associated multivariate linear model.
"fpcr_l1s": FPCR, with FPC selected by lasso regression on the associated multivariate linear model.

The last three methods are explained in García-Portugués et al. (2021).

The p FPC of X and q FPC of Y are determined as follows:

If p = NULL, then p is set as p_thre <- 1:j_thre, where j_thre is the j-th FPC of X for which the cumulated proportion of explained variance is greater than thre_p. If p != NULL, then p_thre <- p.
If q = NULL, then the same procedure is followed with thre_q, resulting q_thre.

Once p_thre and q_thre have been obtained, the methods "fpcr_l1" and "fpcr_l1s" perform a second selection of the FPC that are effectively considered in the estimation of \beta. This subset of FPC (of p_thre) is encoded in p_hat. No further selection of FPC is done for the methods "fpcr" and "fpcr_l2".

The flag compute_residuals controls if Y_hat, Y_hat_scores, residuals, and residuals_scores are computed. If FALSE, they are set to NULL. Y_hat equals \hat Y_i(t) = \int_a^b \hat \beta(s, t) X_i(s) \,\mathrm{d}s and residuals stands for \hat \varepsilon_i(t) = Y_i(t) - \hat Y_i(t), both for i = 1, \ldots, n. Y_hat_scores and
residuals_scores are the n\times q matrices of coefficients (or scores) of these functions in the FPC of Y.

Missing values on X and Y are automatically removed.

Value

A list with the following entries:

`Beta_hat`	estimated `\beta`, a matrix with values `\hat\beta(s, t)` evaluated at the grid points for `X` and `Y`. Its size is `c(length(X$argvals), length(Y$argvals))`.
`Beta_hat_scores`	the matrix of coefficients of `Beta_hat` (resulting from projecting it into the tensor basis of `X_fpc` and `Y_fpc`), with dimension `c(p_hat, q_thre)`.
`H_hat`	hat matrix of the associated fitted multivariate linear model, a matrix of size `c(n, n)`. `NULL` if `est_method = "fpcr_l1"`, since lasso estimation does not provide it explicitly.
`p_thre`	index vector indicating the FPC of `X` considered for estimating the model. Chosen by `thre_p` or equal to `p` if given.
`p_hat`	index vector of the FPC considered by the methods `"fpcr_l1"` and `"fpcr_l1s"` methods after further selection of the FPC considered in `p_thre`. For methods `"fpcr"` and `"fpcr_l2"`, `p_hat` equals `p_thre`.
`q_thre`	index vector indicating the FPC of `Y` considered for estimating the model. Chosen by `thre_q` or equal to `q` if given. Note that zeroing by lasso procedure only affects in the rows.
`est_method`	the estimation method employed.
`Y_hat`	fitted values, either an `fdata` object or a vector, depending on `Y`.
`Y_hat_scores`	the matrix of coefficients of `Y_hat`, with dimension `c(n, q_thre)`.
`residuals`	residuals of the fitted model, either an `fdata` object or a vector, depending on `Y`.
`residuals_scores`	the matrix of coefficients of `residuals`, with dimension `c(n, q_thre)`.
`X_fpc`, `Y_fpc`	FPC of `X` and `Y`, as returned by `fpc` with `n_fpc = n`.
`lambda`	regularization parameter `\lambda` used for the estimation methods `"fpcr_l2"`, `"fpcr_l1"`, and `"fpcr_l1s"`.
`cv`	cross-validation object returned by `cv_glmnet`.

Author(s)

Eduardo García-Portugués and Javier Álvarez-Liébana.

References

García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486

Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis. Springer-Verlag, New York.

Examples

## Quick example of functional response and functional predictor

# Generate data
set.seed(12345)
n <- 50
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
Y_fdata <- 2 * X_fdata + epsilon

# Lasso-selection FPCR (p and q are estimated)
flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s")

## Functional response and functional predictor

# Generate data
set.seed(12345)
n <- 50
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
Y_fdata <- 2 * X_fdata + epsilon

# FPCR (p and q are estimated)
fpcr_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr")
fpcr_1$Beta_hat_scores
fpcr_1$p_thre
fpcr_1$q_thre

# FPCR (p and q are provided)
fpcr_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr",
                  p = c(1, 5, 2, 7), q = 2:1)
fpcr_2$Beta_hat_scores
fpcr_2$p_thre
fpcr_2$q_thre

# Ridge FPCR (p and q are estimated)
l2_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2")
l2_1$Beta_hat_scores
l2_1$p_hat

# Ridge FPCR (p and q are provided)
l2_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2",
                p = c(1, 5, 2, 7), q = 2:1)
l2_2$Beta_hat_scores
l2_2$p_hat

# Lasso FPCR (p and q are estimated)
l1_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1")
l1_1$Beta_hat_scores
l1_1$p_thre
l1_1$p_hat

# Lasso estimator (p and q are provided)
l1_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1",
                p = c(1, 5, 2, 7), q = 2:1)
l1_2$Beta_hat_scores
l1_2$p_thre
l1_2$p_hat

# Lasso-selection FPCR (p and q are estimated)
l1s_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s")
l1s_1$Beta_hat_scores
l1s_1$p_thre
l1s_1$p_hat

# Lasso-selection FPCR (p and q are provided)
l1s_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
                 p = c(1, 5, 2, 7), q = 1:4)
l1s_2$Beta_hat_scores
l1s_2$p_thre
l1s_2$p_hat

## Scalar response

# Generate data
set.seed(12345)
n <- 50
beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3)
X_fdata <- fdata_cen(r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2))
epsilon <- rnorm(n, sd = 0.25)
Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta)) + epsilon

# FPCR
fpcr_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr")
fpcr_4$p_hat

# Ridge FPCR
l2_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l2")
l2_4$p_hat

# Lasso FPCR
l1_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1")
l1_4$p_hat

# Lasso-selection FPCR
l1s_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1s")
l1s_4$p_hat

## Scalar predictor

# Generate data
set.seed(12345)
n <- 50
X <- rnorm(n)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3)
beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data),
                    byrow = TRUE)
Y_fdata <- beta * X + epsilon

# FPCR
fpcr_4 <- flm_est(X = X, Y = Y_fdata, est_method = "fpcr")
plot(beta, col = 2)
lines(beta$argvals, drop(fpcr_4$Beta_hat))

[Package goffda version 0.1.2 Index]