flm_est {goffda}R Documentation

Estimation of functional linear models

Description

Estimation of the linear operator relating a functional predictor X with a functional response Y in the linear model

Y(t) = \int_a^b \beta(s, t) X(s)\,\mathrm{d}s + \varepsilon(t),

where X is a random variable in the Hilbert space of square-integrable functions in [a, b], L^2([a, b]), Y and \varepsilon are random variables in L^2([c, d]), and s \in [a, b] and t \in [c, d].

The linear, Hilbert–Schmidt, integral operator is parametrized by the bivariate kernel \beta \in L^2([a, b]) \otimes L^2([c, d]). Its estimation is done through the truncated expansion of \beta in the tensor product of the data-driven bases of the Functional Principal Components (FPC) of X and Y, and through the fitting of the resulting multivariate linear model. The FPC basis for X is truncated in p components, while the FPC basis for Y is truncated in q components. Automatic selection of p and q is detailed below.

The particular cases in which either X or Y are constant functions give either a scalar predictor or response. The simple linear model arises if both X and Y are scalar, for which \beta is a constant.

Usage

flm_est(X, Y, est_method = "fpcr_l1s", p = NULL, q = NULL,
  thre_p = 0.99, thre_q = 0.99, lambda = NULL, X_fpc = NULL,
  Y_fpc = NULL, compute_residuals = TRUE, centered = FALSE,
  int_rule = "trapezoid", cv_verbose = FALSE, ...)

Arguments

X, Y

samples of functional/scalar predictors and functional/scalar response. Either fdata objects (for functional variables) or vectors of length n (for scalar variables).

est_method

either "fpcr" (Functional Principal Components Regression; FPCR), "fpcr_l2" (FPCR with ridge penalty), "fpcr_l1" (FPCR with lasso penalty) or "fpcr_l1s" (FPCR with lasso-selected FPC). If X is scalar, flm_est only considers "fpcr" as estimation method. See details below. Defaults to "fpcr_l1s".

p, q

index vectors indicating the specific FPC to be considered for the truncated bases expansions of X and Y, respectively. If a single number for p is provided, then p <- 1:max(p) internally (analogously for q) and the first max(p) FPC are considered. If NULL (default), then a data-driven selection of p and q is done. See details below.

thre_p, thre_q

thresholds for the proportion of variance that is explained, at least, by the first p and q FPC of X and Y, respectively. These thresholds are employed for an (initial) automatic selection of p and q. Default to 0.99. thre_p (thre_q) is ignored if p (q) is provided.

lambda

regularization parameter \lambda for the estimation methods "fpcr_l2", "fpcr_l1", and "fpcr_l1s". If NULL (default), it is chosen with cv_glmnet.

X_fpc, Y_fpc

FPC decompositions of X and Y, as returned by fpc. Computed if not provided.

compute_residuals

whether to compute the fitted values Y_hat and its Y_hat_scores, and the residuals of the fit and its residuals_scores. Defaults to TRUE.

centered

flag to indicate if X and Y have been centered or not, in order to avoid their recentering. Defaults to FALSE.

int_rule

quadrature rule for approximating the definite unidimensional integral: trapezoidal rule (int_rule = "trapezoid") and extended Simpson rule (int_rule = "Simpson") are available. Defaults to "trapezoid".

cv_verbose

flag to display information about the estimation procedure (passed to cv_glmnet). Defaults to FALSE.

...

further parameters to be passed to cv_glmnet (and then to cv.glmnet) such as cv_1se, cv_nlambda or cv_parallel, among others.

Details

flm_est deals seamlessly with either functional or scalar inputs for the predictor and response. In the case of scalar inputs, the corresponding dimension-related arguments (p, q, thre_p or thre_q) will be ignored as in these cases either p = 1 or q = 1.

The function translates the functional linear model into a multivariate model with multivariate response and then estimates the p \times q matrix of coefficients of \beta in the tensor basis of the FPC of X and Y. The following estimation methods are implemented:

The last three methods are explained in García-Portugués et al. (2021).

The p FPC of X and q FPC of Y are determined as follows:

Once p_thre and q_thre have been obtained, the methods "fpcr_l1" and "fpcr_l1s" perform a second selection of the FPC that are effectively considered in the estimation of \beta. This subset of FPC (of p_thre) is encoded in p_hat. No further selection of FPC is done for the methods "fpcr" and "fpcr_l2".

The flag compute_residuals controls if Y_hat, Y_hat_scores, residuals, and residuals_scores are computed. If FALSE, they are set to NULL. Y_hat equals \hat Y_i(t) = \int_a^b \hat \beta(s, t) X_i(s) \,\mathrm{d}s and residuals stands for \hat \varepsilon_i(t) = Y_i(t) - \hat Y_i(t), both for i = 1, \ldots, n. Y_hat_scores and
residuals_scores are the n\times q matrices of coefficients (or scores) of these functions in the FPC of Y.

Missing values on X and Y are automatically removed.

Value

A list with the following entries:

Beta_hat

estimated \beta, a matrix with values \hat\beta(s, t) evaluated at the grid points for X and Y. Its size is c(length(X$argvals), length(Y$argvals)).

Beta_hat_scores

the matrix of coefficients of Beta_hat (resulting from projecting it into the tensor basis of X_fpc and Y_fpc), with dimension c(p_hat, q_thre).

H_hat

hat matrix of the associated fitted multivariate linear model, a matrix of size c(n, n). NULL if est_method = "fpcr_l1", since lasso estimation does not provide it explicitly.

p_thre

index vector indicating the FPC of X considered for estimating the model. Chosen by thre_p or equal to p if given.

p_hat

index vector of the FPC considered by the methods "fpcr_l1" and "fpcr_l1s" methods after further selection of the FPC considered in p_thre. For methods "fpcr" and "fpcr_l2", p_hat equals p_thre.

q_thre

index vector indicating the FPC of Y considered for estimating the model. Chosen by thre_q or equal to q if given. Note that zeroing by lasso procedure only affects in the rows.

est_method

the estimation method employed.

Y_hat

fitted values, either an fdata object or a vector, depending on Y.

Y_hat_scores

the matrix of coefficients of Y_hat, with dimension c(n, q_thre).

residuals

residuals of the fitted model, either an fdata object or a vector, depending on Y.

residuals_scores

the matrix of coefficients of residuals, with dimension c(n, q_thre).

X_fpc, Y_fpc

FPC of X and Y, as returned by fpc with n_fpc = n.

lambda

regularization parameter \lambda used for the estimation methods "fpcr_l2", "fpcr_l1", and "fpcr_l1s".

cv

cross-validation object returned by cv_glmnet.

Author(s)

Eduardo García-Portugués and Javier Álvarez-Liébana.

References

García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486

Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis. Springer-Verlag, New York.

Examples

## Quick example of functional response and functional predictor

# Generate data
set.seed(12345)
n <- 50
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
Y_fdata <- 2 * X_fdata + epsilon

# Lasso-selection FPCR (p and q are estimated)
flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s")

## Functional response and functional predictor

# Generate data
set.seed(12345)
n <- 50
X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
Y_fdata <- 2 * X_fdata + epsilon

# FPCR (p and q are estimated)
fpcr_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr")
fpcr_1$Beta_hat_scores
fpcr_1$p_thre
fpcr_1$q_thre

# FPCR (p and q are provided)
fpcr_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr",
                  p = c(1, 5, 2, 7), q = 2:1)
fpcr_2$Beta_hat_scores
fpcr_2$p_thre
fpcr_2$q_thre

# Ridge FPCR (p and q are estimated)
l2_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2")
l2_1$Beta_hat_scores
l2_1$p_hat

# Ridge FPCR (p and q are provided)
l2_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2",
                p = c(1, 5, 2, 7), q = 2:1)
l2_2$Beta_hat_scores
l2_2$p_hat

# Lasso FPCR (p and q are estimated)
l1_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1")
l1_1$Beta_hat_scores
l1_1$p_thre
l1_1$p_hat

# Lasso estimator (p and q are provided)
l1_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1",
                p = c(1, 5, 2, 7), q = 2:1)
l1_2$Beta_hat_scores
l1_2$p_thre
l1_2$p_hat

# Lasso-selection FPCR (p and q are estimated)
l1s_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s")
l1s_1$Beta_hat_scores
l1s_1$p_thre
l1s_1$p_hat

# Lasso-selection FPCR (p and q are provided)
l1s_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s",
                 p = c(1, 5, 2, 7), q = 1:4)
l1s_2$Beta_hat_scores
l1s_2$p_thre
l1s_2$p_hat

## Scalar response

# Generate data
set.seed(12345)
n <- 50
beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3)
X_fdata <- fdata_cen(r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2))
epsilon <- rnorm(n, sd = 0.25)
Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta)) + epsilon

# FPCR
fpcr_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr")
fpcr_4$p_hat

# Ridge FPCR
l2_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l2")
l2_4$p_hat

# Lasso FPCR
l1_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1")
l1_4$p_hat

# Lasso-selection FPCR
l1s_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1s")
l1s_4$p_hat

## Scalar predictor

# Generate data
set.seed(12345)
n <- 50
X <- rnorm(n)
epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5)
beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3)
beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data),
                    byrow = TRUE)
Y_fdata <- beta * X + epsilon

# FPCR
fpcr_4 <- flm_est(X = X, Y = Y_fdata, est_method = "fpcr")
plot(beta, col = 2)
lines(beta$argvals, drop(fpcr_4$Beta_hat))


[Package goffda version 0.1.2 Index]