sfpl.kernel.fit {fsemipar}    R Documentation
SFPLM regularised fit using kernel estimation
Description
This function fits a sparse semi-functional partial linear model (SFPLM). It employs a penalised least-squares regularisation procedure, integrated with nonparametric kernel estimation using Nadaraya-Watson weights.
The procedure utilises an objective criterion (criterion) to select both the bandwidth (h.opt) and the regularisation parameter (lambda.opt).
Usage
sfpl.kernel.fit(x, z, y, semimetric = "deriv", q = NULL, min.q.h = 0.05,
max.q.h = 0.5, h.seq = NULL, num.h = 10, range.grid = NULL,
kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL, lambda.min.h = NULL,
lambda.min.l = NULL, factor.pn = 1, nlambda = 100, lambda.seq = NULL,
vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV", penalty = "grSCAD",
max.iter = 1000)
Arguments
x
Matrix containing the observations of the functional covariate (functional nonparametric component), collected by row.
z
Matrix containing the observations of the scalar covariates (linear component), collected by row.
y
Vector containing the scalar response.
semimetric
Semi-metric function. Only "deriv" and "pca" are implemented. The default is "deriv".
q
Order of the derivative (if semimetric = "deriv") or number of principal components (if semimetric = "pca") used to compute the semi-metric.
min.q.h
Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05.
max.q.h
Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5.
h.seq
Vector containing the sequence of bandwidths. The default is a sequence of num.h equispaced bandwidths in the range determined by the quantiles of order min.q.h and max.q.h of the distances between curves.
num.h
Positive integer indicating the number of bandwidths in the grid. The default is 10.
range.grid
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the observation points of x). If range.grid = NULL, the default c(1, p) is considered, where p is the number of columns of x.
kind.of.kernel
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available.
nknot
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. If nknot = NULL, a default value based on the number of discretisation points of x is used.
lambda.min
The smallest value for lambda (i.e. the lower endpoint of the sequence in which lambda.opt is selected), as a fraction of the largest value of the sequence. The default is lambda.min.h or lambda.min.l, depending on the sample size (see factor.pn).
lambda.min.h
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is larger than factor.pn times the number of scalar covariates.
lambda.min.l
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is smaller than factor.pn times the number of scalar covariates.
factor.pn
Positive integer used, together with the sample size and the number of scalar covariates, to choose between lambda.min.h and lambda.min.l when setting lambda.min. The default value is 1.
nlambda
Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100.
lambda.seq
Sequence of values in which lambda.opt is selected. If lambda.seq = NULL (the default), the sequence is built automatically from lambda.min and nlambda.
vn
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is vn = ncol(z), resulting in the individual penalisation of each scalar covariate.
nfolds
Number of cross-validation folds, used when criterion = "k-fold-CV". The default is 10.
seed
Seed for the random number generator, to ensure reproducible results when criterion = "k-fold-CV". The default is 123.
criterion
The criterion used to select the bandwidth (h.opt) and the regularisation parameter (lambda.opt): "GCV", "BIC", "AIC" or "k-fold-CV". The default is "GCV".
penalty
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD".
max.iter
Maximum number of iterations allowed across the entire path. The default value is 1000.
Details
The sparse semi-functional partial linear model (SFPLM) is given by the expression:
Y_i = Z_{i1}\beta_{01} + \dots + Z_{ip_n}\beta_{0p_n} + m(X_i) + \varepsilon_i,\ \ \ i = 1, \dots, n,
where Y_i denotes a scalar response, Z_{i1}, \dots, Z_{ip_n} are real random covariates, and X_i is a functional random covariate valued in a semi-metric space \mathcal{H}. In this equation, \mathbf{\beta}_0 = (\beta_{01}, \dots, \beta_{0p_n})^{\top} and m(\cdot) represent a vector of unknown real parameters and an unknown smooth real-valued function, respectively. Additionally, \varepsilon_i is the random error.
In this function, the SFPLM is fitted using a penalised least-squares approach. The approach involves transforming the SFPLM into a linear model by extracting from Y_i and Z_{ij} (j = 1, \ldots, p_n) the effect of the functional covariate X_i using functional nonparametric regression (for details, see Ferraty and Vieu, 2006). This transformation is achieved using kernel estimation with Nadaraya-Watson weights.
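A minimal sketch of this partialling-out step is given below. It is an illustration under simplifying assumptions (a plain L2 distance between the discretised curves instead of the semi-metric, and hypothetical helper names nw.weights and partial.out), not the code used internally by fsemipar.
## Minimal sketch of the partialling-out step (illustration only).
nw.weights <- function(X, h) {
  d <- as.matrix(dist(X))        # n x n matrix of distances between curves
  K <- pmax(1 - (d / h)^2, 0)    # Epanechnikov ("quad") kernel evaluated at d / h
  K / rowSums(K)                 # Nadaraya-Watson weights: each row sums to one
}
partial.out <- function(v, W) v - W %*% v   # v minus its kernel smooth
## With curves X (n x p), scalar covariates Z (n x p_n), response y and a
## bandwidth h, the approximate linear model would be built from
##   W <- nw.weights(X, h); y.tilde <- partial.out(y, W)
##   Z.tilde <- apply(Z, 2, partial.out, W = W)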
An approximate linear model is then obtained:
\widetilde{\mathbf{Y}}\approx\widetilde{\mathbf{Z}}\mathbf{\beta}_0+\mathbf{\varepsilon},
and the penalised least-squares procedure is applied to this model by minimising
\mathcal{Q}\left(\mathbf{\beta}\right)=\frac{1}{2}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\mathbf{\beta}\right)^{\top}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\mathbf{\beta}\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_j|\right), \quad (1)
where \mathbf{\beta} = (\beta_1, \ldots, \beta_{p_n})^{\top}, \ \mathcal{P}_{\lambda_{j_n}}(\cdot) is a penalty function (specified in the argument penalty) and \lambda_{j_n} > 0 is a tuning parameter.
To reduce the number of tuning parameters, \lambda_j, to be selected for each sample, we consider \lambda_j = \lambda \widehat{\sigma}_{\beta_{0,j,OLS}}, where \beta_{0,j,OLS} denotes the OLS estimate of \beta_{0,j} and \widehat{\sigma}_{\beta_{0,j,OLS}} is its estimated standard deviation. Both \lambda and h (the bandwidth in the kernel estimation) are selected using the objective criterion specified in the argument criterion.
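The scaling of the tuning parameters can be sketched as follows. The objects y.tilde and Z.tilde stand for the transformed response and covariates and are replaced here by simulated stand-ins; the use of lm() and grpreg is purely illustrative, not the package's internal implementation.
## Illustrative sketch (not fsemipar internals): build component-wise tuning
## parameters lambda_j = lambda * sd(beta_{0,j,OLS}) from a single lambda.
set.seed(1)
n <- 100; pn <- 5
Z.tilde <- matrix(rnorm(n * pn), n, pn)                     # stand-in for transformed Z
y.tilde <- drop(Z.tilde %*% c(2, -1, 0, 0, 0)) + rnorm(n)   # stand-in for transformed Y
ols <- lm(y.tilde ~ Z.tilde - 1)                            # OLS fit of the approximate model
sigma.ols <- summary(ols)$coefficients[, "Std. Error"]
lambda <- 0.1                                               # a single candidate value of lambda
lambda.j <- lambda * sigma.ols                              # one tuning parameter per coefficient
## The group-penalised minimisation itself could then be carried out with,
## e.g., grpreg::grpreg(Z.tilde, y.tilde, penalty = "grSCAD"), each scalar
## covariate forming its own group (as with the default vn = ncol(z)).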
Finally, after estimating \mathbf{\beta}_0 by minimising (1), we address the estimation of the nonlinear function m(\cdot). For this, we again employ the kernel procedure with Nadaraya-Watson weights to smooth the partial residuals Y_i - \mathbf{Z}_i^{\top}\widehat{\mathbf{\beta}}. For further details on the estimation procedure of the sparse SFPLM, see Aneiros et al. (2015).
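A one-line illustration of this final smoothing step, continuing the sketch above (the helper name and the objects W, y, Z and beta.hat are assumptions, not fsemipar internals):
## Sketch: recover m(.) by Nadaraya-Watson smoothing of the partial residuals.
estimate.m <- function(W, y, Z, beta.hat) {
  as.vector(W %*% (y - Z %*% beta.hat))   # smoothed partial residuals
}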
Remark: Note that if lambda.seq is set to 0, the non-penalised estimate of the model is obtained, i.e. the OLS estimate. Using lambda.seq with values different from 0 is advisable when the presence of irrelevant variables is suspected.
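For instance, reusing the Tecator objects built in the Examples section below, a non-penalised fit could be requested with a call of the following form (illustrative only):
fit.ols<-sfpl.kernel.fit(x=X[train,], z=z.com[train,], y=y[train], q=2,
                         lambda.seq=0, criterion="BIC", nknot=20)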
Value
call
The matched call.
fitted.values
Estimated scalar response.
residuals
Differences between y and the fitted.values.
beta.est
Estimate of \beta_0 when the optimal values lambda.opt and h.opt are used.
indexes.beta.nonnull
Indexes of the non-zero components of beta.est.
h.opt
Selected bandwidth.
lambda.opt
Selected value of lambda.
IC
Value of the criterion function considered to select lambda.opt and h.opt.
h.min.opt.max.mopt
vn.opt
Selected value of vn.
...
Author(s)
German Aneiros Perez german.aneiros@udc.es
Silvia Novo Diaz snovo@est-econ.uc3m.es
References
Aneiros, G., Ferraty, F., Vieu, P. (2015) Variable selection in partial linear regression with functional covariate. Statistics, 49, 1322–1347, doi:10.1080/02331888.2014.998675.
Ferraty, F. and Vieu, P. (2006) Nonparametric Functional Data Analysis. Springer Series in Statistics, New York.
See Also
See also predict.sfpl.kernel and plot.sfpl.kernel.
Alternative method: sfpl.kNN.fit.
Examples
data("Tecator")
y<-Tecator$fat
X<-Tecator$absor.spectra
z1<-Tecator$protein
z2<-Tecator$moisture
#Quadratic, cubic and interaction effects of the scalar covariates.
z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2)
train<-1:160
#SFPLM fit.
ptm=proc.time()
fit<-sfpl.kernel.fit(x=X[train,], z=z.com[train,], y=y[train],q=2,
max.q.h=0.35, lambda.min.l=0.01,
max.iter=5000, criterion="BIC", nknot=20)
proc.time()-ptm
#Results
fit
names(fit)
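#Prediction on the remaining observations (a sketch: the argument names
#passed to predict are assumptions; see predict.sfpl.kernel for the exact
#interface).
test<-(1:length(y))[-train]
pred<-predict(fit, newdata.x=X[test,], newdata.z=z.com[test,], y.test=y[test])
names(pred)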