sfpl.kNN.fit {fsemipar} | R Documentation |
SFPLM regularised fit using kNN estimation
Description
This function fits a sparse semi-functional partial linear model (SFPLM). It employs a penalised least-squares regularisation procedure, integrated with nonparametric kNN estimation using Nadaraya-Watson weights.
The procedure uses an objective criterion (criterion) to select both the number of nearest neighbours (k.opt) and the regularisation parameter (lambda.opt).
Usage
sfpl.kNN.fit(x, z, y, semimetric = "deriv", q = NULL, knearest = NULL,
min.knn = 2, max.knn = NULL, step = NULL, range.grid = NULL,
kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL, lambda.min.h = NULL,
lambda.min.l = NULL, factor.pn = 1, nlambda = 100, lambda.seq = NULL,
vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV", penalty = "grSCAD",
max.iter = 1000)
Arguments
x |
Matrix containing the observations of the functional covariate (functional nonparametric component), collected by row. |
z |
Matrix containing the observations of the scalar covariates (linear component), collected by row. |
y |
Vector containing the scalar response. |
semimetric |
Semi-metric function. Only "deriv" and "pca" are implemented. The default is "deriv". |
q |
Order of the derivative (if semimetric = "deriv") or number of principal components (if semimetric = "pca") used to compute the semi-metric. The default is NULL. |
knearest |
Vector of positive integers containing the sequence in which the number of nearest neighbours k.opt is selected. If NULL (the default), the sequence is built from min.knn, max.knn and step. |
min.knn |
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours k.opt. The default is 2. |
max.knn |
A positive integer that represents the maximum value in the sequence for selecting the number of nearest neighbours k.opt. The default is NULL. |
step |
A positive integer used to construct the sequence of k-nearest neighbours as follows: min.knn, min.knn + step, min.knn + 2*step, ..., max.knn. The default is NULL. |
range.grid |
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the range of the discretisation points). The default is NULL. |
kind.of.kernel |
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available. |
nknot |
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default is NULL. |
lambda.min |
The smallest value for lambda (i.e. the lower endpoint of the sequence in which lambda.opt is selected). The default is NULL, in which case lambda.min.h or lambda.min.l is used, depending on the sample size. |
lambda.min.h |
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is large relative to the number of linear covariates (see factor.pn). The default is NULL. |
lambda.min.l |
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is small relative to the number of linear covariates (see factor.pn). The default is NULL. |
factor.pn |
Positive integer used, together with the sample size, to decide between lambda.min.h and lambda.min.l (see above). The default is 1. |
nlambda |
Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100. |
lambda.seq |
Sequence of values in which lambda.opt is selected. If NULL (the default), the programme builds the sequence automatically using lambda.min and nlambda. |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is vn = ncol(z), which leads to the individual penalisation of each scalar covariate. |
nfolds |
Number of cross-validation folds (used only when criterion = "k-fold-CV"). The default is 10. |
seed |
Seed for the random number generator, used to ensure reproducible results when criterion = "k-fold-CV" (the folds are selected at random). The default is 123. |
criterion |
The criterion used to select the number of nearest neighbours (k.opt) and the regularisation parameter (lambda.opt). Options include "GCV", "BIC", "AIC" or "k-fold-CV". The default is "GCV". |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
Details
The sparse semi-functional partial linear model (SFPLM) is given by the expression:
Y_i = Z_{i1}\beta_{01} + \dots + Z_{ip_n}\beta_{0p_n} + m(X_i) + \varepsilon_i,\ \ \ i = 1, \dots, n,
where Y_i denotes a scalar response, Z_{i1}, \dots, Z_{ip_n} are real random covariates, and X_i is a functional random covariate valued in a semi-metric space \mathcal{H}. In this equation, \mathbf{\beta}_0 = (\beta_{01}, \dots, \beta_{0p_n})^{\top} and m(\cdot) represent a vector of unknown real parameters and an unknown smooth real-valued function, respectively. Additionally, \varepsilon_i is the random error.
In this function, the SFPLM is fitted using a penalised least-squares approach. The approach involves transforming the SFPLM into a linear model by extracting from Y_i and Z_{ij} (j = 1, \ldots, p_n) the effect of the functional covariate X_i using functional nonparametric regression (for details, see Ferraty and Vieu, 2006). This transformation is achieved using kNN estimation with Nadaraya-Watson weights.
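As an illustration only (hypothetical toy data in base R, not the package's internal routine), the transformation step can be sketched with a scalar covariate standing in for the functional X_i and a plain kNN average (the helper knn.smooth is invented for this sketch) in place of the semi-metric-based Nadaraya-Watson weights:

```r
# Toy sketch of the transformation step: remove the effect of the
# covariate x from y and from each column of z by nonparametric kNN
# smoothing, leaving an approximate linear model in the residuals.
set.seed(4)
n <- 100
x <- runif(n)                        # scalar stand-in for the functional covariate
z <- cbind(rnorm(n), rnorm(n))       # linear covariates
y <- drop(z %*% c(1, -1)) + sin(2 * pi * x) + rnorm(n, sd = 0.2)

# kNN smoother: average of v over the k nearest neighbours of each x[i]
knn.smooth <- function(v, x, k = 10)
  sapply(seq_along(x), function(i) mean(v[order(abs(x - x[i]))[1:k]]))

y.tilde <- y - knn.smooth(y, x)      # remove the effect of x from y
z.tilde <- apply(z, 2, function(col) col - knn.smooth(col, x))
coef(lm(y.tilde ~ z.tilde - 1))      # approximate estimate of beta_0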
An approximate linear model is then obtained:
\widetilde{\mathbf{Y}}\approx\widetilde{\mathbf{Z}}\mathbf{\beta}_0+\mathbf{\varepsilon},
and the penalised least-squares procedure is applied to this model by minimising
\mathcal{Q}\left(\mathbf{\beta}\right)=\frac{1}{2}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\mathbf{\beta}\right)^{\top}\left(\widetilde{\mathbf{Y}}-\widetilde{\mathbf{Z}}\mathbf{\beta}\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_j|\right), \quad (1)
where \mathbf{\beta} = (\beta_1, \ldots, \beta_{p_n})^{\top}, \ \mathcal{P}_{\lambda_{j_n}}(\cdot) is a penalty function (specified in the argument penalty) and \lambda_{j_n} > 0 is a tuning parameter.
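As a toy illustration of criterion (1), not of the package's internal routine: with a Lasso-type penalty and an orthonormal design, the minimiser of (1) is the soft-thresholded OLS estimate. A minimal base-R sketch with hypothetical data (the helper soft is invented for this sketch):

```r
# Toy illustration of criterion (1) with a Lasso penalty.
# With an orthonormal design (t(Z) %*% Z = n * I), the minimiser of
#   0.5 * ||Y - Z b||^2 + n * lambda * sum(|b_j|)
# is the soft-thresholded OLS estimate.
set.seed(1)
n <- 200; p <- 3
Z <- qr.Q(qr(matrix(rnorm(n * p), n, p))) * sqrt(n)  # columns scaled so t(Z) %*% Z = n * I
beta0 <- c(2, 0, -1.5)                               # sparse true coefficients
Y <- drop(Z %*% beta0) + rnorm(n, sd = 0.5)

beta.ols <- drop(crossprod(Z, Y)) / n                # OLS under orthonormality
lambda <- 0.3
soft <- function(b, l) sign(b) * pmax(abs(b) - l, 0) # soft-thresholding operator
beta.pen <- soft(beta.ols, lambda)
round(cbind(ols = beta.ols, penalised = beta.pen), 2)
```

The penalised estimate shrinks every coefficient towards zero and sets the coefficient of the irrelevant second covariate exactly to zero, which is the variable-selection effect the penalty is designed to produce.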
To reduce the number of tuning parameters, \lambda_{j_n}, to be selected for each sample, we consider \lambda_{j_n} = \lambda \widehat{\sigma}_{\beta_{0,j,OLS}}, where \beta_{0,j,OLS} denotes the OLS estimate of \beta_{0,j} and \widehat{\sigma}_{\beta_{0,j,OLS}} is its estimated standard deviation. Both \lambda and k (in the kNN estimation) are selected using the objective criterion specified in the argument criterion.
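The scaling \lambda_{j_n} = \lambda \widehat{\sigma}_{\beta_{0,j,OLS}} can be sketched in base R with hypothetical toy data (here the standard errors reported by lm play the role of \widehat{\sigma}_{\beta_{0,j,OLS}}):

```r
# Toy sketch: turn a single tuning parameter lambda into
# coefficient-specific penalties lambda_j = lambda * se(beta_j,OLS).
set.seed(2)
n <- 100
z <- cbind(rnorm(n), rnorm(n, sd = 3))     # covariates on different scales
y <- drop(z %*% c(1, 0.2)) + rnorm(n)
fit.ols <- lm(y ~ z - 1)
sigma.beta <- summary(fit.ols)$coefficients[, "Std. Error"]
lambda <- 0.5
lambda.j <- lambda * sigma.beta            # coefficient-specific penalties
lambda.j
```

The covariate with the larger scale has the smaller standard error, so its coefficient receives the smaller penalty; the scaling makes the penalisation comparable across coefficients measured in different units.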
Finally, after estimating \mathbf{\beta}_0 by minimising (1), we address the estimation of the nonlinear function m(\cdot). For this, we again employ the kNN procedure with Nadaraya-Watson weights to smooth the partial residuals Y_i - \mathbf{Z}_i^{\top}\widehat{\mathbf{\beta}}.
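A minimal base-R sketch of kNN estimation with Nadaraya-Watson weights and the Epanechnikov ("quad") kernel, on toy scalar data (the helper m.knn is invented for this sketch; the package applies the same weighting with a semi-metric on functional data):

```r
# Minimal sketch of kNN estimation with Nadaraya-Watson weights and the
# Epanechnikov ("quad") kernel, on scalar toy data standing in for the
# partial residuals.
set.seed(3)
n <- 150
x <- runif(n)
r <- sin(2 * pi * x) + rnorm(n, sd = 0.2)   # toy partial residuals
quad <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

m.knn <- function(x0, x, r, k) {
  d <- abs(x - x0)           # distances (a semi-metric for functional x)
  h <- sort(d)[k + 1]        # local bandwidth: distance to the (k+1)-th neighbour
  w <- quad(d / h)           # Nadaraya-Watson weights via the kernel
  sum(w * r) / sum(w)
}
m.knn(0.5, x, r, k = 10)     # estimate of m at x0 = 0.5
```

The local bandwidth adapts to the design: exactly the k nearest neighbours of x0 receive positive weight, which is what distinguishes kNN estimation from the fixed-bandwidth kernel variant.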
For further details on the estimation procedure of the sparse SFPLM, see Aneiros et al. (2015).
Remark: Note that if lambda.seq is set to 0, the non-penalised estimate of the model (i.e. the OLS estimate) is obtained. Using lambda.seq with a value different from 0 is advisable when the presence of irrelevant variables is suspected.
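For instance, a non-penalised fit can be sketched as follows (not run; assumes the fsemipar package and the Tecator data used in the Examples section, with all arguments taken from Usage):

```r
## Not run:
# Sketch: lambda.seq = 0 disables the penalty, yielding the
# non-penalised (OLS-type) estimate of beta_0.
library(fsemipar)
data("Tecator")
train <- 1:160
fit.ols <- sfpl.kNN.fit(y = Tecator$fat[train],
                        x = Tecator$absor.spectra[train, ],
                        z = cbind(Tecator$protein, Tecator$moisture)[train, ],
                        q = 2, max.knn = 20, lambda.seq = 0,
                        range.grid = c(850, 1050), nknot = 20)
## End(Not run)
```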
Value
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between the response and the fitted values. |
beta.est |
Estimate of \beta_0 when the optimal tuning parameters are used. |
indexes.beta.nonnull |
Indexes of the non-zero components of beta.est. |
k.opt |
Selected number of nearest neighbours. |
lambda.opt |
Selected value of lambda. |
IC |
Value of the criterion function considered to select both k.opt and lambda.opt. |
vn.opt |
Selected value of vn. |
... |
Author(s)
German Aneiros Perez german.aneiros@udc.es
Silvia Novo Diaz snovo@est-econ.uc3m.es
References
Aneiros, G., Ferraty, F., Vieu, P. (2015) Variable selection in partial linear regression with functional covariate. Statistics, 49, 1322–1347, doi:10.1080/02331888.2014.998675.
See Also
See also predict.sfpl.kNN and plot.sfpl.kNN.
Alternative method: sfpl.kernel.fit.
Examples
data("Tecator")
y<-Tecator$fat
X<-Tecator$absor.spectra
z1<-Tecator$protein
z2<-Tecator$moisture
#Quadratic, cubic and interaction effects of the scalar covariates.
z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2)
train<-1:160
#SFPLM fit.
ptm=proc.time()
fit<-sfpl.kNN.fit(y=y[train],x=X[train,], z=z.com[train,],q=2, max.knn=20,
lambda.min.l=0.01, criterion="BIC",
range.grid=c(850,1050), nknot=20, max.iter=5000)
proc.time()-ptm
#Results
fit
names(fit)