| sfplsim.kernel.fit {fsemipar} | R Documentation | 
SFPLSIM regularised fit using kernel estimation
Description
This function fits a sparse semi-functional partial linear single-index model (SFPLSIM). It employs a penalised least-squares regularisation procedure, integrated with nonparametric kernel estimation using Nadaraya-Watson weights.
The function uses B-spline expansions to represent curves and eligible functional indexes. It also uses an objective criterion (criterion) to select both the bandwidth (h.opt) and the regularisation parameter (lambda.opt).
Usage
sfplsim.kernel.fit(x, z, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3, 
nknot.theta = 3, min.q.h = 0.05, max.q.h = 0.5, h.seq = NULL, num.h = 10, 
range.grid = NULL, kind.of.kernel = "quad", nknot = NULL, lambda.min = NULL,
lambda.min.h = NULL, lambda.min.l = NULL, factor.pn = 1, nlambda = 100, 
lambda.seq = NULL, vn = ncol(z), nfolds = 10, seed = 123, criterion = "GCV",
penalty = "grSCAD", max.iter = 1000, n.core = NULL)
Arguments
x | 
 Matrix containing the observations of the functional covariate (functional single-index component), collected by row.  | 
z | 
 Matrix containing the observations of the scalar covariates (linear component), collected by row.  | 
y | 
 Vector containing the scalar response.  | 
seed.coeff | 
 Vector of initial values used to  build the set   | 
order.Bspline | 
 Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3.  | 
nknot.theta | 
 Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of   | 
min.q.h | 
 Minimum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the lower endpoint of the range from which the bandwidth is selected. The default is 0.05.  | 
max.q.h | 
 Maximum quantile order of the distances between curves, which are computed using the projection semi-metric. This value determines the upper endpoint of the range from which the bandwidth is selected. The default is 0.5.  | 
h.seq | 
 Vector containing the sequence of bandwidths. The default is a sequence of   | 
num.h | 
 Positive integer indicating the number of bandwidths in the grid. The default is 10.  | 
range.grid | 
 Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate   | 
kind.of.kernel | 
 The type of kernel function used. Currently, only Epanechnikov kernel (  | 
nknot | 
 Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is   | 
lambda.min | 
 The smallest value for lambda (i.e., the lower endpoint of the sequence in which   | 
lambda.min.h | 
 The lower endpoint of the sequence in which   | 
lambda.min.l | 
 The lower endpoint of the sequence in which   | 
factor.pn | 
 Positive integer used to set   | 
nlambda | 
 Positive integer indicating the number of values in the sequence from which   | 
lambda.seq | 
 Sequence of values in which   | 
vn | 
 Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is   | 
nfolds | 
 Number of cross-validation folds (used when   | 
seed | 
 You may set the seed for the random number generator to ensure reproducible results (applicable when   | 
criterion | 
 The criterion used to select the tuning and regularisation parameter:   | 
penalty | 
 The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD".  | 
max.iter | 
 Maximum number of iterations allowed across the entire path. The default value is 1000.  | 
n.core | 
 Number of CPU cores designated for parallel execution. The default is   | 
Details
The sparse semi-functional partial linear single-index model (SFPLSIM) is given by the expression:
Y_i=Z_{i1}\beta_{01}+\dots+Z_{ip_n}\beta_{0p_n}+r(\left<\theta_0,X_i\right>)+\varepsilon_i\ \ \ i=1,\dots,n,
where Y_i denotes a scalar response, Z_{i1},\dots,Z_{ip_n} are real random covariates and X_i is a functional random covariate valued in a separable Hilbert space \mathcal{H} with inner product \left\langle \cdot, \cdot \right\rangle. In this equation,
\mathbf{\beta}_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}, \theta_0\in\mathcal{H} and r(\cdot) are a vector of unknown real parameters, an unknown functional direction and an unknown smooth real-valued function, respectively. In addition, \varepsilon_i is the random error.
The sparse SFPLSIM is fitted using the penalised least-squares approach. The first step is to transform the SFPLSIM into a linear model by extracting from Y_i and Z_{ij} (j=1,\ldots,p_n) the effect of the functional covariate X_i using functional single-index regression. This transformation is achieved using nonparametric kernel estimation (for details, see the documentation of the function fsim.kernel.fit).
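As a rough illustration of this step, the base-R sketch below builds Nadaraya-Watson weights with a quadratic (Epanechnikov-type) kernel and removes the smoothed part from a response and a covariate matrix. The helper names (epan, nw_weights), the bandwidth and the simulated data are hypothetical. A scalar index u stands in for the projection \left<\theta,X_i\right>, and the form Y - WY of the transformation is an assumption (the usual partial-residual construction), not a description of the package internals.

```r
# Quadratic (Epanechnikov-type) kernel supported on [0, 1], applied to scaled distances.
epan <- function(d) ifelse(d >= 0 & d <= 1, 1.5 * (1 - d^2), 0)

# Nadaraya-Watson weight matrix: row i holds the weights used to smooth at u[i].
nw_weights <- function(u, h) {
  D <- abs(outer(u, u, "-"))   # pairwise distances between projected curves
  K <- epan(D / h)             # kernel evaluations
  K / rowSums(K)               # normalise each row to sum to one
}

set.seed(1)
n <- 50
u <- runif(n)                          # hypothetical stand-in for <theta, X_i>
Z <- cbind(rnorm(n), rnorm(n))         # scalar covariates
Y <- Z %*% c(1, 0) + sin(2 * pi * u) + rnorm(n, sd = 0.1)

W <- nw_weights(u, h = 0.2)
Y_tilde <- Y - W %*% Y                 # response with the smoothed part removed
Z_tilde <- Z - W %*% Z                 # covariates with the smoothed part removed

# Unpenalised least squares on the transformed data
beta_hat <- solve(crossprod(Z_tilde), crossprod(Z_tilde, Y_tilde))
```

In the actual procedure the index direction and the bandwidth are not fixed in advance as they are here; they are selected from the data.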
An approximate linear model is then obtained:
\widetilde{\mathbf{Y}}_{\theta_0}\approx\widetilde{\mathbf{Z}}_{\theta_0}\mathbf{\beta}_0+\mathbf{\varepsilon},
and the penalised least-squares procedure is applied to this model by minimising over the pair (\mathbf{\beta},\theta)
\mathcal{Q}\left(\mathbf{\beta},\theta\right)=\frac{1}{2}\left(\widetilde{\mathbf{Y}}_{\theta}-\widetilde{\mathbf{Z}}_{\theta}\mathbf{\beta}\right)^{\top}\left(\widetilde{\mathbf{Y}}_{\theta}-\widetilde{\mathbf{Z}}_{\theta}\mathbf{\beta}\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_j|\right), \quad (1)
where \mathbf{\beta}=(\beta_1,\ldots,\beta_{p_n})^{\top}, \ \mathcal{P}_{\lambda_{j_n}}\left(\cdot\right) is a penalty function (specified in the argument penalty) and \lambda_{j_n} > 0 is a tuning parameter.
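To give a feel for the penalty term, the sketch below codes the scalar SCAD penalty in its standard textbook form, next to the lasso penalty for comparison. The function names and the shape constant a = 3.7 (a conventional choice) are assumptions for illustration. The penalties named in this page are the group versions "grSCAD" and "grLasso", and this page does not state the exact form evaluated internally.

```r
# SCAD penalty evaluated at |t| for tuning parameter lambda and shape a > 2.
scad_penalty <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(
    t <= lambda,
    lambda * t,                                              # lasso-like near zero
    ifelse(
      t <= a * lambda,
      (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)), # quadratic transition
      lambda^2 * (a + 1) / 2                                 # constant for large |t|
    )
  )
}

# Lasso penalty for comparison: grows linearly without bound.
lasso_penalty <- function(t, lambda) lambda * abs(t)

scad_values  <- scad_penalty(c(0, 0.5, 2, 10), lambda = 1)
lasso_values <- lasso_penalty(c(0, 0.5, 2, 10), lambda = 1)
```

Both penalties coincide for small coefficients, but SCAD levels off for large ones, so large coefficients are not shrunk further.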
To reduce the number of tuning parameters \lambda_{j_n} to be selected for each sample, we consider \lambda_{j_n} = \lambda \widehat{\sigma}_{\beta_{0,j,OLS}}, where \beta_{0,j,OLS} denotes the OLS estimate of \beta_{0j} and \widehat{\sigma}_{\beta_{0,j,OLS}} is its estimated standard deviation. Both \lambda and h (in the kernel estimation) are selected using the objective criterion specified in the argument criterion.
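The scaling of the per-coefficient tuning parameters can be sketched in base R as follows. The simulated data and variable names are hypothetical, and the OLS fit is computed on raw data purely for illustration; this page does not specify which data the OLS step uses internally.

```r
set.seed(2)
n <- 100
# Covariates on very different scales, so their OLS standard errors differ.
Z <- cbind(z1 = rnorm(n), z2 = rnorm(n, sd = 5), z3 = rnorm(n, sd = 0.2))
Y <- drop(Z %*% c(1, 0, 2)) + rnorm(n)

ols <- lm(Y ~ Z - 1)                                  # OLS fit without intercept
se_ols <- summary(ols)$coefficients[, "Std. Error"]   # estimated sd of each OLS coefficient

lambda <- 0.1                                         # the single parameter left to tune
lambda_j <- lambda * se_ols                           # one tuning parameter per coefficient
```

Only lambda has to be selected; coefficients that are estimated less precisely receive a proportionally larger tuning parameter.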
In addition, the function uses a B-spline representation to construct a set \Theta_n of eligible functional indexes \theta. The dimension of the B-spline basis is order.Bspline+nknot.theta and the set of eligible coefficients is obtained by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in seed.coeff. The larger this set, the greater the size of \Theta_n. Due to the intensive computation required by our approach, a balance between the size of \Theta_n and the performance of the estimator is necessary. For this reason, Ait-Saidi et al. (2008) suggested considering order.Bspline=3 and seed.coeff=c(-1,0,1). For details on the construction of \Theta_n, see Novo et al. (2019).
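The dimension count and the growth of the candidate set can be checked with a short sketch using the splines package that ships with R. The variable names and the evaluation grid on [0, 1] are hypothetical, and the calibration step that turns the raw coefficient vectors into \Theta_n is not reproduced here.

```r
library(splines)  # ships with R

order_bspline <- 3
nknot_theta <- 3
seed_coeff <- c(-1, 0, 1)

# B-spline basis on [0, 1] with regularly spaced interior knots.
grid <- seq(0, 1, length.out = 100)
interior <- seq(0, 1, length.out = nknot_theta + 2)[-c(1, nknot_theta + 2)]
B <- bs(grid, knots = interior, degree = order_bspline - 1,
        intercept = TRUE, Boundary.knots = c(0, 1))
basis_dim <- ncol(B)  # equals order_bspline + nknot_theta

# All coefficient vectors with entries taken from seed_coeff (before calibration).
candidates <- expand.grid(rep(list(seed_coeff), basis_dim))
n_candidates <- nrow(candidates)  # equals length(seed_coeff)^basis_dim
```

The number of raw candidates grows exponentially with the basis dimension, which is why small values of order.Bspline, nknot.theta and length(seed.coeff) keep the computation manageable.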
Finally, after estimating \mathbf{\beta}_0 and \theta_0 by minimising (1), we proceed to estimate the nonlinear function r_{\theta_0}(\cdot)\equiv r\left(\left<\theta_0,\cdot\right>\right).
For this purpose, we again apply the kernel procedure with Nadaraya-Watson weights to smooth the partial residuals Y_i-\mathbf{Z}_i^{\top}\widehat{\mathbf{\beta}}.
For further details on the estimation procedure of the SFPLSIM, see Novo et al. (2021).
Remark: Setting lambda.seq to 0 gives the non-penalised estimation of the model, i.e. the OLS estimation. Using lambda.seq with a value \neq 0 is advisable when the presence of irrelevant variables is suspected.
Value
call | 
 The matched call.  | 
fitted.values | 
 Estimated scalar response.  | 
residuals | 
 Differences between   | 
beta.est | 
 Estimate of   | 
theta.est | 
 Coefficients of   | 
indexes.beta.nonnull | 
 Indexes of the non-zero   | 
h.opt | 
 Selected bandwidth.  | 
lambda.opt | 
 Selected value of the penalisation parameter   | 
IC | 
 Value of the criterion function considered to select   | 
Q.opt | 
 Minimum value of the penalised criterion used to estimate   | 
Q | 
 Vector of dimension equal to the cardinal of   | 
m.opt | 
 Index of   | 
lambda.min.opt.max.mopt | 
 A grid of values in [  | 
lambda.min.opt.max.m | 
 A grid of values in [  | 
h.min.opt.max.mopt | 
 
  | 
h.min.opt.max.m | 
 For each   | 
h.seq.opt | 
 Sequence of eligible values for   | 
theta.seq.norm | 
 The vector   | 
vn.opt | 
 Selected value of   | 
... | 
Author(s)
German Aneiros Perez german.aneiros@udc.es
Silvia Novo Diaz snovo@est-econ.uc3m.es
References
Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P. (2008) Cross-validated estimations in the single-functional index model. Statistics, 42(6), 475–494, doi:10.1080/02331880801980377.
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
Novo, S., Aneiros, G., and Vieu, P. (2021) Sparse semiparametric regression when predictors are mixture of functional and high-dimensional variables. TEST, 30, 481–504, doi:10.1007/s11749-020-00728-w.
Novo, S., Aneiros, G., and Vieu, P. (2021) A kNN procedure in semiparametric functional data analysis. Statistics and Probability Letters, 171, 109028, doi:10.1016/j.spl.2020.109028.
See Also
See also fsim.kernel.fit, predict.sfplsim.kernel and plot.sfplsim.kernel.
Alternative procedure: sfplsim.kNN.fit.
Examples
library(fsemipar)
data("Tecator")
y<-Tecator$fat
X<-Tecator$absor.spectra2
z1<-Tecator$protein       
z2<-Tecator$moisture
#Quadratic, cubic and interaction effects of the scalar covariates.
z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2)
train<-1:160
#SFPLSIM fit. Convergence errors for some theta are obtained.
ptm<-proc.time()
fit<-sfplsim.kernel.fit(x=X[train,], z=z.com[train,], y=y[train],
      max.q.h=0.35,lambda.min.l=0.01,
      max.iter=5000, nknot.theta=4,criterion="BIC",nknot=20)
proc.time()-ptm
#Results
fit
names(fit)