fsim.kNN.fit {fsemipar}    R Documentation
Functional single-index model fit using kNN estimation and joint LOOCV minimisation
Description
This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kNN estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.
The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the number of neighbours (k.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs a joint minimisation of the LOOCV objective function over both the number of neighbours and the functional index.
Usage
fsim.kNN.fit(x, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3, nknot.theta = 3,
knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL,
kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, n.core = NULL)
Arguments
x
Matrix containing the observations of the functional covariate (i.e. curves) collected by row.
y
Vector containing the scalar response.
seed.coeff
Vector of initial values used to build the set \Theta_n of eligible functional indexes (see section Details). The default is c(-1,0,1).
order.Bspline
Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3.
nknot.theta
Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of \theta_0. The default is 3.
knearest
Vector of positive integers that defines the sequence within which the optimal number of nearest neighbours k.opt is selected.
min.knn
A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours k.opt. The default is 2.
max.knn
A positive integer that represents the maximum value in the sequence for selecting the number of nearest neighbours k.opt.
step
A positive integer used to construct the sequence of k-nearest neighbours as follows: min.knn, min.knn + step, min.knn + 2*step, min.knn + 3*step, ...
kind.of.kernel
The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available.
range.grid
Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated.
nknot
Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is NULL (see Usage).
n.core
Number of CPU cores designated for parallel execution. The default value is NULL (see Usage).
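The following sketch spells out how min.knn, max.knn and step can generate the candidate values for the number of neighbours when knearest is not supplied. It assumes the sequence stops at the largest value not exceeding max.knn; the helper build.knn.seq is hypothetical and is not part of fsemipar.

```r
# Hypothetical helper (not part of fsemipar): candidate numbers of neighbours
# min.knn, min.knn + step, min.knn + 2*step, ..., stopping at or below max.knn.
build.knn.seq <- function(min.knn = 2, max.knn = 20, step = 1) {
  stopifnot(min.knn >= 1, max.knn >= min.knn, step >= 1)
  seq(from = min.knn, to = max.knn, by = step)
}

build.knn.seq(min.knn = 2, max.knn = 20, step = 3)
# 2 5 8 11 14 17 20
```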
Details
The functional single-index model (FSIM) is given by the expression:
Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i, \quad i=1,\dots,n,
where Y_i denotes a scalar response, X_i is a functional covariate valued in a separable Hilbert space \mathcal{H} with an inner product \langle \cdot, \cdot\rangle, \varepsilon_i denotes the random error, \theta_0 \in \mathcal{H} is the unknown functional index and r(\cdot) denotes the unknown smooth link function.
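To make the model concrete, the sketch below simulates data from an FSIM on a discretisation grid of [0, 1]. Everything in it is an illustrative assumption: the curve generator, the index theta0 (scaled to have unit L2 norm), the link r(), the noise level, and the trapezoidal approximation of the L2 inner product.

```r
set.seed(1)
n <- 100
t.grid <- seq(0, 1, length.out = 101)

# Assumed random curves X_i(t) = a_i sin(2 pi t) + b_i cos(2 pi t) + c_i t
# (cc is used instead of c to avoid masking base::c)
a <- rnorm(n); b <- rnorm(n); cc <- runif(n)
X <- outer(a, sin(2 * pi * t.grid)) + outer(b, cos(2 * pi * t.grid)) +
  outer(cc, t.grid)

# L2 inner product approximated with the trapezoidal rule on the grid t
inner.prod <- function(f, g, t) {
  h <- f * g
  sum(diff(t) * (head(h, -1) + tail(h, -1)) / 2)
}

theta0 <- sqrt(2) * sin(2 * pi * t.grid)                   # unit L2 norm on [0, 1]
index <- apply(X, 1, inner.prod, g = theta0, t = t.grid)   # <theta0, X_i>
r <- function(u) u^2 + sin(u)                              # hypothetical smooth link
y <- r(index) + rnorm(n, sd = 0.1)                         # Y_i = r(<theta0, X_i>) + eps_i
```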
The FSIM is fitted using the kNN estimator
\widehat{r}_{k,\hat{\theta}}(x)=\sum_{i=1}^nw_{n,k,\hat{\theta}}(x,X_i)Y_i, \quad \forall x\in\mathcal{H},
with Nadaraya-Watson weights
w_{n,k,\hat{\theta}}(x,X_i)=\frac{K\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}\left(X_i,x\right)\right)}{\sum_{j=1}^nK\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}\left(X_j,x\right)\right)},
where
- the positive integer k is a smoothing factor, representing the number of nearest neighbours;
- K is a kernel function (see the argument kind.of.kernel);
- d_{\hat{\theta}}(x_1,x_2)=|\langle\hat{\theta},x_1-x_2\rangle| is the projection semi-metric, computed using semimetric.projec, and \hat{\theta} is an estimate of \theta_0;
- H_{k,x,\hat{\theta}}=\min\{h\in\mathbb{R}^+ \text{ such that } \sum_{i=1}^n1_{B_{\hat{\theta}}(x,h)}(X_i)=k\}, where 1_{B_{\hat{\theta}}(x,h)}(\cdot) is the indicator function of the open ball defined by the projection semi-metric, with centre x\in\mathcal{H} and radius h.
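A minimal base-R sketch of the kNN estimator above, for a single new curve. It assumes that curves and \hat{\theta} are evaluated on a common grid, approximates \langle\hat{\theta},x\rangle with the trapezoidal rule, uses the Epanechnikov kernel supported on [0, 1), and takes H_{k,x,\hat{\theta}} as the (k+1)-th smallest semi-metric distance so that, barring ties, exactly k curves lie strictly inside the ball. The function names are hypothetical; this is not the fsemipar implementation, which computes the semi-metric with semimetric.projec.

```r
trap.inner <- function(f, g, t) {
  h <- f * g
  sum(diff(t) * (head(h, -1) + tail(h, -1)) / 2)    # trapezoidal rule
}

# Epanechnikov kernel restricted to [0, 1) (assumed form of "quad")
epan <- function(u) ifelse(u >= 0 & u < 1, 0.75 * (1 - u^2), 0)

knn.nw.predict <- function(x.new, X, y, theta, t, k) {
  stopifnot(k < nrow(X))
  proj <- apply(X, 1, trap.inner, g = theta, t = t) # <theta, X_i>
  proj.new <- trap.inner(x.new, theta, t)           # <theta, x>
  d <- abs(proj - proj.new)                         # projection semi-metric
  H <- sort(d)[k + 1]                               # kNN bandwidth (assumption)
  w <- epan(d / H)                                  # zero outside the open ball
  sum(w * y) / sum(w)                               # Nadaraya-Watson weighted mean
}

# Usage on toy data
set.seed(2)
t.grid <- seq(0, 1, length.out = 51)
X <- outer(rnorm(30), sin(2 * pi * t.grid)) + outer(runif(30), t.grid)
theta <- sqrt(2) * sin(2 * pi * t.grid)
y <- rnorm(30)
pred <- knn.nw.predict(X[1, ], X, y, theta, t.grid, k = 5)
```

Because the weights are non-negative and sum to one, the prediction is always a convex combination of the responses of the k nearest curves.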
The procedure requires the estimation of the function-parameter \theta_0. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline). Then, we build a set \Theta_n of eligible functional indexes by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in seed.coeff. The larger seed.coeff is, the larger \Theta_n becomes. Since our approach is computationally intensive, a trade-off between the size of \Theta_n and the performance of the estimator is necessary. To this end, Ait-Saidi et al. (2008) suggested considering order.Bspline=3 and seed.coeff=c(-1,0,1). For details on the construction of \Theta_n, see Novo et al. (2019).
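The sketch below illustrates one way a candidate set \Theta_n could be assembled from seed.coeff. In this sketch every vector of nknot.theta+order.Bspline coefficients with entries in seed.coeff is formed, so the raw grid has length(seed.coeff)^(nknot.theta+order.Bspline) elements (3^6 = 729 with the defaults), which shows why the size of seed.coeff drives the computational cost. The calibration used here (unit L2 norm plus a positive first non-zero coefficient) is an assumption made for illustration; the actual construction is described in Novo et al. (2019). build.theta.set is a hypothetical name, and splines::bs supplies a B-spline basis of order order.Bspline (degree order.Bspline - 1) with regularly spaced interior knots.

```r
library(splines)

# Hypothetical helper, not fsemipar code. Assumed calibration: each candidate
# theta(t) = sum_j coef_j B_j(t) has unit L2 norm and its first non-zero
# coefficient is positive.
build.theta.set <- function(seed.coeff = c(-1, 0, 1), order.Bspline = 3,
                            nknot.theta = 3, t = seq(0, 1, length.out = 101)) {
  dim.theta <- nknot.theta + order.Bspline
  # Regularly spaced interior knots; bs() is parametrised by degree = order - 1
  knots <- seq(min(t), max(t), length.out = nknot.theta + 2)[-c(1, nknot.theta + 2)]
  B <- bs(t, knots = knots, degree = order.Bspline - 1, intercept = TRUE)
  # Raw grid: every coefficient vector with entries in seed.coeff
  coefs <- as.matrix(expand.grid(rep(list(seed.coeff), dim.theta)))
  keep <- apply(coefs, 1, function(cf) {
    nz <- cf[cf != 0]
    # Drop the zero vector; for a symmetric seed.coeff keep one sign per direction
    length(nz) > 0 && nz[1] > 0
  })
  coefs <- coefs[keep, , drop = FALSE]
  # Rescale each row so that the corresponding function has unit L2 norm
  l2 <- apply(coefs, 1, function(cf) {
    f2 <- as.vector(B %*% cf)^2
    sqrt(sum(diff(t) * (head(f2, -1) + tail(f2, -1)) / 2))  # trapezoidal rule
  })
  coefs / l2                       # row i is divided by l2[i]
}

Theta <- build.theta.set()
dim(Theta)   # 364 6
```

With the defaults this yields 364 candidate indexes: of the 729 raw vectors, the zero vector is dropped and each remaining direction is kept once rather than with both signs.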
We obtain the estimated coefficients of \theta_0 in the spline basis (theta.est) and the selected number of neighbours (k.opt) by minimising the LOOCV criterion. This function performs a joint minimisation over both parameters, the number of neighbours and the functional index, and supports parallel computation. To avoid parallel computation, set n.core=1.
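A standard form of the LOOCV criterion, assumed here, is CV(k,\theta)=\frac{1}{n}\sum_{i=1}^n\left(Y_i-\widehat{r}^{(-i)}_{k,\theta}(X_i)\right)^2, where \widehat{r}^{(-i)}_{k,\theta} is the kNN estimator computed without the i-th observation. The sketch below evaluates it on a grid of (k, \theta) pairs and returns the joint minimiser. To stay self-contained it takes the projections \langle\theta,X_i\rangle as input (one column per candidate \theta) and assumes the Epanechnikov kernel on [0, 1) with H equal to the (k+1)-th smallest distance. The names loo.cv and joint.cv are hypothetical; the output names only mirror the Value section for readability.

```r
epan <- function(u) ifelse(u >= 0 & u < 1, 0.75 * (1 - u^2), 0)

# CV(k, theta) for every k in k.seq, given proj[i] = <theta, X_i>
loo.cv <- function(proj, y, k.seq) {
  n <- length(y)
  D <- abs(outer(proj, proj, "-"))                 # d_theta(X_i, X_j)
  sapply(k.seq, function(k) {
    yhat <- vapply(seq_len(n), function(i) {
      d <- D[i, -i]                                # leave observation i out
      w <- epan(d / sort(d)[k + 1])
      sum(w * y[-i]) / sum(w)
    }, numeric(1))
    mean((y - yhat)^2)
  })
}

# Joint minimisation over (k, theta); proj.mat has one column per candidate theta
joint.cv <- function(proj.mat, y, k.seq) {
  cv <- sapply(seq_len(ncol(proj.mat)), function(m) loo.cv(proj.mat[, m], y, k.seq))
  cv <- matrix(cv, nrow = length(k.seq))           # rows: k, columns: theta
  best <- which(cv == min(cv), arr.ind = TRUE)[1, ]
  list(k.opt = k.seq[best[1]], m.opt = unname(best[2]), CV.opt = min(cv))
}

# Usage: the first candidate index carries the signal, the second is noise
set.seed(3)
n <- 60
p1 <- runif(n)
y <- sin(2 * pi * p1) + rnorm(n, sd = 0.05)
p2 <- runif(n)
res <- joint.cv(cbind(p1, p2), y, k.seq = 2:10)
```

The outer loop over candidate indexes is embarrassingly parallel, which is where spreading the work over several cores (for example with parallel::parLapply) would pay off; this is a design remark, not a description of fsemipar internals.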
Value
call
The matched call.
fitted.values
Estimated scalar response.
residuals
Differences between y and fitted.values.
theta.est
Coefficients of \hat{\theta} in the B-spline basis (a vector of length nknot.theta+order.Bspline).
k.opt
Selected number of nearest neighbours.
r.squared
Coefficient of determination.
var.res
Residual variance.
df
Residual degrees of freedom.
yhat.cv
Predicted values for the scalar response using leave-one-out samples.
CV.opt
Minimum value of the CV function, i.e. the value of CV for theta.est and k.opt.
CV.values
Vector containing CV values for each functional index in \Theta_n.
H
Hat matrix.
m.opt
Index of \hat{\theta} in the set \Theta_n.
theta.seq.norm
The vector
k.seq
Sequence of eligible values for k.
...
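The relation between the hat matrix H, fitted.values, residuals and r.squared can be sketched as follows. The sketch assumes that row i of H holds the Nadaraya-Watson weights w_{n,k,\hat{\theta}}(X_i,X_j), j=1,\dots,n, so that the fitted values are H y, and it uses the usual definition of the coefficient of determination. The projections \langle\hat{\theta},X_i\rangle are taken as given, the Epanechnikov kernel on [0, 1) with H_{k,x,\hat{\theta}} equal to the (k+1)-th smallest distance is assumed, and hat.matrix.knn is a hypothetical name.

```r
epan <- function(u) ifelse(u >= 0 & u < 1, 0.75 * (1 - u^2), 0)

# Hypothetical helper: row i holds the kNN Nadaraya-Watson weights of X_i,
# computed from proj[j] = <theta.hat, X_j> (the curve itself is included).
hat.matrix.knn <- function(proj, k) {
  D <- abs(outer(proj, proj, "-"))
  W <- t(apply(D, 1, function(d) epan(d / sort(d)[k + 1])))
  W / rowSums(W)                        # each row now sums to 1
}

set.seed(4)
n <- 50
proj <- runif(n)
y <- cos(3 * proj) + rnorm(n, sd = 0.1)
Hm <- hat.matrix.knn(proj, k = 8)
fitted.vals <- as.vector(Hm %*% y)      # fitted values = H y
res <- y - fitted.vals                  # residuals
r.squared <- 1 - sum(res^2) / sum((y - mean(y))^2)
```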
Author(s)
German Aneiros Perez german.aneiros@udc.es
Silvia Novo Diaz snovo@est-econ.uc3m.es
References
Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P. (2008) Cross-validated estimations in the single-functional index model, Statistics, 42(6), 475–494, doi:10.1080/02331880801980377.
Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression, Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.
See Also
See also fsim.kNN.test, predict.fsim.kNN and plot.fsim.kNN.
Alternative procedures: fsim.kernel.fit, fsim.kNN.fit.optim and fsim.kernel.fit.optim.
Examples
library(fsemipar)
data(Tecator)
y<-Tecator$fat
X<-Tecator$absor.spectra2
#FSIM fit using the first 160 observations.
ptm<-proc.time()
fit<-fsim.kNN.fit(y=y[1:160],x=X[1:160,],max.knn=20,nknot.theta=4,nknot=20,
range.grid=c(850,1050))
proc.time()-ptm
fit
names(fit)