fsim.kNN.fit.optim {fsemipar}R Documentation

Functional single-index model fit using kNN estimation and iterative LOOCV minimisation

Description

This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kNN estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.

The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the bandwidth (h.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs an iterative minimisation of the LOOCV objective function, starting from an initial set of coefficients (gamma) for the functional index.

Usage

fsim.kNN.fit.optim(x, y, order.Bspline = 3, nknot.theta = 3, gamma = NULL, 
knearest = NULL, min.knn = 2, max.knn = NULL,  step = NULL, 
kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, threshold = 0.005)

Arguments

x

Matrix containing the observations of the functional covariate (i.e. curves) collected by row.

y

Vector containing the scalar response.

order.Bspline

Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3

nknot.theta

Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of \theta_0. The default is 3.

gamma

Vector indicating the initial coefficients for the functional index used in the iterative procedure. By default, it is a vector of ones. The size of the vector is determined by the sum nknot.theta+order.Bspline.

knearest

Vector of positive integers that defines the sequence within which the optimal number of nearest neighbours k.opt is selected. If knearest=NULL, then knearest <- seq(from =min.knn, to = max.knn, by = step).

min.knn

A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours k.opt. This value should be less than the sample size. The default is 2.

max.knn

A positive integer that represents the maximum value in the sequence for selecting number of nearest neighbours k.opt. This value should be less than the sample size. The default is max.knn <- n%/%5.

step

A positive integer used to construct the sequence of k-nearest neighbours as follows: min.knn, min.knn + step, min.knn + 2*step, min.knn + 3*step,.... The default value for step is step<-ceiling(n/100).

kind.of.kernel

The type of kernel function used. Currently, only Epanechnikov kernel ("quad") is available.

range.grid

Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the range of the discretisation). If range.grid=NULL, then range.grid=c(1,p) is considered, where p is the discretisation size of x (i.e. ncol(x)).

nknot

Positive integer indicating the number of regularly spaced interior knots for the B-spline expansion of the functional covariate. The default value is (p - order.Bspline - 1)%/%2.

threshold

The convergence threshold for the LOOCV function (scaled by the variance of the response). The default is 5e-3.

Details

The functional single-index model (FSIM) is given by the expression:

Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i, \quad i=1,\dots,n,

where Y_i denotes a scalar response, X_i is a functional covariate valued in a separable Hilbert space \mathcal{H} with an inner product \langle \cdot, \cdot\rangle. The term \varepsilon denotes the random error, \theta_0 \in \mathcal{H} is the unknown functional index and r(\cdot) denotes the unknown smooth link function.

The FSIM is fitted using the kNN estimator

\widehat{r}_{k,\hat{\theta}}(x)=\sum_{i=1}^nw_{n,k,\hat{\theta}}(x,X_i)Y_i, \quad \forall x\in\mathcal{H},

with Nadaraya-Watson weights

w_{n,k,\hat{\theta}}(x,X_i)=\frac{K\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}\left(X_i,x\right)\right)}{\sum_{i=1}^nK\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}\left(X_i,x\right)\right)},

where

The procedure requires the estimation of the function-parameter \theta_0. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline). We obtain the estimated coefficients of \theta_0 in the spline basis (theta.est) and the selected number of neighbours (k.opt) by minimising the LOOCV criterion. This function performs an iterative minimisation procedure, starting from an initial set of coefficients (gamma) for the functional index. Given a functional index, the optimal number of neighbours according to the LOOCV criterion is selected. For a given number of neighbours, the minimisation in the functional index is performed using the R function optim. The procedure is iterated until convergence. For details, see Ferraty et al. (2013).

Value

call

The matched call.

fitted.values

Estimated scalar response.

residuals

Differences between y and the fitted.values.

theta.est

Coefficients of \hat{\theta} in the B-spline basis: a vector of length(order.Bspline+nknot.theta).

k.opt

Selected number of neighbours.

r.squared

Coefficient of determination.

var.res

Redidual variance.

df

Residual degrees of freedom.

CV.opt

Minimum value of the LOOCV function, i.e. the value of LOOCV for theta.est and k.opt.

err

Value of the LOOCV function divided by var(y) for each interaction.

H

Hat matrix.

k.seq

Sequence of eligible values for k.

CV.hseq

CV values for each k.

...

Author(s)

German Aneiros Perez german.aneiros@udc.es

Silvia Novo Diaz snovo@est-econ.uc3m.es

References

Ferraty, F., Goia, A., Salinelli, E., and Vieu, P. (2013) Functional projection pursuit regression. Test, 22, 293–320, doi:10.1007/s11749-012-0306-2.

Novo S., Aneiros, G., and Vieu, P., (2019) Automatic and location-adaptive estimation in functional single–index regression. Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.

See Also

See also predict.fsim.kNN and plot.fsim.kNN.

Alternative procedures fsim.kernel.fit.optim, fsim.kernel.fit and fsim.kNN.fit.

Examples


data(Tecator)
y<-Tecator$fat
X<-Tecator$absor.spectra2

#FSIM fit.
ptm<-proc.time()
fit<-fsim.kNN.fit.optim(y=y[1:160],x=X[1:160,],max.knn=20,nknot.theta=4,nknot=20,
range.grid=c(850,1050))
proc.time()-ptm
fit
names(fit)



[Package fsemipar version 1.1.1 Index]