R: Functional single-index kNN predictor

fsim.kNN.test {fsemipar}

R Documentation

Functional single-index kNN predictor

Description

This function computes predictions for a functional single-index model (FSIM) with a scalar response, estimated using the Nadaraya-Watson kNN estimator. It requires a functional index (\theta), the number of nearest neighbours (k), and the new observations of the functional covariate (x.test) as inputs.

Usage

fsim.kNN.test(x, y, x.test, y.test = NULL, theta, order.Bspline = 3, 
nknot.theta = 3, k = 4, kind.of.kernel = "quad", range.grid = NULL, 
nknot = NULL)

Arguments

`x`	Matrix containing the observations of the functional covariate in the training sample, collected by row.
`y`	Vector containing the scalar responses in the training sample.
`x.test`	Matrix containing the observations of the functional covariate in the testing sample, collected by row.
`y.test`	(optional) Vector or matrix containing the scalar responses in the testing sample.
`theta`	Vector containing the coefficients of `\theta` in a B-spline basis, such that `length(theta)=order.Bspline+nknot.theta`.
`nknot.theta`	Number of regularly spaced interior knots in the B-spline expansion of `\theta_0`. The default is 3.
`order.Bspline`	Order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3.
`k`	The number of nearest neighbours. The default is 4.
`kind.of.kernel`	The type of kernel function used. Currently, only the Epanechnikov kernel (`"quad"`) is available.
`range.grid`	Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate `x` are evaluated (i.e. the range of the discretisation). If `range.grid=NULL`, then `range.grid=c(1,p)` is considered, where `p` is the discretisation size of `x` (i.e. `ncol(x)`).
`nknot`	Number of regularly spaced interior knots for the B-spline expansion of the functional covariate. The default value is `(p - order.Bspline - 1)%/%2`.
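
The dimension rules stated in the arguments above can be checked with a few lines of base R. This is only a sketch: the discretisation size `p = 100` is a hypothetical value chosen for illustration, and the variable names `length.theta`, `nknot.default` and `range.grid.default` are not part of the package.

```r
# Hypothetical discretisation size, i.e. ncol(x); chosen for illustration only
p <- 100
order.Bspline <- 3   # default stated above
nknot.theta <- 3     # default stated above

# Number of B-spline coefficients that theta must contain
length.theta <- order.Bspline + nknot.theta

# Default number of interior knots for the expansion of the functional covariate
nknot.default <- (p - order.Bspline - 1) %/% 2

# Grid endpoints used when range.grid = NULL
range.grid.default <- c(1, p)
```

With these values, `theta` needs 6 coefficients and the default `nknot` is 48.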

Details

The functional single-index model (FSIM) is given by the expression:

Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i, \quad i=1,\dots,n,

where Y_i denotes a scalar response and X_i is a functional covariate valued in a separable Hilbert space \mathcal{H} with inner product \langle \cdot, \cdot\rangle. The term \varepsilon_i denotes the random error, \theta_0 \in \mathcal{H} is the unknown functional index and r(\cdot) denotes the unknown smooth link function; n is the training sample size.
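
To make the model concrete, the following base R sketch simulates data from an FSIM. Everything specific in it is an assumption made for illustration: the curves, the index `theta0`, the link `r`, the noise level, and the approximation of the inner product by a Riemann sum on a regular grid over [0, 1]. None of it comes from the package.

```r
set.seed(1)
n <- 50                                # training sample size (assumed)
p <- 100                               # discretisation size (assumed)
t.grid <- seq(0, 1, length.out = p)    # regular grid on [0, 1] (assumed)

# Hypothetical curves, one per row: random combinations of two smooth functions
a <- runif(n)
b <- runif(n)
X <- outer(a, sin(2 * pi * t.grid)) + outer(b, t.grid^2)

# Hypothetical functional index and link function
theta0 <- cos(pi * t.grid)
r <- function(u) u^3

# <theta0, X_i> approximated by a Riemann sum over the grid
proj <- as.vector(X %*% theta0) / p

# Scalar responses Y_i = r(<theta0, X_i>) + eps_i
Y <- r(proj) + rnorm(n, sd = 0.01)
```

The response depends on each curve only through the scalar projection `proj`, which is what makes the model single-index.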

Given \theta \in \mathcal{H}, 1<k<n and a testing sample \{X_j,\ j=1,\dots,n_{test}\}, the predicted responses (see the value y.estimated.test) can be computed using the kNN procedure by means of

\widehat{r}_{k,\theta}(X_j)=\sum_{i=1}^nw_{n,k,\theta}(X_j,X_i)Y_i,\quad j=1,\dots,n_{test},

with Nadaraya-Watson weights

w_{n,k,\theta}(X_j,X_i)=\frac{K\left(H_{k,X_j,\theta}^{-1}d_{\theta}\left(X_i,X_j\right)\right)}{\sum_{l=1}^nK\left(H_{k,X_j,\theta}^{-1}d_{\theta}\left(X_l,X_j\right)\right)},

where

K is a kernel function (see the argument kind.of.kernel).
for x_1,x_2 \in \mathcal{H}, d_{\theta}(x_1,x_2)=|\langle\theta,x_1-x_2\rangle| is the projection semi-metric.
H_{k,x,\theta}=\min\left\{h\in \mathbb{R}^+ \text{ such that } \sum_{i=1}^n1_{B_{\theta}(x,h)}(X_i)=k\right\}, where 1_{B_{\theta}(x,h)}(\cdot) is the indicator function of the open ball defined by the projection semi-metric, with centre x\in\mathcal{H} and radius h.
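
The estimator defined above can be sketched in base R once the projections \langle\theta,X_i\rangle are available as plain numbers. This is a minimal illustration, not the package's implementation. The helper names `K.quad` and `knn.nw.predict` are hypothetical, the kernel is taken to be the Epanechnikov-type function 1-u^2 on [0,1] (its normalising constant cancels in the weights), and the local bandwidth is set to the midpoint between the k-th and (k+1)-th smallest distances. That midpoint is one radius whose open ball contains exactly k training points when there are no ties; it is an assumed convention rather than something stated in this page.

```r
# Epanechnikov-type kernel supported on [0, 1] (assumed form of the "quad" kernel)
K.quad <- function(u) ifelse(u >= 0 & u <= 1, 1 - u^2, 0)

# proj.train: projections <theta, X_i> of the training curves
# proj.test:  projections <theta, X_j> of the testing curves
knn.nw.predict <- function(proj.train, y.train, proj.test, k) {
  stopifnot(k > 1, k < length(proj.train))
  sapply(proj.test, function(u) {
    d <- abs(proj.train - u)              # projection semi-metric d_theta(X_i, X_j)
    d.sorted <- sort(d)
    # Local bandwidth: midpoint between the k-th and (k+1)-th smallest distances
    H <- 0.5 * (d.sorted[k] + d.sorted[k + 1])
    w <- K.quad(d / H)                    # unnormalised kernel weights
    sum(w * y.train) / sum(w)             # Nadaraya-Watson weighted average
  })
}

# Toy usage with hypothetical projections and responses
pred <- knn.nw.predict(proj.train = c(0, 1, 2, 3, 10),
                       y.train = c(0, 1, 2, 3, 10),
                       proj.test = 1.5, k = 2)
```

In the toy call the two nearest projections (1 and 2) lie at distance 0.5, the bandwidth is 1, both receive the same weight, and the prediction is their average, 1.5.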

If the argument y.test is provided (i.e. if !is.null(y.test)), the function also computes the mean squared error of prediction (see the value MSE.test) as mean((y.test-y.estimated.test)^2).
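
As a small worked example of this formula, the sketch below evaluates the mean squared error of prediction on made-up numbers. The helper `msep` is hypothetical, and returning `NULL` when `y.test` is missing is an assumption made here for illustration, not documented behaviour of the package.

```r
# Hypothetical helper: mean squared error of prediction, or NULL without y.test
msep <- function(y.estimated.test, y.test = NULL) {
  if (is.null(y.test)) return(NULL)   # assumed behaviour when no responses are given
  mean((y.test - y.estimated.test)^2)
}

# Made-up observed and predicted responses
y.test <- c(1, 2, 3)
y.estimated.test <- c(1.5, 2, 2)

MSE.test <- msep(y.estimated.test, y.test)
```

Here the squared errors are 0.25, 0 and 1, so `MSE.test` equals 1.25/3.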

Value

`y.estimated.test`	Predicted responses.
`MSE.test`	Mean squared error between predicted and observed responses in the testing sample.

Author(s)

German Aneiros Perez german.aneiros@udc.es

Silvia Novo Diaz snovo@est-econ.uc3m.es

References

Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression. Journal of Nonparametric Statistics, 31(2), 364-392, doi:10.1080/10485252.2019.1567726.

Examples


data(Tecator)
y<-Tecator$fat
X<-Tecator$absor.spectra2


train<-1:160
test<-161:215

#FSIM fit. 
ptm<-proc.time()
fit<-fsim.kNN.fit(y=y[train],x=X[train,],max.knn=20,nknot.theta=4,nknot=20,
      range.grid=c(850,1050))
proc.time()-ptm
fit

#FSIM prediction (stored in 'pred' so that the index vector 'test' is not overwritten)
pred<-fsim.kNN.test(y=y[train],x=X[train,],x.test=X[test,],y.test=y[test],
        theta=fit$theta.est,k=fit$k.opt,nknot.theta=4,nknot=20,
        range.grid=c(850,1050))

#MSEP
pred$MSE.test

[Package fsemipar version 1.1.1 Index]