fsim.kNN.fit {fsemipar} R Documentation

Functional single-index model fit using kNN estimation and joint LOOCV minimisation

Description

This function fits a functional single-index model (FSIM) between a functional covariate and a scalar response. It employs kNN estimation with Nadaraya-Watson weights and uses B-spline expansions to represent curves and eligible functional indexes.

The function also utilises the leave-one-out cross-validation (LOOCV) criterion to select the number of neighbours (k.opt) and the coefficients of the functional index in the spline basis (theta.est). It performs a joint minimisation of the LOOCV objective function in both the number of neighbours and the functional index.

Usage

fsim.kNN.fit(x, y, seed.coeff = c(-1, 0, 1), order.Bspline = 3, nknot.theta = 3,
  knearest = NULL, min.knn = 2, max.knn = NULL, step = NULL,
  kind.of.kernel = "quad", range.grid = NULL, nknot = NULL, n.core = NULL)

Arguments

x

Matrix containing the observations of the functional covariate (i.e. curves) collected by row.

y

Vector containing the scalar response.

seed.coeff

Vector of initial values used to build the set $\Theta_n$ (see section Details). The coefficients for the B-spline representation of each eligible functional index $\theta \in \Theta_n$ are obtained from seed.coeff. The default is c(-1,0,1).

order.Bspline

Positive integer giving the order of the B-spline basis functions. This is the number of coefficients in each piecewise polynomial segment. The default is 3.

nknot.theta

Positive integer indicating the number of regularly spaced interior knots in the B-spline expansion of $\theta_0$. The default is 3.

knearest

Vector of positive integers that defines the sequence within which the optimal number of nearest neighbours k.opt is selected. If knearest=NULL, then knearest <- seq(from = min.knn, to = max.knn, by = step).

min.knn

A positive integer that represents the minimum value in the sequence for selecting the number of nearest neighbours k.opt. This value should be less than the sample size. The default is 2.

max.knn

A positive integer that represents the maximum value in the sequence for selecting the number of nearest neighbours k.opt. This value should be less than the sample size. The default is max.knn <- n%/%5.

step

A positive integer used to construct the sequence of k-nearest neighbours as follows: min.knn, min.knn + step, min.knn + 2*step, min.knn + 3*step,.... The default value for step is step<-ceiling(n/100).

kind.of.kernel

The type of kernel function used. Currently, only the Epanechnikov kernel ("quad") is available.

range.grid

Vector of length 2 containing the endpoints of the grid at which the observations of the functional covariate x are evaluated (i.e. the range of the discretisation). If range.grid=NULL, then range.grid=c(1,p) is considered, where p is the discretisation size of x (i.e. ncol(x)).

nknot

Positive integer indicating the number of interior knots for the B-spline expansion of the functional covariate. The default value is (p - order.Bspline - 1)%/%2.

n.core

Number of CPU cores designated for parallel execution. The default is n.core<-availableCores(omit=1).
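The documented defaults for knearest, range.grid and nknot depend only on the sample size n and the discretisation size p. The following minimal sketch reproduces them in base R; the values of n and p are hypothetical and chosen only for illustration, and the code mirrors the formulas stated in the argument list rather than the package's internal code.

```r
# Hypothetical sample size and discretisation size (not taken from the package).
n <- 160
p <- 100
order.Bspline <- 3

# Defaults as documented in the argument list.
min.knn <- 2
max.knn <- n %/% 5
step    <- ceiling(n / 100)
knearest <- seq(from = min.knn, to = max.knn, by = step)

range.grid <- c(1, p)
nknot <- (p - order.Bspline - 1) %/% 2

knearest
nknot
```

With larger samples the default step grows, so the number of candidate values of k stays moderate.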

Details

The functional single-index model (FSIM) is given by the expression:

$$Y_i=r(\langle\theta_0,X_i\rangle)+\varepsilon_i, \quad i=1,\dots,n,$$

where $Y_i$ denotes a scalar response and $X_i$ is a functional covariate valued in a separable Hilbert space $\mathcal{H}$ with an inner product $\langle \cdot, \cdot\rangle$. The term $\varepsilon_i$ denotes the random error, $\theta_0 \in \mathcal{H}$ is the unknown functional index and $r(\cdot)$ denotes the unknown smooth link function.

The FSIM is fitted using the kNN estimator

$$\widehat{r}_{k,\hat{\theta}}(x)=\sum_{i=1}^n w_{n,k,\hat{\theta}}(x,X_i)Y_i, \quad \forall x\in\mathcal{H},$$

with Nadaraya-Watson weights

$$w_{n,k,\hat{\theta}}(x,X_i)=\frac{K\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}\left(X_i,x\right)\right)}{\sum_{i=1}^nK\left(H_{k,x,\hat{\theta}}^{-1}d_{\hat{\theta}}\left(X_i,x\right)\right)},$$

where

The procedure requires the estimation of the functional parameter $\theta_0$. Therefore, we use B-spline expansions to represent curves (dimension nknot+order.Bspline) and eligible functional indexes (dimension nknot.theta+order.Bspline). Then, we build a set $\Theta_n$ of eligible functional indexes by calibrating (to ensure the identifiability of the model) the set of initial coefficients given in seed.coeff. The larger this set is, the greater the size of $\Theta_n$. Since our approach requires intensive computation, a trade-off between the size of $\Theta_n$ and the performance of the estimator is necessary. For this purpose, Ait-Saidi et al. (2008) suggested considering order.Bspline=3 and seed.coeff=c(-1,0,1). For details on the construction of $\Theta_n$, see Novo et al. (2019).
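The dimension nknot.theta+order.Bspline can be checked with the splines package that ships with R. The sketch below is an illustration only: it assumes the basis has degree order.Bspline-1 with regularly spaced interior knots on range.grid, and it assumes that the uncalibrated candidates are all combinations of seed.coeff. The actual construction and calibration of $\Theta_n$ are described in Novo et al. (2019) and are not reproduced here.

```r
library(splines)  # ships with R

order.Bspline <- 3
nknot.theta   <- 3
range.grid    <- c(850, 1050)
grid <- seq(range.grid[1], range.grid[2], length.out = 100)

# Regularly spaced interior knots (endpoints excluded).
interior <- seq(range.grid[1], range.grid[2], length.out = nknot.theta + 2)
interior <- interior[-c(1, nknot.theta + 2)]

# B-spline basis of order 3 (degree 2) evaluated on the grid.
B <- bs(grid, knots = interior, degree = order.Bspline - 1,
        intercept = TRUE, Boundary.knots = range.grid)
ncol(B)  # equals nknot.theta + order.Bspline

# Assumed raw candidate set: every combination of seed.coeff, before calibration.
seed.coeff <- c(-1, 0, 1)
raw <- expand.grid(rep(list(seed.coeff), ncol(B)))
nrow(raw)  # length(seed.coeff)^ncol(B)
```

Under these assumptions the raw candidate set grows exponentially with nknot.theta+order.Bspline, which is why small values of order.Bspline, nknot.theta and length(seed.coeff) keep the search tractable.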

We obtain the estimated coefficients of $\theta_0$ in the spline basis (theta.est) and the selected number of neighbours (k.opt) by minimising the LOOCV criterion. This function performs a joint minimisation in both parameters, the number of neighbours and the functional index, and supports parallel computation. To avoid parallel computation, we can set n.core=1.
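The kNN estimator and the joint LOOCV minimisation can be sketched in base R on simulated data. This is a toy illustration, not the package's implementation. It assumes that the semi-metric is the absolute difference of the projections onto the index, that the local bandwidth is the midpoint between the k-th and (k+1)-th nearest distances (so that exactly k curves receive positive weight), and that the kernel is the quadratic kernel 1-u^2 on [0,1]. The data, the three candidate indexes and the helper names K and loocv are all hypothetical.

```r
set.seed(1)

# Toy data: n curves observed at p points.
n <- 60; p <- 30
tt <- seq(0, 1, length.out = p)
X <- t(sapply(seq_len(n), function(i) rnorm(1) * sin(2 * pi * tt) + rnorm(1) * tt))
y <- (as.vector(X %*% tt) / p)^2 + rnorm(n, sd = 0.05)

# Assumed quadratic kernel supported on [0, 1].
K <- function(u) ifelse(u >= 0 & u <= 1, 1 - u^2, 0)

# LOOCV mean squared error for one candidate index theta and one value of k.
loocv <- function(theta, k) {
  proj <- as.vector(X %*% theta) / p      # Riemann approximation of <theta, X_i>
  D <- abs(outer(proj, proj, "-"))        # assumed projection semi-metric
  diag(D) <- Inf                          # leave observation i out
  pred <- sapply(seq_len(n), function(i) {
    d <- sort(D[i, ])
    h <- (d[k] + d[k + 1]) / 2            # assumed local kNN bandwidth
    w <- K(D[i, ] / h)                    # unnormalised Nadaraya-Watson weights
    sum(w * y) / sum(w)
  })
  mean((y - pred)^2)
}

# Joint minimisation over a small candidate set of indexes and a grid of k.
thetas <- list(const = rep(1, p), linear = tt, quad = tt^2)
k.seq <- seq(2, 20, by = 2)
CV <- sapply(k.seq, function(k) sapply(thetas, loocv, k = k))

best  <- which(CV == min(CV), arr.ind = TRUE)[1, ]
m.opt <- best["row"]
k.opt <- k.seq[best["col"]]
```

Each row of CV corresponds to one candidate index and each column to one value of k, so the joint minimiser is simply the position of the smallest entry. Because every (index, k) pair is evaluated independently, the loop parallelises naturally.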

Value

call

The matched call.

fitted.values

Estimated scalar response.

residuals

Differences between y and the fitted.values.

theta.est

Coefficients of $\hat{\theta}$ in the B-spline basis: a vector of length order.Bspline+nknot.theta.

k.opt

Selected number of nearest neighbours.

r.squared

Coefficient of determination.

var.res

Residual variance.

df

Residual degrees of freedom.

yhat.cv

Predicted values for the scalar response using leave-one-out samples.

CV.opt

Minimum value of the CV function, i.e. the value of CV for theta.est and k.opt.

CV.values

Vector containing CV values for each functional index in $\Theta_n$ and the value of $k$ that minimises the CV for such index (i.e. CV.values[j] contains the value of the CV function corresponding to theta.seq.norm[j,] and the best value of k.seq for this functional index according to the CV criterion).

H

Hat matrix.

m.opt

Index of $\hat{\theta}$ in the set $\Theta_n$.

theta.seq.norm

The vector theta.seq.norm[j,] contains the coefficients in the B-spline basis of the jth functional index in $\Theta_n$.

k.seq

Sequence of eligible values for $k$.

...
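The components CV.values, CV.opt, m.opt, theta.seq.norm and theta.est are linked through the minimisation. The sketch below illustrates the relationship implied by the descriptions in this section, using invented values for three candidate indexes; it does not call the package, and the equalities shown are an assumption about how the components relate rather than documented guarantees.

```r
# Hypothetical outputs for three candidate indexes (values invented for illustration).
theta.seq.norm <- rbind(c(0, 0.6, 0.8),
                        c(0.6, 0.8, 0),
                        c(1, 0, 0))
CV.values <- c(0.42, 0.17, 0.35)

# Position of the selected index in the candidate set.
m.opt <- which.min(CV.values)

# Minimum of the CV function and the corresponding coefficients.
CV.opt    <- CV.values[m.opt]
theta.est <- theta.seq.norm[m.opt, ]
```

On a fitted object, the analogous check would compare fit$theta.est with fit$theta.seq.norm[fit$m.opt, ].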

Author(s)

German Aneiros Perez german.aneiros@udc.es

Silvia Novo Diaz snovo@est-econ.uc3m.es

References

Ait-Saidi, A., Ferraty, F., Kassa, R., and Vieu, P. (2008) Cross-validated estimations in the single-functional index model, Statistics, 42(6), 475–494, doi:10.1080/02331880801980377.

Novo, S., Aneiros, G., and Vieu, P. (2019) Automatic and location-adaptive estimation in functional single-index regression, Journal of Nonparametric Statistics, 31(2), 364–392, doi:10.1080/10485252.2019.1567726.

See Also

See also fsim.kNN.test, predict.fsim.kNN, plot.fsim.kNN.

Alternative procedures: fsim.kernel.fit, fsim.kNN.fit.optim and fsim.kernel.fit.optim.

Examples


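library(fsemipar)

# Tecator data: scalar response (fat) and functional covariate.
data(Tecator)
y<-Tecator$fat
X<-Tecator$absor.spectra2

# FSIM fit on the first 160 curves, timing the call.
ptm<-proc.time()
fit<-fsim.kNN.fit(y=y[1:160],x=X[1:160,],max.knn=20,nknot.theta=4,nknot=20,
range.grid=c(850,1050))
proc.time()-ptm

# Print the fit and list the components of the returned object.
fit
names(fit)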



[Package fsemipar version 1.1.1 Index]