fregre.gsam.vs {fda.usc}

R Documentation

Variable Selection using Functional Additive Models

Description

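Computes a functional GAM model between functional covariates (X^1(t_1),\dots,X^q(t_q)) and non-functional covariates (Z^1,\dots,Z^p) with a scalar response Y. The covariates that enter the model are selected by a forward stepwise procedure based on the distance correlation between each candidate and the residual of the previous step.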

Usage

fregre.gsam.vs(
  data = list(),
  y,
  include = "all",
  exclude = "none",
  family = gaussian(),
  weights = NULL,
  basis.x = NULL,
  numbasis.opt = FALSE,
  kbs,
  dcor.min = 0.1,
  alpha = 0.05,
  par.model,
  xydist,
  trace = FALSE
)

Arguments

`data`	List containing the variables in the model. The "df" element is a data.frame containing the response and the scalar covariates (numeric and factor variables are allowed). Functional covariates of class `fdata` or `fd` are included as named components of the `data` list.
`y`	Character string with the name of the scalar response variable.
`include`	Character vector with the names of the variables to use. By default `"all"`, all variables are used.
`exclude`	Character vector with the names of the variables not to use. By default `"none"`, no variable is excluded.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See `family` for details of family functions.)
`weights`	Optional vector of observation weights. By default `NULL`.
`basis.x`	Basis parameter options. By default, the function uses a basis of 3 PCs to represent each functional covariate. Two forms are accepted:
    `list` (recommended): list of bases for the functional covariates; see the same argument in `fregre.glm`.
    `vector` (default): vector with two parameters, the type of basis and the maximum number of basis elements `numbasis` to be used. By default `basis.x[1]="pc"` (a principal component basis for each functional covariate included in the model) and `basis.x[2]=3`. Other basis types are `"pls"` and `"bspline"`.
`numbasis.opt`	Logical. If `FALSE` (default), all basis elements are used for each functional covariate included in the model. If `TRUE`, the function selects only the significant coefficients.
`kbs`	The dimension of the basis used to represent the smooth term. The default depends on the number of variables that the smooth is a function of.
`dcor.min`	Threshold for a variable to enter the model: a candidate X is discarded if its distance correlation with the residual, `R(X,e)`, is below `dcor.min`, where e is the residual of the model fitted in the previous step. By default `0.1`.
`alpha`	Significance level of the test of independence between a candidate covariate X and the residual e of the previous step. By default `0.05`.
`par.model`	Model parameters.
`xydist`	List with the inner distance matrices of each variable (all potential covariates and the response).
`trace`	Logical; interactive tracing and debugging of the call. By default `FALSE`.
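The `dcor.min` and `alpha` arguments describe an entry rule based on the distance correlation R(X,e) between a candidate covariate X and the current residual e. The base R sketch below computes the sample distance correlation and a permutation test of independence, then applies that rule to simulated data. The helpers `dcor_sketch()` and `perm_test_sketch()` are hypothetical and not part of fda.usc; the permutation test is an assumed stand-in, since this page does not state which independence test `fregre.gsam.vs` uses.

```r
# Sketch only: sample distance correlation and a permutation test of
# independence in base R. dcor_sketch() and perm_test_sketch() are
# hypothetical helpers, not fda.usc functions.

# Double-centre a distance matrix: a_ij - rowmean_i - colmean_j + grand mean
double_center <- function(d) {
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# Sample distance correlation R(x, y); rows of x and y are observations
dcor_sketch <- function(x, y) {
  a <- double_center(as.matrix(dist(x)))
  b <- double_center(as.matrix(dist(y)))
  v <- sqrt(mean(a * a) * mean(b * b))   # product of the distance variances
  if (v == 0) return(0)
  sqrt(max(0, mean(a * b)) / v)          # guard against tiny negative rounding
}

# Permutation p-value for H0: x and y are independent (assumed test)
perm_test_sketch <- function(x, y, B = 199) {
  x <- as.matrix(x)
  stat <- dcor_sketch(x, y)
  perm <- replicate(B, dcor_sketch(x[sample(nrow(x)), , drop = FALSE], y))
  (1 + sum(perm >= stat)) / (B + 1)
}

set.seed(1)
n <- 100
x <- rnorm(n)                      # candidate covariate X
e <- x^2 + rnorm(n, sd = 0.1)      # residual e: nearly uncorrelated with x, yet dependent
R_xe <- dcor_sketch(x, e)
p_xe <- perm_test_sketch(x, e)

# Entry rule: X enters only if R(X,e) >= dcor.min and independence is rejected
dcor.min <- 0.1
alpha <- 0.05
enters <- R_xe >= dcor.min && p_xe < alpha
```

In this simulation the Pearson correlation between `x` and `e` is close to zero, while R(X,e) should be well above `dcor.min`; this is why a distance correlation screen can detect the non-linear effects that the additive model then estimates.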

Details

This function is an extension of the functional generalized spectral additive regression model fregre.gsam, in which E[Y|X,Z] is related to the linear predictor \eta via a link function g(\cdot), with integrated smoothness estimation of the smooth functions f(\cdot):

E[Y|X,Z] = g^{-1}(\eta), \quad \eta = \alpha + \sum_{i=1}^{p} f_{i}(Z^{i}) + \sum_{k=1}^{q} \sum_{j=1}^{J_k} f_{j}^{k}(\xi_j^k)

where \xi_j^k is the j-th coefficient of the basis expansion of X^k (in PCA, \xi_j^k is the score of the j-th functional principal component of X^k) and J_k is the number of basis elements used for X^k.
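To make the scores \xi_j^k concrete, the sketch below simulates curves on a common grid and computes the scores of their first three principal components with base R `prcomp()` on the discretised curves. This is an assumed, simplified stand-in for the PC basis used by the package (`create.pc.basis`), which may centre and scale the scores differently; the data are hypothetical.

```r
# Sketch only: PC scores xi_j of discretised curves via base R prcomp().
# fda.usc's create.pc.basis() may centre and scale the scores differently.
set.seed(2)
n  <- 50                               # number of curves X_i
tt <- seq(0, 1, length.out = 101)      # common grid of argument values t
a  <- rnorm(n, sd = 2)
b  <- rnorm(n, sd = 1)
# X_i(t) = a_i sin(2 pi t) + b_i cos(2 pi t) + small measurement noise
X <- outer(a, sin(2 * pi * tt)) + outer(b, cos(2 * pi * tt)) +
  matrix(rnorm(n * length(tt), sd = 0.05), nrow = n)

J  <- 3                                # number of PCs kept (cf. basis.x[2] = 3)
pc <- prcomp(X, center = TRUE, scale. = FALSE)
xi <- pc$x[, 1:J]                      # xi[i, j]: score of curve i on PC j
colnames(xi) <- paste0("xi", 1:J)
explained <- cumsum(pc$sdev^2) / sum(pc$sdev^2)   # cumulative variance share
```

The columns of `xi` play the role of the arguments of the smooth functions f_j^k. Assuming a uniform grid and a simple quadrature rule, these scores differ from the integral-based functional PC scores only by a constant factor that depends on the grid spacing.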

The smooth functions f(\cdot) are specified with the smooth terms s and te, as in gam, so that the linear predictor depends on smooth functions of the predictors, or on linear functionals of them such as Z\beta and \langle X(t),\beta\rangle as in fregre.glm.
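As an illustration of the additive predictor above, the sketch below fits \eta = \alpha + f_1(Z) + f_1^1(\xi_1) + f_2^1(\xi_2) with `mgcv::gam()` and smooth terms `s()`. The simulated variables `xi1` and `xi2` are hypothetical stand-ins for the PC scores of one functional covariate; this is not the code that `fregre.gsam` runs internally.

```r
# Sketch only: an additive predictor
#   eta = alpha + f_1(Z) + f_1^1(xi_1) + f_2^1(xi_2)
# fitted with mgcv::gam() and smooth terms s(). xi1, xi2 are simulated
# stand-ins for PC scores of one functional covariate (hypothetical data).
library(mgcv)

set.seed(3)
n   <- 200
z   <- runif(n)                       # scalar covariate Z
xi1 <- rnorm(n)                       # stand-in for xi_1^1
xi2 <- rnorm(n)                       # stand-in for xi_2^1
eta <- 1 + 2 * (z - 0.5)^2 + sin(xi1) + 0.5 * xi2^2
y   <- eta + rnorm(n, sd = 0.2)       # gaussian family, identity link
dat_sim <- data.frame(y, z, xi1, xi2)

fit <- gam(y ~ s(z, k = 5) + s(xi1, k = 5) + s(xi2, k = 5),
           family = gaussian(), data = dat_sim)
fit_sum <- summary(fit)
fit_sum$s.table                       # one row per smooth term f(.)
```

Each row of `s.table` corresponds to one estimated smooth function; with a non-Gaussian `family`, the same formula would be fitted on the scale of the link g.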

Value

Returns an object corresponding to the estimated additive model using the selected variables (same output as the fregre.gsam function), with the following additional elements:

gof: goodness of fit at each step of the variable selection (VS) algorithm.
i.predictor: vector with 1 if the variable is selected, 0 otherwise.
ipredictor: vector with the names of the selected variables (in order of selection).
dcor: distance correlation between each potential covariate and the residual of the model at each step.
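The way these elements arise from the `dcor.min` and `alpha` thresholds can be illustrated with a simplified forward selection loop. The sketch below is an assumption-laden reconstruction, not the fda.usc implementation: `vs_sketch()` and its helpers are hypothetical, all candidates are scalar columns (a functional covariate would enter through its basis scores), each selected variable gets an `s(, k = 5)` smooth, a permutation test stands in for the independence test, and `gof` is not computed.

```r
# Sketch only: simplified forward selection driven by distance correlation.
# vs_sketch() and its helpers are hypothetical, not fda.usc functions.
library(mgcv)

double_center <- function(d) d - outer(rowMeans(d), colMeans(d), "+") + mean(d)

dcor_sketch <- function(x, y) {
  a <- double_center(as.matrix(dist(x)))
  b <- double_center(as.matrix(dist(y)))
  v <- sqrt(mean(a * a) * mean(b * b))
  if (v == 0) return(0)
  sqrt(max(0, mean(a * b)) / v)
}

# Permutation p-value of the (assumed) independence test between x and y
perm_pvalue <- function(x, y, B = 99) {
  stat <- dcor_sketch(x, y)
  perm <- replicate(B, dcor_sketch(sample(x), y))
  (1 + sum(perm >= stat)) / (B + 1)
}

vs_sketch <- function(data, y, dcor.min = 0.1, alpha = 0.05) {
  cand <- setdiff(names(data), y)
  selected <- character(0)
  dcor_steps <- list()                   # dcor of remaining candidates, per step
  e <- data[[y]] - mean(data[[y]])       # residual of the intercept-only model
  repeat {
    rest <- setdiff(cand, selected)
    if (length(rest) == 0) break
    d <- sapply(rest, function(v) dcor_sketch(data[[v]], e))
    dcor_steps[[length(dcor_steps) + 1]] <- d
    best <- names(which.max(d))
    # Stop when the best candidate is weakly related to, or independent of, e
    if (d[[best]] < dcor.min || perm_pvalue(data[[best]], e) >= alpha) break
    selected <- c(selected, best)
    f <- reformulate(sprintf("s(%s, k = 5)", selected), response = y)
    e <- residuals(gam(f, data = data))  # residual used in the next step
  }
  list(ipredictor = selected,            # names, in order of selection
       i.predictor = setNames(as.integer(cand %in% selected), cand),
       dcor = dcor_steps)
}

set.seed(4)
n <- 150
sim <- data.frame(x1 = runif(n, -1.5, 1.5), x2 = rnorm(n),
                  x3 = rnorm(n), x4 = rnorm(n))
sim$y <- sin(2 * sim$x1) + 0.8 * sim$x2 + rnorm(n, sd = 0.3)
res <- vs_sketch(sim, "y")
res$ipredictor
res$i.predictor
```

Because the residual is refitted after every entry, a covariate whose effect is already explained by earlier selections tends to show a small `dcor` in later steps; noise variables may occasionally pass the permutation test at level `alpha`.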

Note

If the model contains only non-functional explanatory variables (multivariate covariates), the function computes a standard gam fit.

Author(s)

Manuel Febrero-Bande, Manuel Oviedo de la Fuente manuel.oviedo@udc.es

References

Febrero-Bande, M., González-Manteiga, W. and Oviedo de la Fuente, M. (2018). Variable selection in functional additive regression models. Computational Statistics, 1-19. DOI: 10.1007/s00180-018-0844-5

Examples

## Not run:  
library(fda.usc)
data(tecator)
x=tecator$absorp.fdata
x1 <- fdata.deriv(x)
x2 <- fdata.deriv(x,nderiv=2)
y=tecator$y$Fat
set.seed(1) # reproducible random (irrelevant) factor xcat0
xcat0 <- cut(rnorm(length(y)),4)
xcat1 <- cut(tecator$y$Protein,4)
xcat2 <- cut(tecator$y$Water,4)
ind <- 1:165
dat <- data.frame("Fat"=y, x1$data, xcat0, xcat1, xcat2)
ldat <- ldata("df"=dat[ind,],"x"=x[ind,],"x1"=x1[ind,],"x2"=x2[ind,])
# 3 functionals (x, x1, x2), 3 factors (xcat0, xcat1, xcat2)
# and 100 scalars (impact points of x1)

# Time consuming
res.gam0 <- fregre.gsam.vs(data=ldat, y="Fat",
            exclude="x2", numbasis.opt=TRUE) # All the covariates except x2
summary(res.gam0)
res.gam0$ipredictors

res.gam1 <- fregre.gsam.vs(data=ldat,y="Fat") # All the covariates
summary(res.gam1)
res.gam1$ipredictors

covar <- c("xcat0","xcat1","xcat2","x","x1","x2")
res.gam2 <- fregre.gsam.vs(data=ldat, y="Fat", include=covar)
summary(res.gam2)
res.gam2$ipredictors 
res.gam2$i.predictor

res.gam3 <- fregre.gsam.vs(data=ldat,y="Fat",
            basis.x=c("type.basis"="pc","numbasis"=10))
summary(res.gam3)
res.gam3$ipredictors

res.gam4 <- fregre.gsam.vs(data=ldat, y="Fat", include=c("x","x1","x2"),
            basis.x=c("type.basis"="pc","numbasis"=5), numbasis.opt=TRUE)
summary(res.gam4)
res.gam4$ipredictors
lpc <- list("x"=create.pc.basis(ldat$x,1:4)
           ,"x1"=create.pc.basis(ldat$x1,1:3)
           ,"x2"=create.pc.basis(ldat$x2,1:12))
res.gam5 <- fregre.gsam.vs(data=ldat,y="Fat",basis.x=lpc)
summary(res.gam5)
res.gam6 <- fregre.gsam.vs(data=ldat,y="Fat",basis.x=lpc,numbasis.opt=TRUE)
summary(res.gam6)
# Fourier basis with 7 elements, shared by all functional covariates
bsp <- create.fourier.basis(ldat$x$rangeval,7)
lbsp <- list("x"=bsp,"x1"=bsp,"x2"=bsp)
res.gam7 <- fregre.gsam.vs(data=ldat,y="Fat",basis.x=lbsp,kbs=4)
summary(res.gam7)
# Prediction works as for fregre.gsam()
newldat <- ldata("df"=dat[-ind,],"x"=x[-ind,],"x1"=x1[-ind,],
                "x2"=x2[-ind,])
pred.gam1 <- predict(res.gam1,newldat)
pred.gam2 <- predict(res.gam2,newldat)
pred.gam3 <- predict(res.gam3,newldat)
pred.gam4 <- predict(res.gam4,newldat)
pred.gam5 <- predict(res.gam5,newldat)
pred.gam6 <- predict(res.gam6,newldat)
pred.gam7 <- predict(res.gam7,newldat)
plot(dat[-ind,"Fat"],pred.gam1)
points(dat[-ind,"Fat"],pred.gam2,col=2)
points(dat[-ind,"Fat"],pred.gam3,col=3)
points(dat[-ind,"Fat"],pred.gam4,col=4)
points(dat[-ind,"Fat"],pred.gam5,col=5)
points(dat[-ind,"Fat"],pred.gam6,col=6)
points(dat[-ind,"Fat"],pred.gam7,col=7)
pred2meas(newldat$df$Fat,pred.gam1)
pred2meas(newldat$df$Fat,pred.gam2)
pred2meas(newldat$df$Fat,pred.gam3)
pred2meas(newldat$df$Fat,pred.gam4)
pred2meas(newldat$df$Fat,pred.gam5)
pred2meas(newldat$df$Fat,pred.gam6)
pred2meas(newldat$df$Fat,pred.gam7)

## End(Not run)

[Package fda.usc version 2.1.0]