lm.pels.fit {fsemipar} | R Documentation |
Regularised fit of sparse linear regression
Description
This function fits a sparse linear model between a scalar response and a vector of scalar covariates. It employs a penalised least-squares regularisation procedure with either (group)SCAD or (group)LASSO penalties. The method utilises an objective criterion (criterion) to select the optimal regularisation parameter (lambda.opt).
Usage
lm.pels.fit(z, y, lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL,
factor.pn = 1, nlambda = 100, lambda.seq = NULL, vn = ncol(z), nfolds = 10,
seed = 123, criterion = "GCV", penalty = "grSCAD", max.iter = 1000)
Arguments
z |
Matrix containing the observations of the covariates collected by row. |
y |
Vector containing the scalar response. |
lambda.min |
The smallest value for lambda (i.e., the lower endpoint of the sequence in which lambda.opt is selected). The default is NULL, in which case lambda.min.h or lambda.min.l is used, depending on the sample size. |
lambda.min.h |
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is smaller than factor.pn times the number of covariates. |
lambda.min.l |
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is larger than factor.pn times the number of covariates. |
factor.pn |
Positive integer used to set the threshold that determines whether lambda.min.h or lambda.min.l is used: the sample size is compared with factor.pn times the number of covariates. The default is 1. |
nlambda |
Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100. |
lambda.seq |
Sequence of values in which lambda.opt is selected. If lambda.seq = NULL, the sequence is built automatically from lambda.min and nlambda. The default is NULL. |
vn |
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default value is vn = ncol(z), which penalises each covariate individually. |
nfolds |
Number of cross-validation folds, used when criterion = "k-fold-CV". The default is 10. |
seed |
Seed for the random number generator, to ensure reproducible results when criterion = "k-fold-CV" (the folds are chosen at random). The default is 123. |
criterion |
The criterion used to select the regularisation parameter lambda.opt. Options include "GCV", "BIC", "AIC" or "k-fold-CV". The default is "GCV". |
penalty |
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD". |
max.iter |
Maximum number of iterations allowed across the entire path. The default value is 1000. |
Details
The sparse linear model (SLM) is given by the expression:
Y_i=Z_{i1}\beta_{01}+\dots+Z_{ip_n}\beta_{0p_n}+\varepsilon_i\ \ \ i=1,\dots,n,
where Y_i
denotes a scalar response and Z_{i1},\dots,Z_{ip_n}
are real covariates. In this equation, \mathbf{\beta}_0=(\beta_{01},\dots,\beta_{0p_n})^{\top}
is a vector of unknown real parameters and \varepsilon_i
represents the random error.
In this function, the SLM is fitted using a penalised least-squares (PeLS) approach by minimising
\mathcal{Q}\left(\mathbf{\beta}\right)=\frac{1}{2}\left(\mathbf{Y}-\mathbf{Z}\mathbf{\beta}\right)^{\top}\left(\mathbf{Y}-\mathbf{Z}\mathbf{\beta}\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_j|\right), \quad (1)
where \mathbf{\beta}=(\beta_1,\ldots,\beta_{p_n})^{\top}, \ \mathcal{P}_{\lambda_{j_n}}\left(\cdot\right)
is a penalty function (specified in the argument penalty
) and \lambda_{j_n} > 0
is a tuning parameter.
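For illustration only, the two (non-group) penalty functions selectable through the penalty argument can be sketched in base R as follows; the helper names lasso_pen and scad_pen are made up here, and the SCAD shape parameter a = 3.7 is the standard choice recommended by Fan and Li (2001):

```r
# LASSO penalty for a single coefficient magnitude
lasso_pen <- function(t, lambda) lambda * abs(t)

# SCAD penalty of Fan and Li (2001); quadratic spline with shape parameter a
scad_pen <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(t <= lambda,
         lambda * t,                                          # linear near zero
         ifelse(t <= a * lambda,
                (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))                      # constant tail
}
```

Near zero the two penalties coincide (both are lambda * |t|), while for large coefficients SCAD levels off at a constant, which is what reduces the bias of large estimates relative to the LASSO.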
To reduce the number of tuning parameters, \lambda_j
, to be selected for each sample, we consider \lambda_j = \lambda \widehat{\sigma}_{\beta_{0,j,OLS}}
, where \beta_{0,j,OLS}
denotes the OLS estimate of \beta_{0,j}
and \widehat{\sigma}_{\beta_{0,j,OLS}}
is its estimated standard deviation. The parameter \lambda
is selected using the objective criterion specified in the argument criterion.
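The scaling above can be illustrated in plain base R on simulated data (a hypothetical sketch, not code from the package): the per-coefficient parameters lambda_j are obtained by multiplying a common lambda by the standard errors from an OLS fit.

```r
set.seed(1)
n <- 100; p <- 5
z <- matrix(rnorm(n * p), n, p)
y <- drop(z %*% c(2, -1, 0, 0, 0.5)) + rnorm(n)

ols <- lm(y ~ z - 1)                                # OLS fit, no intercept
se  <- summary(ols)$coefficients[, "Std. Error"]    # sd-hat of each beta_OLS_j

lambda   <- 0.1          # single tuning parameter selected by the criterion
lambda_j <- lambda * se  # one penalty level per coefficient
```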
For further details on the estimation procedure of the SLM, see, e.g., Fan and Li (2001). The PeLS objective function is minimised using the R function grpreg
of the package grpreg
(Breheny and Huang, 2015).
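For illustration only, a minimal base-R coordinate-descent sketch of objective (1) with the LASSO penalty (the actual minimisation in the package is delegated to grpreg; the helper names soft and pels_lasso are made up here):

```r
# Soft-thresholding operator, the closed-form coordinate update for LASSO
soft <- function(x, t) sign(x) * pmax(abs(x) - t, 0)

# Cyclic coordinate descent for (1/2)||y - z beta||^2 + n * lambda * sum(|beta_j|)
pels_lasso <- function(z, y, lambda, max.iter = 1000, tol = 1e-10) {
  p <- ncol(z); n <- nrow(z)
  beta <- rep(0, p)
  for (it in seq_len(max.iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      r_j <- y - z[, -j, drop = FALSE] %*% beta[-j]          # partial residual
      beta[j] <- soft(sum(z[, j] * r_j), n * lambda) / sum(z[, j]^2)
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}
```

With lambda = 0 each coordinate update is the exact least-squares solution, so the sketch converges to the OLS estimate, consistent with the Remark below.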
Remark: It should be noted that if we set lambda.seq to 0, we obtain the non-penalised estimation of the model, i.e. the OLS estimation. Using lambda.seq with a value \not=0 is advisable when suspecting the presence of irrelevant variables.
Value
call |
The matched call. |
fitted.values |
Estimated scalar response. |
residuals |
Differences between y and the fitted.values. |
beta.est |
Estimate of \beta_0 when the optimal tuning parameter lambda.opt is used. |
indexes.beta.nonnull |
Indexes of the non-zero \widehat{\beta}_j. |
lambda.opt |
Selected value of lambda. |
IC |
Value of the criterion function considered to select lambda.opt and vn.opt. |
vn.opt |
Selected value of vn. |
... |
Author(s)
German Aneiros Perez german.aneiros@udc.es
Silvia Novo Diaz snovo@est-econ.uc3m.es
References
Breheny, P., and Huang, J. (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25, 173–187, doi:10.1007/s11222-013-9424-2.
Fan, J., and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360, doi:10.1198/016214501753382273.
See Also
See also PVS.fit
.
Examples
data("Tecator")
y<-Tecator$fat
z1<-Tecator$protein
z2<-Tecator$moisture
#Quadratic, cubic and interaction effects of the scalar covariates.
z.com<-cbind(z1,z2,z1^2,z2^2,z1^3,z2^3,z1*z2)
train<-1:160
#LM fit
ptm=proc.time()
fit<-lm.pels.fit(z=z.com[train,], y=y[train],lambda.min.h=0.02,
lambda.min.l=0.01,factor.pn=2, max.iter=5000, criterion="BIC")
proc.time()-ptm
#Results
fit
names(fit)