| lm.pels.fit {fsemipar} | R Documentation |
Regularised fit of sparse linear regression
Description
This function fits a sparse linear model between a scalar response and a vector of scalar covariates. It employs a penalised least-squares regularisation procedure, with either (group)SCAD or (group)LASSO penalties. The method utilises an objective criterion (criterion) to select the optimal regularisation parameter (lambda.opt).
Usage
lm.pels.fit(z, y, lambda.min = NULL, lambda.min.h = NULL, lambda.min.l = NULL,
factor.pn = 1, nlambda = 100, lambda.seq = NULL, vn = ncol(z), nfolds = 10,
seed = 123, criterion = "GCV", penalty = "grSCAD", max.iter = 1000)
Arguments
z
Matrix containing the observations of the covariates, collected by row.

y
Vector containing the scalar response.

lambda.min
The smallest value for lambda (i.e., the lower endpoint of the sequence in which lambda.opt is selected). The default is lambda.min.h or lambda.min.l, depending on the sample size.

lambda.min.h
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is large relative to the number of covariates (see factor.pn).

lambda.min.l
The lower endpoint of the sequence in which lambda.opt is selected when the sample size is small relative to the number of covariates (see factor.pn).

factor.pn
Positive integer used to set the sample-size threshold that determines whether lambda.min.h or lambda.min.l applies. The default is 1.

nlambda
Positive integer indicating the number of values in the sequence from which lambda.opt is selected. The default is 100.

lambda.seq
Sequence of values in which lambda.opt is selected. If NULL (the default), the sequence is built automatically using lambda.min and nlambda.

vn
Positive integer or vector of positive integers indicating the number of groups of consecutive variables to be penalised together. The default, vn = ncol(z), penalises each scalar covariate individually.

nfolds
Number of cross-validation folds (used when criterion = "k-fold-CV"). The default is 10.

seed
Seed of the random number generator, to ensure reproducible results (applicable when criterion = "k-fold-CV"). The default is 123.

criterion
The criterion used to select the regularisation parameter lambda.opt (and vn.opt, when several values of vn are supplied). The default is "GCV"; "BIC" and "k-fold-CV" are also available.

penalty
The penalty function applied in the penalised least-squares procedure. Currently, only "grLasso" and "grSCAD" are implemented. The default is "grSCAD".

max.iter
Maximum number of iterations allowed across the entire path. The default is 1000.
Details
The sparse linear model (SLM) is given by the expression:
Y_i=Z_{i1}\beta_{01}+\dots+Z_{ip_n}\beta_{0p_n}+\varepsilon_i\ \ \ i=1,\dots,n,
where Y_i denotes a scalar response and Z_{i1},\dots,Z_{ip_n} are real covariates. In this equation, \mathbf{\beta}_0=(\beta_{01},\dots,\beta_{0p_n})^{\top} is a vector of unknown real parameters and \varepsilon_i represents the random error.
In this function, the SLM is fitted using a penalised least-squares (PeLS) approach by minimising
\mathcal{Q}\left(\mathbf{\beta}\right)=\frac{1}{2}\left(\mathbf{Y}-\mathbf{Z}\mathbf{\beta}\right)^{\top}\left(\mathbf{Y}-\mathbf{Z}\mathbf{\beta}\right)+n\sum_{j=1}^{p_n}\mathcal{P}_{\lambda_{j_n}}\left(|\beta_j|\right), \quad (1)
where \mathbf{\beta}=(\beta_1,\ldots,\beta_{p_n})^{\top}, \ \mathcal{P}_{\lambda_{j_n}}\left(\cdot\right) is a penalty function (specified in the argument penalty) and \lambda_{j_n} > 0 is a tuning parameter.
To reduce the number of tuning parameters, \lambda_{j_n}, to be selected for each sample, we consider \lambda_{j_n} = \lambda \widehat{\sigma}_{\beta_{0,j,OLS}}, where \beta_{0,j,OLS} denotes the OLS estimate of \beta_{0,j} and \widehat{\sigma}_{\beta_{0,j,OLS}} is its estimated standard deviation. The parameter \lambda is selected using the objective criterion specified in the argument criterion.
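The per-coefficient scaling above can be sketched in base R (an illustrative reconstruction, not the package's internal code; all variable names are ours):

```r
# Sketch: lambda_j = lambda * estimated sd of the OLS coefficient.
set.seed(1)
n <- 100; p <- 5
z <- matrix(rnorm(n * p), n, p)
beta0 <- c(2, 0, 0, 1, 0)                    # sparse true coefficients
y <- as.numeric(z %*% beta0) + rnorm(n)

ols <- lm(y ~ z - 1)                          # OLS fit (no intercept)
sd.ols <- summary(ols)$coefficients[, "Std. Error"]

lambda <- 0.1                                 # common tuning parameter
lambda.j <- lambda * sd.ols                   # one penalty level per coefficient
```

With this construction only the single parameter lambda has to be chosen by the criterion; the covariate-specific scaling is inherited from the OLS standard errors.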
For further details on the estimation procedure of the SLM, see e.g. Fan and Li (2001). The PeLS objective function is minimised using the R function grpreg of the package grpreg (Breheny and Huang, 2015).
Remark: It should be noted that setting lambda.seq = 0 yields the non-penalised estimation of the model, i.e. the OLS estimation. Using a lambda.seq with values different from 0 is advisable when the presence of irrelevant variables is suspected.
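The remark can be illustrated as follows (a sketch assuming the fsemipar package and its Tecator data are available; the choice of covariates and of criterion = "BIC" is ours):

```r
library(fsemipar)
data("Tecator")
y <- Tecator$fat
z <- cbind(Tecator$protein, Tecator$moisture)

# Penalised fit: lambda.opt selected by BIC over an automatic lambda sequence
fit.pen <- lm.pels.fit(z = z, y = y, criterion = "BIC")

# Non-penalised fit: a lambda sequence containing only 0 gives the OLS estimate
fit.ols <- lm.pels.fit(z = z, y = y, lambda.seq = 0, criterion = "BIC")

fit.pen$beta.est   # some components may be shrunk to zero
fit.ols$beta.est   # coincides with the OLS estimate
```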
Value
call
The matched call.

fitted.values
Estimated scalar response.

residuals
Differences between the observed response y and the fitted.values.

beta.est
Estimate of \beta_0 when the selected regularisation parameter lambda.opt is used.

indexes.beta.nonnull
Indexes of the non-zero components of beta.est.

lambda.opt
Selected value of lambda.

IC
Value of the criterion function used to select lambda.opt (and vn.opt).

vn.opt
Selected value of vn.

...
Author(s)
German Aneiros Perez german.aneiros@udc.es
Silvia Novo Diaz snovo@est-econ.uc3m.es
References
Breheny, P., and Huang, J. (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25, 173–187, doi:10.1007/s11222-013-9424-2.
Fan, J., and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360, doi:10.1198/016214501753382273.
See Also
See also PVS.fit.
Examples
data("Tecator")
y <- Tecator$fat
z1 <- Tecator$protein
z2 <- Tecator$moisture
# Quadratic, cubic and interaction effects of the scalar covariates
z.com <- cbind(z1, z2, z1^2, z2^2, z1^3, z2^3, z1*z2)
train <- 1:160

# LM fit
ptm <- proc.time()
fit <- lm.pels.fit(z = z.com[train, ], y = y[train], lambda.min.h = 0.02,
  lambda.min.l = 0.01, factor.pn = 2, max.iter = 5000, criterion = "BIC")
proc.time() - ptm

# Results
fit
names(fit)
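The components listed in the Value section can then be inspected on the fitted object (a continuation of the example above; it assumes that code has already been run):

```r
fit$beta.est                # penalised estimate of beta_0
fit$indexes.beta.nonnull    # covariates retained in the model
fit$lambda.opt              # value of lambda selected by BIC
fit$vn.opt                  # selected value of vn
summary(fit$residuals)      # training-sample residuals
```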