cvdglars {dglars}    R Documentation
Cross-Validation Method for dgLARS
Description
Uses the k-fold cross-validation deviance to estimate the solution point of the dgLARS solution curve.
Usage
cvdglars(formula, family = gaussian, g, unpenalized,
b_wght, data, subset, contrasts = NULL, control = list())
cvdglars.fit(X, y, family = gaussian, g, unpenalized,
b_wght, control = list())
Arguments
formula |
an object of class “formula”: a symbolic description of the model to be fitted; |
family |
a description of the error distribution and link
function used to specify the model. This can be a character string
naming a family function or the result of a call to a family function
(see ‘family’ for details). |
g |
argument available only for |
unpenalized |
a vector used to specify the unpenalized estimators;
|
b_wght |
a vector, with length equal to the number of columns of
the matrix |
data |
an optional data frame, list or environment (or object coercible by ‘as.data.frame’ to a data frame) containing the variables in the model. If not found in ‘data’, the variables are taken from ‘environment(formula)’. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
contrasts |
an optional list. See the ‘contrasts.arg’ of ‘model.matrix.default’. |
control |
a list of control parameters. See ‘Details’. |
X |
design matrix of dimension n × p; |
y |
response vector. When the |
Details
The cvdglars function runs dglars nfold + 1 times: once on the full data set and once for each fold, with that fold held out.
The deviance is stored, and its average and standard deviation
over the folds are computed.
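The cross-validation step described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the dglars implementation: a closed-form ridge fit for the gaussian family stands in for the dgLARS solution curve, the grid of tuning values is fixed by hand (in cvdglars the grid is tied to the solution curve), and the per-fold deviance is taken as the residual sum of squares on the held-out fold. The helpers ridge_fit() and cv_deviance() are hypothetical and are not part of the dglars package.

```r
# Hypothetical helper: ridge fit used ONLY as a stand-in for dglars.
# The intercept is left unpenalized by centring X and y.
ridge_fit <- function(X, y, g) {
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)                       # subtract column means
  beta <- solve(crossprod(Xc) + g * diag(ncol(X)), crossprod(Xc, y - ym))
  list(b0 = ym - sum(xm * beta), beta = drop(beta))
}

# Hypothetical helper: k-fold CV deviance over a grid of tuning values g_seq.
cv_deviance <- function(X, y, g_seq, nfold = 10, foldid = NULL) {
  n <- nrow(X)
  if (is.null(foldid)) foldid <- sample(rep(seq_len(nfold), length.out = n))
  nfold <- max(foldid)
  dev <- matrix(NA_real_, nfold, length(g_seq))   # rows = folds, cols = g values
  for (k in seq_len(nfold)) {
    test <- foldid == k                           # fold k is held out
    for (j in seq_along(g_seq)) {
      fit <- ridge_fit(X[!test, , drop = FALSE], y[!test], g_seq[j])
      mu <- fit$b0 + X[test, , drop = FALSE] %*% fit$beta
      dev[k, j] <- sum((y[test] - mu)^2)          # gaussian deviance on fold k
    }
  }
  dev_m <- colMeans(dev)                          # average over the folds
  dev_sd <- apply(dev, 2, sd)                     # standard deviation over the folds
  list(dev_m = dev_m, dev_sd = dev_sd, g = g_seq[which.min(dev_m)])
}

set.seed(123)
n <- 100
p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 1 + 2 * X[, 1] + rnorm(n)
g_seq <- exp(seq(log(100), log(1e-3), length.out = 20))
cv <- cv_deviance(X, y, g_seq, nfold = 10)
cv$g    # value of the tuning parameter with the smallest average CV deviance
```

The selected value is the grid point that minimises the average held-out deviance; the standard deviation over the folds is returned alongside so that the variability of the estimate can be inspected.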
cvdglars.fit is the workhorse function: it is more efficient
when the design matrix has already been computed. For this reason
we suggest using this function when the dgLARS method is applied in
a high-dimensional setting, i.e., when p > n.
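The two entry points can be compared on the same simulated high-dimensional data. This sketch assumes, as in the Examples section, that the matrix passed to cvdglars.fit does not contain an intercept column; the data-generating model and the column names x1, x2, ... are illustrative choices, not package defaults.

```r
library(dglars)

set.seed(123)
n <- 50
p <- 100                                  # high-dimensional setting: p > n
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
y <- rbinom(n, 1, binomial()$linkinv(1 + 2 * X[, 1]))

# Formula interface: the design matrix is built internally from 'data'
df <- data.frame(y = y, X)
fit_formula <- cvdglars(y ~ ., family = binomial, data = df)

# Workhorse: the pre-computed design matrix (no intercept column) is passed directly
fit_matrix <- cvdglars.fit(X, y, family = binomial)
```

Because the folds are generated at random in each call, the two fits are not expected to select exactly the same point of the solution curve unless a common foldid is supplied through control.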
The control argument is a list that can supply any of the following components:
algorithm: a string specifying the algorithm used to compute the solution curve. The predictor-corrector algorithm is used when algorithm = "pc" (default), while the cyclic coordinate descent method is used when algorithm = "ccd";
method: a string specifying the kind of solution curve. If method = "dgLASSO" (default), the algorithm computes the solution curve defined by the differential geometric generalization of the LASSO estimator; otherwise, if method = "dgLARS", the differential geometric generalization of the least angle regression method is used;
nfold: a non-negative integer specifying the number of folds. Although nfold can be as large as the sample size (leave-one-out CV), this is not recommended for large data sets. Default is nfold = 10;
foldid: an n-dimensional vector of integers, between 1 and n, used to define the folds for the cross-validation. By default foldid is randomly generated;
ng: number of values of the tuning parameter used to compute the cross-validation deviance. Default is ng = 100;
nv: control parameter for the pc algorithm. An integer value in the interval [1, min(n, p)] (default is nv = min(n - 1, p)) specifying the maximum number of variables included in the final model;
np: control parameter for the pc/ccd algorithm. A non-negative integer defining the maximum number of points of the solution curve. For the predictor-corrector algorithm np is set to 50 · min(n - 1, p) (default), while for the cyclic coordinate descent method it is set to 100 (default), i.e. the number of values of the tuning parameter γ;
g0: control parameter for the pc/ccd algorithm. Sets the smallest value of the tuning parameter γ. Default is g0 = ifelse(p < n, 1.0e-06, 0.05);
dg_max: control parameter for the pc algorithm. A non-negative value specifying the maximum length of the step size. Setting dg_max = 0 (default), the predictor-corrector algorithm uses the optimal step size (see Augugliaro et al. (2013) for more details) to approximate the value of the tuning parameter corresponding to the inclusion/exclusion of a variable from the model;
nNR: control parameter for the pc algorithm. A non-negative integer specifying the maximum number of iterations of the Newton-Raphson algorithm used in the corrector step. Default is nNR = 200;
NReps: control parameter for the pc algorithm. A non-negative value defining the convergence criterion of the Newton-Raphson algorithm. Default is NReps = 1.0e-06;
ncrct: control parameter for the pc algorithm. When the Newton-Raphson algorithm does not converge, the step size dγ is reduced to dγ = cf · dγ and the corrector step is repeated. ncrct is a non-negative integer specifying the maximum number of trials for the corrector step. Default is ncrct = 50;
cf: control parameter for the pc algorithm. The contraction factor is a real value in the interval [0, 1] used to reduce the step size as described above. Default is cf = 0.5;
nccd: control parameter for the ccd algorithm. A non-negative integer specifying the maximum number of steps of the cyclic coordinate descent algorithm. Default is nccd = 1.0e+05;
eps: control parameter for the pc/ccd algorithm. Its meaning depends on the algorithm used to estimate the solution curve:
  i. if algorithm = "pc", eps is used
     a. to identify a variable that will be included in the active set (the absolute value of the corresponding Rao's score test statistic belongs to [γ - eps, γ + eps]);
     b. to establish whether the corrector step must be repeated;
     c. to define the convergence of the algorithm, i.e., the current value of the tuning parameter belongs to the interval [g0 - eps, g0 + eps];
  ii. if algorithm = "ccd", eps defines the convergence for a single solution point, i.e., each inner coordinate-descent loop continues until the maximum change in the Rao's score test statistic, after any coefficient update, is less than eps.
  Default is eps = 1.0e-05.
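As an illustration of how these components are passed, the sketch below requests the cyclic coordinate descent algorithm with 5 folds and a coarser grid of 50 tuning values, and fixes the partition through foldid so that repeated runs use the same folds. The particular combination of values is an arbitrary example, not a recommended setting.

```r
library(dglars)

set.seed(123)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, binomial()$linkinv(1 + 2 * X[, 1]))

# Balanced random fold labels 1..5, fixed in advance and passed through 'foldid'
fid <- sample(rep(1:5, length.out = n))

fit_ccd <- cvdglars.fit(X, y, family = binomial,
                        control = list(algorithm = "ccd",  # cyclic coordinate descent
                                       nfold = 5,          # number of folds
                                       foldid = fid,       # fixed fold assignment
                                       ng = 50))           # values of gamma in the CV grid
fit_ccd
```

Components that are not supplied keep the defaults listed above.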
Value
cvdglars returns an object with S3 class “cvdglars”, i.e. a list
containing the following components:
call |
the call that produced this object; |
formula_cv |
if the model is fitted by |
family |
a description of the error distribution used in the model; |
var_cv |
a character vector with the name of variables selected by cross-validation; |
beta |
the vector of the coefficients estimated by cross-validation; |
phi |
the cross-validation estimate of the dispersion parameter; |
dev_m |
a vector of length |
dev_v |
a vector of length |
g |
the value of the tuning parameter corresponding to the minimum of the cross-validation deviance; |
g0 |
the smallest value for the tuning parameter; |
g_max |
the value of the tuning parameter corresponding to the starting point of the dgLARS solution curve; |
X |
the design matrix used; |
y |
the response vector used; |
w |
the vector of weights used to compute the adaptive dglars method; |
conv |
an integer value used to encode the warnings and the errors related to the algorithm used to fit the dgLARS solution curve. The values returned are:
|
control |
the list of control parameters used to compute the cross-validation deviance. |
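The components listed above are accessed with the usual $ operator. The sketch below fits the logistic model of the Examples section and extracts a few of them; the final line checks the expectation that the selected tuning parameter lies between g0 and g_max, the two end points of the solution curve.

```r
library(dglars)

set.seed(123)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, binomial()$linkinv(1 + 2 * X[, 1]))
fit_cv <- cvdglars.fit(X, y, family = binomial)

fit_cv$g         # tuning parameter at the minimum of the CV deviance
fit_cv$var_cv    # names of the variables selected by cross-validation
fit_cv$beta      # coefficients estimated by cross-validation
fit_cv$conv      # integer code for warnings/errors of the fitting algorithm

# The selected value is expected to lie between the end points of the curve
fit_cv$g >= fit_cv$g0 && fit_cv$g <= fit_cv$g_max
```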
Author(s)
Luigi Augugliaro
Maintainer: Luigi Augugliaro luigi.augugliaro@unipa.it
References
Augugliaro L., Mineo A.M. and Wit E.C. (2014) <doi:10.18637/jss.v059.i08> dglars: An R Package to Estimate Sparse Generalized Linear Models, Journal of Statistical Software, Vol 59(8), 1-40. https://www.jstatsoft.org/v59/i08/.
Augugliaro L., Mineo A.M. and Wit E.C. (2013) <doi:10.1111/rssb.12000> dgLARS: a differential geometric approach to sparse generalized linear models, Journal of the Royal Statistical Society. Series B., Vol 75(3), 471-498.
See Also
coef.cvdglars, print.cvdglars, plot.cvdglars methods
Examples
###########################
# Logistic regression model
# y ~ Binomial
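library(dglars)
set.seed(123)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), n, p)
b <- 1:2                        # intercept = 1, coefficient of X[, 1] = 2
eta <- b[1] + X[, 1] * b[2]     # linear predictor: only the first column is active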
mu <- binomial()$linkinv(eta)
y <- rbinom(n, 1, mu)
fit_cv <- cvdglars.fit(X, y, family = binomial)
fit_cv
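The fitted object can then be passed to the methods listed under See Also. This sketch repeats the simulation above so that it runs on its own; see the help pages of coef.cvdglars and plot.cvdglars for exactly what each method returns or draws.

```r
library(dglars)

set.seed(123)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, binomial()$linkinv(1 + 2 * X[, 1]))
fit_cv <- cvdglars.fit(X, y, family = binomial)

b_hat <- coef(fit_cv)   # coef.cvdglars: coefficients at the CV-selected point
b_hat
plot(fit_cv)            # plot.cvdglars
```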