cvdglars {dglars} | R Documentation |
Cross-Validation Method for dgLARS
Description
Uses the k-fold cross-validation deviance to estimate the solution point of the dgLARS solution curve.
Usage
cvdglars(formula, family = gaussian, g, unpenalized,
b_wght, data, subset, contrasts = NULL, control = list())
cvdglars.fit(X, y, family = gaussian, g, unpenalized,
b_wght, control = list())
Arguments
formula |
an object of class “ |
family |
a description of the error distribution and link
function used to specify the model. This can be a character string
naming a family function or the result of a call to a family function
(see |
g |
argument available only for |
unpenalized |
a vector used to specify the unpenalized estimators;
|
b_wght |
a vector, with length equal to the number of columns of
the matrix |
data |
an optional data frame, list or environment (or object coercible by ‘as.data.frame’ to a data frame) containing the variables in the model. If not found in ‘data’, the variables are taken from ‘environment(formula)’. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
contrasts |
an optional list. See the ‘contrasts.arg’ of ‘model.matrix.default’. |
control |
a list of control parameters. See ‘Details’. |
X |
design matrix of dimension |
y |
response vector. When the |
Details
The cvdglars function runs dglars nfold + 1 times.
The deviance is stored, and the average and its standard deviation
over the folds are computed.
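The averaging step described above can be illustrated in base R. This is only a sketch: the deviance matrix `dev` and the grid `g_grid` are simulated placeholders, not output of dglars, and the package may compute its summaries differently internally.

```r
# Hypothetical cross-validation deviances: one row per fold, one column
# per value of the tuning parameter (simulated here, not produced by dglars).
set.seed(1)
nfold <- 10
ng <- 5
g_grid <- seq(1, 0.2, length.out = ng)   # assumed tuning-parameter grid
dev <- matrix(rchisq(nfold * ng, df = 20), nfold, ng)

dev_mean <- colMeans(dev)                # average deviance over the folds
dev_sd <- apply(dev, 2, sd)              # standard deviation over the folds
g_best <- g_grid[which.min(dev_mean)]    # grid value minimising the mean deviance
```

Here `g_best` plays the role of the selected solution point: the grid value at which the mean cross-validation deviance is smallest.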
cvdglars.fit is the workhorse function: it is more efficient
when the design matrix has already been calculated. For this reason
we suggest using this function when the dgLARS method is applied in
a high-dimensional setting, i.e. when p > n.
The control argument is a list that can supply any of the following components:
algorithm: a string specifying the algorithm used to compute the solution curve. The predictor-corrector algorithm is used when algorithm = "pc" (default), while the cyclic coordinate descent method is used when algorithm = "ccd";
method: a string used to specify the kind of solution curve. If method = "dgLASSO" (default), the algorithm computes the solution curve defined by the differential geometric generalization of the LASSO estimator; otherwise, if method = "dgLARS", the differential geometric generalization of the least angle regression method is used;
nfold: a non-negative integer used to specify the number of folds. Although nfold can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. Default is nfold = 10;
foldid: an n-dimensional vector of integers, between 1 and n, used to define the folds for the cross-validation. By default foldid is randomly generated;
ng: number of values of the tuning parameter used to compute the cross-validation deviance. Default is ng = 100;
nv: control parameter for the pc algorithm. An integer value belonging to the interval [1, min(n, p)] (default is nv = min(n - 1, p)) used to specify the maximum number of variables included in the final model;
np: control parameter for the pc/ccd algorithm. A non-negative integer used to define the maximum number of points of the solution curve. For the predictor-corrector algorithm np is set to 50 * min(n - 1, p) (default), while for the cyclic coordinate descent method it is set to 100 (default), i.e. the number of values of the tuning parameter γ;
g0: control parameter for the pc/ccd algorithm. Sets the smallest value for the tuning parameter γ. Default is g0 = ifelse(p < n, 1.0e-06, 0.05);
dg_max: control parameter for the pc algorithm. A non-negative value used to specify the maximum length of the step size. With dg_max = 0 (default) the predictor-corrector algorithm uses the optimal step size (see Augugliaro et al. (2013) for more details) to approximate the value of the tuning parameter corresponding to the inclusion/exclusion of a variable from the model;
nNR: control parameter for the pc algorithm. A non-negative integer used to specify the maximum number of iterations of the Newton-Raphson algorithm used in the corrector step. Default is nNR = 200;
NReps: control parameter for the pc algorithm. A non-negative value used to define the convergence criterion of the Newton-Raphson algorithm. Default is NReps = 1.0e-06;
ncrct: control parameter for the pc algorithm. When the Newton-Raphson algorithm does not converge, the step size (dγ) is reduced by dγ = cf * dγ and the corrector step is repeated. ncrct is a non-negative integer used to specify the maximum number of trials for the corrector step. Default is ncrct = 50;
cf: control parameter for the pc algorithm. The contraction factor is a real value belonging to the interval [0, 1] used to reduce the step size as previously described. Default is cf = 0.5;
nccd: control parameter for the ccd algorithm. A non-negative integer used to specify the maximum number of steps of the cyclic coordinate descent algorithm. Default is nccd = 1.0e+05;
eps: control parameter for the pc/ccd algorithm. The meaning of this parameter is related to the algorithm used to estimate the solution curve:
  i. if algorithm = "pc" then eps is used
    a. to identify a variable that will be included in the active set (absolute value of the corresponding Rao's score test statistic belongs to [γ - eps, γ + eps]);
    b. to establish if the corrector step must be repeated;
    c. to define the convergence of the algorithm, i.e., the actual value of the tuning parameter belongs to the interval [g0 - eps, g0 + eps];
  ii. if algorithm = "ccd" then eps is used to define the convergence for a single solution point, i.e., each inner coordinate-descent loop continues until the maximum change in the Rao's score test statistic, after any coefficient update, is less than eps.
  Default is eps = 1.0e-05.
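As a rough illustration of the components listed above, the following sketch evaluates the documented default formulas for nv, np and g0 by hand and then passes a small control list to cvdglars.fit. The variable names (`nv_default`, `np_default_pc`, `g0_default`, `ctrl`) and the values nfold = 5 and ng = 50 are arbitrary choices made for this example, and the fit is attempted only if the dglars package is installed.

```r
# Sample size and number of predictors, as in the Examples section.
n <- 100
p <- 100

# Documented defaults, evaluated by hand from the formulas above.
nv_default <- min(n - 1, p)                  # maximum number of variables (pc)
np_default_pc <- 50 * min(n - 1, p)          # maximum number of curve points (pc)
g0_default <- ifelse(p < n, 1.0e-06, 0.05)   # smallest tuning-parameter value

# Control list overriding only the number of folds and the grid size;
# 5 and 50 are illustrative values, not recommendations.
ctrl <- list(algorithm = "pc", method = "dgLASSO", nfold = 5, ng = 50)

# Run the cross-validation only when dglars is available.
if (requireNamespace("dglars", quietly = TRUE)) {
  set.seed(123)
  X <- matrix(rnorm(n * p), n, p)
  y <- rbinom(n, 1, binomial()$linkinv(1 + 2 * X[, 1]))
  fit_cv <- dglars::cvdglars.fit(X, y, family = binomial, control = ctrl)
  print(fit_cv)
}
```

Components left out of `ctrl` should fall back to the defaults described above.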
Value
cvdglars returns an object with S3 class “cvdglars”, i.e. a list
containing the following components:
call |
the call that produced this object; |
formula_cv |
if the model is fitted by |
family |
a description of the error distribution used in the model; |
var_cv |
a character vector with the name of variables selected by cross-validation; |
beta |
the vector of the coefficients estimated by cross-validation; |
phi |
the cross-validation estimate of the dispersion parameter; |
dev_m |
a vector of length |
dev_v |
a vector of length |
g |
the value of the tuning parameter corresponding to the minimum of the cross-validation deviance; |
g0 |
the smallest value for the tuning parameter; |
g_max |
the value of the tuning parameter corresponding to the starting point of the dgLARS solution curve; |
X |
the used design matrix; |
y |
the used response vector; |
w |
the vector of weights used to compute the adaptive dglars method; |
conv |
an integer value used to encode the warnings and the errors related to the algorithm used to fit the dgLARS solution curve. The values returned are:
|
control |
the list of control parameters used to compute the cross-validation deviance. |
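Since the returned object is a list, its components can be read with `$`. The sketch below uses a hypothetical helper, `summarise_cv` (not part of dglars), and a mock list with made-up values, so that it can be run without fitting a model. With a real fit, the same helper could be applied to the object returned by cvdglars or cvdglars.fit.

```r
# Hypothetical helper (not part of dglars): collect the components that
# describe the cross-validated solution point.
summarise_cv <- function(fit) {
  list(
    g = fit$g,                                    # selected tuning parameter
    g_range = c(g0 = fit$g0, g_max = fit$g_max),  # end points of the curve
    selected = fit$var_cv,                        # names of selected variables
    beta = fit$beta                               # estimated coefficients
  )
}

# Mock object with made-up values, used only to show the output shape.
mock_fit <- list(
  g = 0.3, g0 = 0.05, g_max = 2.1,
  var_cv = c("x1", "x4"),
  beta = c(0.8, 1.9, -0.4)
)
cv_summary <- summarise_cv(mock_fit)
str(cv_summary)
```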
Author(s)
Luigi Augugliaro
Maintainer: Luigi Augugliaro luigi.augugliaro@unipa.it
References
Augugliaro L., Mineo A.M. and Wit E.C. (2014) <doi:10.18637/jss.v059.i08> dglars: An R Package to Estimate Sparse Generalized Linear Models, Journal of Statistical Software, Vol 59(8), 1-40. https://www.jstatsoft.org/v59/i08/.
Augugliaro L., Mineo A.M. and Wit E.C. (2013) <doi:10.1111/rssb.12000> dgLARS: a differential geometric approach to sparse generalized linear models, Journal of the Royal Statistical Society. Series B., Vol 75(3), 471-498.
See Also
coef.cvdglars, print.cvdglars, plot.cvdglars methods
Examples
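###########################
# Logistic regression model
# y ~ Binomial
library(dglars)
set.seed(123)
n <- 100                          # sample size
p <- 100                          # number of predictors
X <- matrix(rnorm(n * p), n, p)   # simulated design matrix
b <- 1:2                          # intercept and slope of the true model
eta <- b[1] + X[, 1] * b[2]       # linear predictor: only X[, 1] is active
mu <- binomial()$linkinv(eta)     # success probabilities (inverse logit)
y <- rbinom(n, 1, mu)             # binary response
fit_cv <- cvdglars.fit(X, y, family = binomial)
fit_cv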