irglmreg_fit {mpath}

R Documentation

Internal function for robust penalized generalized linear models

Description

Fit a robust penalized GLM where the loss function is a composite function cfun∘dfun + penalty, that is, cfun applied to dfun. This function does the computing for irglmreg.

Usage

irglmreg_fit(x, y, weights, offset, cfun="ccave", dfun="gaussian", s=NULL, 
             delta=0.1, fk=NULL, iter=10, reltol=1e-5, 
             penalty=c("enet","mnet","snet"), nlambda=100, lambda=NULL, 
             type.path=c("active", "nonactive"), decreasing=TRUE, 
             lambda.min.ratio=ifelse(nobs<nvars,.05, .001), alpha=1, gamma=3,
             rescale=TRUE, standardize=TRUE, intercept=TRUE, 
             penalty.factor= NULL, maxit=1000, type.init=c("bst", "co", "heu"), 
             init.family=NULL, mstop.init=10, nu.init=0.1, 
             eps=.Machine$double.eps, epscycle=10, thresh=1e-6, parallel=FALSE,
             n.cores=2, theta, trace=FALSE, tracelevel=1)
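
Several defaults in the usage above are vectors of choices or depend on the data dimensions. The following is a minimal sketch of how such defaults conventionally resolve in R. `defaults_sketch` is a hypothetical stand-in, not part of mpath, and it assumes that the character-vector defaults are resolved with `match.arg` (first element when the argument is missing).

```r
# Hypothetical stand-in that mirrors a few defaults from the usage above.
defaults_sketch <- function(x,
                            penalty = c("enet", "mnet", "snet"),
                            type.path = c("active", "nonactive"),
                            lambda.min.ratio = ifelse(nobs < nvars, .05, .001)) {
  # Defaults are evaluated lazily, so nobs and nvars only need to exist
  # inside the function before lambda.min.ratio is first used.
  nobs <- nrow(x)
  nvars <- ncol(x)
  list(penalty = match.arg(penalty),        # assumed: first choice when missing
       type.path = match.arg(type.path),
       lambda.min.ratio = lambda.min.ratio)
}

wide <- defaults_sketch(matrix(0, nrow = 10, ncol = 50))  # nobs < nvars
tall <- defaults_sketch(matrix(0, nrow = 50, ncol = 10))  # nobs >= nvars
```

Under these assumptions, `wide$lambda.min.ratio` is 0.05 and `tall$lambda.min.ratio` is 0.001, while both calls select `"enet"` and `"active"`.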

Arguments

`x`	input matrix, of dimension nobs x nvars; each row is an observation vector.
`y`	response variable. Quantitative for `dfun="gaussian"`, and -1/1 otherwise for classification.
`weights`	observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. Currently only one offset term can be included in the formula.
`cfun`	character, type of convex cap (concave) function. Valid options are: `"hcave"`, `"acave"`, `"bcave"`, `"ccave"`, `"dcave"`, `"ecave"`, `"gcave"`, `"tcave"`.
`dfun`	character, type of convex downward function. Valid options are: `"gaussian"`, `"gaussianC"`, `"binomial"`.
`s`	tuning parameter of `cfun`. Must satisfy `s > 0`, except that `s = 0` is allowed for `cfun="tcave"`. If `s` is too close to 0 for `cfun="acave"`, `"bcave"` or `"ccave"`, the calculated weights can become 0 for all observations, which crashes the program.
`delta`	a small positive number provided by the user only if `cfun="gcave"` and `0 < s < 1`.
`fk`	predicted values at an iteration in the IRCO algorithm
`nlambda`	The number of `lambda` values - default is 100. The sequence may be truncated before `nlambda` is reached if a close to saturated model is fitted. See also `satu`.
`lambda`	by default, the algorithm provides a sequence of regularization values, or a user supplied `lambda` sequence
`type.path`	solution path for `parallel=FALSE`. If `type.path="active"`, cycle through only the active set in the next increasing `lambda` sequence. If `type.path="nonactive"`, no active set is used for any element of the `lambda` sequence, and the algorithm cycles through all the predictor variables.
`lambda.min.ratio`	Smallest value for `lambda`, as a fraction of `lambda.max`, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero except the intercept). Note that there is no closed formula for `lambda.max`. The default of `lambda.min.ratio` depends on the sample size `nobs` relative to the number of variables `nvars`. If `nobs >= nvars`, the default is `0.001`, close to zero. If `nobs < nvars`, the default is `0.05`.
`alpha`	The `L_2` penalty mixing parameter, with `0 <= alpha <= 1`. `alpha=1` gives the lasso (mcp, scad) penalty and `alpha=0` the ridge penalty. However, if `alpha=0`, `lambda` values must be provided.
`gamma`	The tuning parameter of the `snet` or `mnet` penalty.
`rescale`	logical value. If TRUE, adaptive rescaling of the penalty parameter is applied for `penalty="mnet"` or `penalty="snet"` with `dfun="binomial"`. See `glmreg_fit`.
`standardize`	logical value for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is `standardize=TRUE`.
`intercept`	logical value: if TRUE (default), intercept(s) are fitted; otherwise, intercept(s) are set to zero.
`penalty.factor`	This is a number that multiplies `lambda` to allow differential shrinkage of coefficients. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is same shrinkage for all variables.
`type.init`	a method to determine the initial values. If `type.init="co"`, an intercept-only model is used as the initial parameter and the `irglmreg` regularization path runs forward from lambda_max to lambda_min. If `type.init="heu"`, heuristic initial parameters are used and the `irglmreg` path runs backward or forward depending on `decreasing`, between lambda_min and lambda_max. If `type.init="bst"`, a boosting model is run with `bst` in package bst, depending on `mstop.init, nu.init`, and the `irglmreg` path runs backward or forward depending on `decreasing`.
`init.family`	character value for initial family, one of "clossR", "closs", "gloss", "qloss", which can be used to derive an initial estimator, if the selection is different from the default value.
`mstop.init`	an integer giving the number of boosting iterations when `type.init="bst"`
`nu.init`	a small number (between 0 and 1) defining the step size or shrinkage parameter when `type.init="bst"`.
`decreasing`	only used if `lambda=NULL`, a logical value used to determine regularization path direction either from lambda_max to a potentially modified lambda_min or vice versa if `type.init="bst", "heu"`. Since this is a nonconvex optimization, it is possible to generate different estimates for the same `lambda` depending on `decreasing`. The choice of `decreasing` picks different starting values.
`iter`	number of iterations in the IRCO algorithm.
`maxit`	Within each IRCO algorithm iteration, maximum number of coordinate descent iterations for each `lambda` value; default is 1000.
`reltol`	convergence criterion in the IRCO algorithm.
`eps`	If a coefficient is less than `eps` in magnitude, then it is reported to be 0.
`epscycle`	If `nlambda` > 1 and the relative loss values from two consecutive `lambda` values change by more than `epscycle`, the parameters are re-estimated in an effort to avoid being trapped in a local optimum.
`thresh`	Convergence threshold for coordinate descent. Default value is `1e-6`.
`penalty`	Type of regularization
`theta`	an overdispersion scaling parameter for `family="negbin"`
`parallel`, `n.cores`	If `parallel=TRUE`, the solutions along `lambda` are computed in parallel using `n.cores` cores. If `FALSE`, computing is sequential. If `NULL`, computing is still sequential, but with a different convergence criterion based on penalized loss values.
`trace`, `tracelevel`	If `TRUE`, fitting progress is reported. If `tracelevel=2`, deeper level of fitting progress is reported.
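
As an illustration of how `nlambda`, `lambda.min.ratio` and `decreasing` interact, the sketch below builds a `lambda` grid in base R. It is not the package's code. The value of `lambda_max` is made up, since there is no closed formula for it, and the log-spaced grid is an assumption borrowed from common regularization-path practice.

```r
# Made-up entry value; in practice lambda_max is derived from the data.
lambda_max <- 2
nlambda <- 5
lambda.min.ratio <- 0.001
lambda_min <- lambda.min.ratio * lambda_max

# Assumption: values are equally spaced on the log scale.
lambda_dec <- exp(seq(log(lambda_max), log(lambda_min), length.out = nlambda))

# The same grid traversed in the opposite direction.
lambda_inc <- rev(lambda_dec)
```

Because the optimization is nonconvex, fitting along `lambda_dec` and along `lambda_inc` can give different estimates at the same `lambda`, which is why the path direction is exposed as an argument.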

Details

A case-weighted penalized least squares or GLM is fit by iteratively reweighted convex optimization (IRCO), where the loss function is a composite function cfun∘dfun + penalty. Here "convex" refers to the loss function induced by dfun, not to the penalty function. The sequence of robust models implied by lambda is fit by IRCO along with coordinate descent. Note that the objective function is

weights*loss + lambda*penalty,

if standardize=FALSE and

weights/sum(weights)*loss + lambda*penalty,

if standardize=TRUE.
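
The two forms of the objective differ only in whether the weights are rescaled to sum to one. A minimal base R sketch follows. It assumes the terms are summed over observations, uses a squared-error loss, and uses an elastic-net style penalty whose exact form is an assumption. `enet_penalty` and `objective_sketch` are hypothetical helpers, not functions from mpath.

```r
# Elastic-net style penalty; this exact form is assumed for illustration.
enet_penalty <- function(beta, alpha = 1) {
  alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2)
}

# Weighted objective with a squared-error loss per observation.
objective_sketch <- function(beta, b0, x, y, weights, lambda,
                             alpha = 1, standardize = TRUE) {
  loss <- (y - b0 - drop(x %*% beta))^2 / 2
  if (standardize) weights <- weights / sum(weights)  # weights now sum to 1
  sum(weights * loss) + lambda * enet_penalty(beta, alpha)
}

x <- cbind(c(1, 2, 3, 4), c(0, 1, 0, 1))
y <- c(1, 3, 2, 5)
w <- rep(1, 4)
beta <- c(1, 0.5)

obj_raw  <- objective_sketch(beta, 0, x, y, w, lambda = 0.1, standardize = FALSE)
obj_norm <- objective_sketch(beta, 0, x, y, w, lambda = 0.1, standardize = TRUE)
```

With unit weights the loss term shrinks by a factor of `nobs` under `standardize=TRUE`, so a given `lambda` penalizes relatively more strongly there than under `standardize=FALSE`.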

Value

An object with S3 class "irglmreg" for the various types of models.

`call`	the call that produced the model fit
`b0`	Intercept sequence of length `length(lambda)`
`beta`	An `nvars x length(lambda)` matrix of coefficients.
`lambda`	The actual sequence of `lambda` values used.
`weights_update`	An `nobs x length(lambda)` matrix of weights computed by the IRCO algorithm. The entry in the i-th row and j-th column is the weight for the i-th observation and the j-th `lambda` value.
`decreasing`	logical value recording the direction of the `lambda` sequence, used to determine the regularization path direction, either from lambda_max to a potentially modified lambda_min or vice versa, if `type.init="bst", "heu"`.
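
To give some intuition for where observation weights such as those in `weights_update` come from, here is a simplified, unpenalized sketch of an iteratively reweighted scheme in base R. It is not the package's implementation. The concave function `g` is a hypothetical choice that is not claimed to match any `cfun` option, the convex part is a squared-error loss, and `irco_sketch` is an illustrative name. The arguments `s`, `iter` and `reltol` only mimic the roles described above.

```r
# Hypothetical concave function and its derivative (illustration only).
g      <- function(z, s) s^2 * (1 - exp(-z / s^2))
g_grad <- function(z, s) exp(-z / s^2)

irco_sketch <- function(X, y, s = 5, iter = 10, reltol = 1e-5) {
  half_sq <- function(beta) drop(y - X %*% beta)^2 / 2  # convex loss per observation
  beta <- lm.fit(X, y)$coefficients                     # non-robust starting values
  loss_old <- sum(g(half_sq(beta), s))
  for (k in seq_len(iter)) {
    w <- g_grad(half_sq(beta), s)           # weights from the concave part
    beta <- lm.wfit(X, y, w)$coefficients   # weighted convex subproblem
    loss_new <- sum(g(half_sq(beta), s))
    if (abs(loss_new - loss_old) <= reltol * abs(loss_old)) break
    loss_old <- loss_new
  }
  list(coefficients = beta, weights = g_grad(half_sq(beta), s))
}

# Toy data: a straight line with two gross outliers at positions 3 and 18.
x <- 1:20
y <- 1 + 2 * x + 0.1 * sin(x)
y[c(3, 18)] <- y[c(3, 18)] + 30
X <- cbind(1, x)

ols <- lm.fit(X, y)$coefficients
fit <- irco_sketch(X, y)
```

In this toy example the two contaminated observations end up with weights close to zero, while the remaining observations keep weights close to one, so the reweighted intercept stays near the value used to generate the data.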

Author(s)

Zhu Wang <zwang145@uthsc.edu>

References

Zhu Wang (2024) Unified Robust Estimation, Australian & New Zealand Journal of Statistics. 66(1):77-102.