glm.regu {RCAL}    R Documentation
Regularized M-estimation for fitting generalized linear models with a fixed tuning parameter
Description
This function implements regularized M-estimation for fitting generalized linear models with continuous or binary responses for a fixed choice of tuning parameters.
Usage
glm.regu(y, x, iw = NULL, loss = "cal", init = NULL, rhos, test = NULL,
offs = NULL, id = NULL, Wmat = NULL, Rmat = NULL, zzs = NULL,
xxs = NULL, n.iter = 100, eps = 1e-06, bt.lim = 3, nz.lab = NULL,
pos = 10000)
Arguments
y
The n x 1 vector of responses, which can be continuous or binary.
x
The n x p matrix of covariates, without a column of ones for the intercept.
iw
An optional n x 1 weight vector.
loss
A loss function, which can be specified as "gaus" for continuous responses, or "ml" or "cal" for binary responses.
init
A (p+1) x 1 vector of initial values for the intercept and coefficients.
rhos
A p x 1 vector of Lasso tuning parameters, usually a constant vector, associated with the p coefficients.
test
A vector giving the indices of observations between 1 and n which are included in the test set.
offs
An n x 1 vector of offsets, as in glm.
id
An argument which can be used to speed up computation.
Wmat
An argument which can be used to speed up computation.
Rmat
An argument which can be used to speed up computation.
zzs
An argument which can be used to speed up computation.
xxs
An argument which can be used to speed up computation.
n.iter
The maximum number of iterations allowed. An iteration is defined by computing a quadratic approximation and solving a least-squares Lasso problem.
eps
The tolerance at which the difference in the objective (loss plus penalty) values between successive iterations is considered close enough to 0 to declare convergence.
bt.lim
The maximum number of backtracking steps allowed.
nz.lab
A p x 1 logical vector (useful in simulations), indicating which covariates are included when calculating the restricted number of nonzero coefficients.
pos
A value which can be used to facilitate recording the numbers of nonzero coefficients with or without the restriction by nz.lab.
Details
For continuous responses, this function uses an active-set descent algorithm (Osborne et al. 2000; Yang and Tan 2018) to solve the least-squares Lasso problem. For binary responses, regularized calibrated estimation is implemented using the Fisher scoring descent algorithm in Tan (2020), whereas regularized maximum likelihood estimation is implemented in a similar manner based on quadratic approximation as in the R package glmnet.
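For reference, the two binary-response losses can be written out directly. The sketch below is a minimal illustration, assuming the average likelihood loss and the average calibration loss of Tan (2020) take the forms shown; it is not the package's internal implementation, and the exact scaling used for obj.train may differ.
# Assumed forms of the two binary-response losses (sketch only):
# y is a 0/1 response vector and eta a vector of linear predictors.
loss.ml <- function(y, eta) mean(log(1 + exp(eta)) - y * eta)
loss.cal <- function(y, eta) mean(y * exp(-eta) + (1 - y) * eta)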
Value
iter
The number of iterations performed up to n.iter.
conv
1 if convergence is obtained, 0 if the maximum number of iterations is exceeded, or -1 if the maximum number of backtracking steps is exceeded.
nz
A value defined as (nz0 * pos + nz1), recording the numbers of nonzero coefficients without (nz0) and with (nz1) the restriction by nz.lab.
inter
The estimated intercept.
bet
The p x 1 vector of estimated coefficients, excluding the intercept.
fit
The vector of fitted values in the training set.
eta
The vector of linear predictors in the training set.
tau
The p x 1 vector of generalized signs, which should be -1 or 1 for a nonzero coefficient estimate and between -1 and 1 for a zero estimate.
obj.train
The average loss in the training set.
pen
The Lasso penalty of the estimates.
obj
The average loss plus the Lasso penalty.
fit.test
The vector of fitted values in the test set.
eta.test
The vector of linear predictors in the test set.
obj.test
The average loss in the test set.
id
This can be re-used to speed up computation.
Wmat
This can be re-used to speed up computation.
Rmat
This can be re-used to speed up computation.
zzs
This can be re-used to speed up computation.
xxs
This can be re-used to speed up computation.
References
Osborne, M., Presnell, B., and Turlach, B. (2000) A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 20, 389-404.
Yang, T. and Tan, Z. (2018) Backfitting algorithms for total-variation and empirical-norm penalized additive modeling with high-dimensional data, Stat, 7, e198.
Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Ser. B, 58, 267-288.
Tan, Z. (2020) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137-158.
Examples
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2
y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)
### Example 1: linear regression
# rhos should be a vector of length p, even if all its entries are equal
out.rgaus <- glm.regu(y[tr==1], x[tr==1,], rhos=rep(.05,p), loss="gaus")
# the intercept
out.rgaus$inter
# the estimated coefficients and generalized signs; the first 10 are shown
cbind(out.rgaus$bet, out.rgaus$tau)[1:10,]
# the number of nonzero coefficients
out.rgaus$nz
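# A hedged add-on to Example 1 (not part of the original examples): use the
# test argument to evaluate a fit on a held-out subset; the split below is
# illustrative only.
y1 <- y[tr==1]
x1 <- x[tr==1,]
n1 <- length(y1)
te <- seq(floor(n1/2)+1, n1)   # second half of the observations as test set
out.rgaus.te <- glm.regu(y1, x1, rhos=rep(.05,p), loss="gaus", test=te)
# average loss in the training set and in the test set
out.rgaus.te$obj.train
out.rgaus.te$obj.test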
### Example 2: logistic regression using likelihood loss
out.rml <- glm.regu(tr, x, rhos=rep(.01,p), loss="ml")
out.rml$inter
cbind(out.rml$bet, out.rml$tau)[1:10,]
out.rml$nz
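# A hedged sanity check (not part of the original examples): count the nonzero
# coefficients directly; this relates to out.rml$nz through the
# (nz0 * pos + nz1) encoding described under Value.
sum(out.rml$bet != 0)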
### Example 3: logistic regression using calibration loss
out.rcal <- glm.regu(tr, x, rhos=rep(.05,p), loss="cal")
out.rcal$inter
cbind(out.rcal$bet, out.rcal$tau)[1:10,]
out.rcal$nz
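# A hedged consistency check (not part of the original examples), assuming the
# linear predictors equal the intercept plus x %*% bet and the fitted values
# are their inverse logit; both differences should be numerically near zero.
range(out.rcal$eta - (out.rcal$inter + drop(x %*% out.rcal$bet)))
range(out.rcal$fit - plogis(out.rcal$eta))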