LassoGEE {LassoGEE} R Documentation

Function to fit a penalized GEE by the I-CGD algorithm.

Description

This function fits an L_1-penalized GEE model to longitudinal data using either the I-CGD algorithm or the re-weighted least squares algorithm.

Usage

LassoGEE(
  X,
  y,
  id,
  family = binomial("probit"),
  lambda,
  corstr = "independence",
  method = c("CGD", "RWL"),
  beta.ini = NULL,
  R = NULL,
  scale.fix = TRUE,
  scale.value = 1,
  maxiter = 50,
  tol = 0.001,
  silent = TRUE,
  Mv = NULL,
  verbose = TRUE
)

Arguments

X

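A design matrix of dimension (n * m) x p, where n is the number of subjects/clusters, m is the number of observations per subject, and p is the number of covariates.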

y

A response vector of length n * m, ordered to match the rows of X.

id

A vector for identifying subjects/clusters.

family

A family object representing one of the built-in families. The families supported here are the same as in PGEE, e.g., binomial, gaussian, Gamma and poisson, together with their corresponding link functions, e.g., identity and probit.

lambda

A user-supplied value for the penalization parameter.

corstr

A character string that indicates the correlation structure among the repeated measurements of a subject. Structures supported in LassoGEE are "AR1", "exchangeable", "unstructured", and "independence". The default corstr type is "independence".

method

The algorithm used for fitting. "CGD" selects the I-CGD algorithm, and "RWL" selects the re-weighted least squares algorithm.

beta.ini

User specified initial values for regression parameters. The default value is NULL.

R

User specified correlation matrix. The default value is NULL.

scale.fix

A logical variable; if TRUE, the scale parameter is fixed at the value of scale.value. The default value is TRUE.

scale.value

If scale.fix = TRUE, the numeric value at which the scale parameter is fixed. The default value is 1.

maxiter

The maximum number of iterations used in the algorithm. The default value is 50.

tol

The tolerance level used in the algorithm. The default value is 1e-3.

silent

A logical variable; if FALSE, the iteration count is printed at each iteration of CGD. The default value is TRUE.

Mv

If either "stat_M_dep" or "non_stat_M_dep" is specified in corstr, this assigns a numeric value for Mv. Otherwise, the default value is NULL.

verbose

A logical variable; if TRUE, the outer-loop iteration counts are printed. The default value is TRUE.
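
To make the corstr options more concrete, the sketch below builds m x m working correlation matrices for the "independence", "exchangeable" and "AR1" structures in base R. The helper working_cor and its parameter alpha are hypothetical names used only for illustration; they are not part of LassoGEE, and the package may construct or estimate these matrices differently. An "unstructured" matrix has no parametric form, since every off-diagonal entry is free.

```r
# Hypothetical helper (not part of LassoGEE): build an m x m working
# correlation matrix for a given structure and correlation parameter alpha.
working_cor <- function(m,
                        corstr = c("independence", "exchangeable", "AR1"),
                        alpha = 0) {
  corstr <- match.arg(corstr)
  switch(corstr,
    # no within-subject correlation: identity matrix
    independence = diag(m),
    # every pair of measurements shares the same correlation alpha
    exchangeable = {
      R <- matrix(alpha, m, m)
      diag(R) <- 1
      R
    },
    # correlation decays as alpha^|j - k| with the lag between measurements
    AR1 = alpha^abs(outer(seq_len(m), seq_len(m), "-"))
  )
}

R_ar1  <- working_cor(4, "AR1", alpha = 0.5)
R_exch <- working_cor(4, "exchangeable", alpha = 0.3)
```

A matrix of this kind, with dimension equal to the cluster size, is the sort of object one would expect to pass through the R argument when supplying a user-specified correlation matrix.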

Value

A list containing the following components:

betaest

The final estimate of the regression coefficients.

beta_all_step

The coefficient estimates at each iteration.

inner.count

The number of inner iterations performed in each stage.

outer.iter

The number of outer-loop iterations.
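
Because the L_1 penalty is intended to shrink some coefficients to zero, a common first step after fitting is to list which entries of betaest are nonzero. The sketch below uses a mock list with made-up values in place of a real fit; summarize_support and its tol argument are hypothetical names, and the threshold is an assumption in case the returned estimates are only approximately zero.

```r
# Hypothetical helper (not part of LassoGEE): report which coefficients
# are treated as nonzero, using a small numerical threshold.
summarize_support <- function(betaest, tol = 1e-8) {
  selected <- which(abs(betaest) > tol)
  list(n_selected = length(selected), selected = selected)
}

# Mock object standing in for the list returned by LassoGEE();
# the coefficient values are invented for illustration.
mock_fit <- list(betaest = c(0.31, 0, -0.12, 0, 0))

out <- summarize_support(mock_fit$betaest)
out$n_selected  # number of covariates kept in the model
out$selected    # their positions in the coefficient vector
```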

References

Li, Y., Gao, X., and Xu, W. (2020). Statistical consistency for generalized estimating equation with L_1 regularization.

See Also

cv.LassoGEE

Examples

# required R packages
library(LassoGEE)
library(mvtnorm)        # rmvnorm()
library(SimCorMultRes)  # rbin()

set.seed(123)
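p <- 200                       # number of covariates
s <- ceiling(p^(1/3))          # number of nonzero coefficients
n <- ceiling(10 * s * log(p))  # number of subjects (clusters)
m <- 4                         # cluster size (observations per subject)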
# covariance matrix of the p continuous covariates
X.sigma <- matrix(0, p, p)
for (i in 1:p) {
  X.sigma[i, ] <- 0.5^abs((1:p) - i)
}

# generate matrix of covariates
X <- as.matrix(rmvnorm(n*m, mean = rep(0,p), X.sigma))

# true regression coefficients: s nonzero values followed by p - s zeros
bt <- runif(s, 0.05, 0.5) # alternative: rep(1/s, s)
beta.true <- c(bt, rep(0, p - s))
# intercept
beta_intercepts <- 0
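# unstructured true correlation matrix for the m repeated measurements
tt <- runif(m * m, -1, 1)
Rtmp <- t(matrix(tt, m, m)) %*% matrix(tt, m, m) + diag(1, m)
R_tr <- diag(diag(Rtmp)^(-1/2)) %*% Rtmp %*% diag(diag(Rtmp)^(-1/2))
diag(R_tr) <- round(diag(R_tr))  # remove rounding error so the diagonal is exactly 1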

# simulation of clustered binary responses (rbin() is from SimCorMultRes)
simulated_binary_dataset <- rbin(clsize = m, intercepts = beta_intercepts,
                                 betas = beta.true, xformula = ~X, cor.matrix = R_tr,
                                 link = "probit")
# penalization parameter
lambda <- 0.2 * s * sqrt(log(p) / n)
data <- simulated_binary_dataset$simdata
y <- data$y
X <- data$X
id <- data$id

# fit the penalized GEE and report the elapsed time
ptm <- proc.time()
nCGDfit <- LassoGEE(X = X, y = y, id = id, family = binomial("probit"),
                    lambda = lambda, corstr = "unstructured")
proc.time() - ptm
betaest <- nCGDfit$betaest
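
For orientation, the problem dimensions and the penalty level used in the example above can be worked out on their own, without fitting anything. The sketch below repeats only the arithmetic from the example; the name dims is introduced here for illustration.

```r
# Reproduce the sizes and penalty level used in the example
p <- 200
s <- ceiling(p^(1/3))                 # 200^(1/3) is about 5.85, so s = 6
n <- ceiling(10 * s * log(p))         # 60 * log(200) is about 317.9, so n = 318
m <- 4
lambda <- 0.2 * s * sqrt(log(p) / n)  # about 0.1549

dims <- c(s = s, n = n, rows = n * m, lambda = round(lambda, 4))
dims
```

With these settings the design matrix has n * m = 1272 rows and p = 200 columns, and only the first s = 6 coefficients of beta.true are nonzero.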


[Package LassoGEE version 1.0 Index]