PGEE {PGEE}

R Documentation

Function to fit penalized generalized estimating equations

Description

This function fits a penalized generalized estimating equation model to longitudinal data.
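For orientation, the estimating equation being solved can be sketched as follows. This is written in generic GEE notation, which is an assumption of this sketch rather than the package's own notation; the exact normalization and treatment of the scale parameter follow Wang et al. (2012).

```latex
U(\beta) \;=\; S(\beta) \;-\; q_{\lambda}(|\beta|) \circ \operatorname{sign}(\beta) \;=\; 0,
\qquad
S(\beta) \;=\; \frac{1}{n} \sum_{i=1}^{n} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr)
```

Here $D_i = \partial \mu_i / \partial \beta^{\top}$, $V_i$ is the working covariance matrix implied by `family` and `corstr`, $q_{\lambda}$ is the first derivative of the SCAD penalty with tuning parameter `lambda`, and $\circ$ denotes the componentwise product.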

Usage

PGEE(formula, id, data, na.action = NULL, family = gaussian(link = "identity"), 
corstr = "independence", Mv = NULL, beta_int = NULL, R = NULL, scale.fix = TRUE, 
scale.value = 1, lambda, pindex = NULL, eps = 10^-6, maxiter = 30, tol = 10^-3, 
silent = TRUE)

Arguments

`formula`	A formula expression in the form of `response ~ predictors`.
`id`	A vector for identifying subjects/clusters.
`data`	A data frame that stores the variables in `formula` together with the `id` variable.
`na.action`	A function to remove missing values from the data. Only `na.omit` is allowed here.
`family`	A `family` object: a list of functions and expressions for defining `link` and `variance` functions. Families supported in `PGEE` are `binomial`, `gaussian`, `Gamma` and `poisson`. Links that are not available in `gee` are not available here either. The default family is `gaussian`.
`corstr`	A character string, which specifies the type of correlation structure. Structures supported in `PGEE` are `"AR-1"`, `"exchangeable"`, `"fixed"`, `"independence"`, `"stat_M_dep"`, `"non_stat_M_dep"`, and `"unstructured"`. The default `corstr` type is `"independence"`.
`Mv`	If either `"stat_M_dep"` or `"non_stat_M_dep"` is specified in `corstr`, this gives the numeric value of M. Otherwise, the default value is `NULL`.
`beta_int`	User specified initial values for regression parameters. The default value is `NULL`.
`R`	If `corstr = "fixed"` is specified, then `R` is a square matrix, with dimension equal to the maximum cluster size, containing the user-specified correlation. Otherwise, the default value is `NULL`.
`scale.fix`	A logical variable; if true, the scale parameter is fixed at the value of `scale.value`. The default value is `TRUE`.
`scale.value`	If `scale.fix = TRUE`, this assigns a numeric value to which the scale parameter should be fixed. The default value is 1.
`lambda`	A numerical value for the penalization parameter of the SCAD function, which is typically chosen via cross-validation (see `CVfit`).
`pindex`	An index vector showing the parameters which are not subject to penalization. The default value is `NULL`. However, in a model with an intercept, the intercept parameter should never be penalized.
`eps`	A numerical value for the epsilon used in the minorization-maximization algorithm. The default value is `10^-6`.
`maxiter`	The maximum number of iterations used in the estimation algorithm. The default value is `30`.
`tol`	The tolerance level that is used in the estimation algorithm. The default value is `10^-3`.
`silent`	A logical variable; if false, the regression parameter estimates at each iteration are printed. The default value is `TRUE`.
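To make the role of `lambda` more concrete, the following is a minimal sketch of the first derivative of the SCAD penalty of Fan and Li (2001), assuming the conventional constant `a = 3.7`. The name `scad_deriv` is hypothetical and is not a function of the PGEE package; the package's internal implementation may differ.

```r
# First derivative of the SCAD penalty, evaluated at |theta|.
# Illustrative helper only: `scad_deriv` is a hypothetical name,
# and a = 3.7 is the conventional choice, assumed here.
scad_deriv <- function(theta, lambda, a = 3.7) {
  theta <- abs(theta)
  # Full penalty rate lambda for |theta| <= lambda, then a linear
  # decay that reaches zero at |theta| = a * lambda.
  lambda * (theta <= lambda) +
    pmax(a * lambda - theta, 0) / (a - 1) * (theta > lambda)
}

# Penalty rates for a small, a moderate, and a large coefficient.
scad_deriv(c(0.05, 0.2, 1), lambda = 0.1)
```

Coefficients with magnitude at most `lambda` are penalized at the full rate, while those beyond `a * lambda` are not penalized at all, which is why larger values of `lambda` shrink more coefficients toward zero.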

Value

An object of class `"PGEE"` representing the fit.

References

Wang, L., Zhou, J., and Qu, A. (2012). Penalized generalized estimating equations for high-dimensional longitudinal data analysis. Biometrics, 68, 353–360.

Examples

# Consider an example similar to example 1 
# in Wang et al. (2012).

# required R package 
library(mvtnorm)
# number of subjects
n <- 200
# number of covariates 
pn <- 10
# number of time points
m <- 4

# vector of subject ids
id.vect <- rep(1:n, each = m) 

# covariance matrix of (pn-1) number of continuous covariates 
X.sigma <- matrix(0,(pn-1),(pn-1))
for (i in 1:(pn-1)) {
  X.sigma[i,] <- 0.5^(abs((1:(pn-1))-i))
}

# generate matrix of covariates    
x.mat <- as.matrix(rmvnorm(n*m, mean = rep(0,(pn-1)), X.sigma))
x.mat <- cbind(rbinom(n*m,1, 0.5), x.mat)

# true values
beta.true <- c(2,3,1.5,2,rep(0,6))
sigma2 <- 1
rho <- 0.5
R <- matrix(rho,m,m)+diag(rep(1-rho,m))

# covariance matrix of error
SIGMA <- sigma2*R
error <- rmvnorm(n, mean = rep(0,m),SIGMA)

# generate longitudinal data with continuous outcomes
y.temp <- x.mat%*%beta.true
y.vect <- y.temp+as.vector(t(error))

mydata <- data.frame(id.vect,y.vect,x.mat) 
colnames(mydata) <- c("id","y",paste("x",1:length(beta.true),sep = ""))

###Input Arguments for CVfit fitting###
library(PGEE)
formula <- "y ~.-id-1"
data <- mydata
family <- gaussian(link = "identity")
lambda.vec <- seq(0.1,1,0.1)

## Not run: 
cv <- CVfit(formula = formula, id = id, data = data, family = family,
fold = 4, lambda.vec = lambda.vec, pindex = NULL, eps = 10^-6, maxiter = 30, 
tol = 10^-3)

names(cv)
cv$lam.opt

## End(Not run)

lambda <- 0.1 # this value was obtained through CVfit

# analyze the data through penalized generalized estimating equations

myfit1 <- PGEE(formula = formula, id = id, data = data, na.action = NULL, 
family = family, corstr = "exchangeable", Mv = NULL, 
beta_int = c(rep(0,length(beta.true))), R = NULL, scale.fix = TRUE, 
scale.value = 1, lambda = lambda, pindex = NULL, eps = 10^-6, maxiter = 30, 
tol = 10^-3, silent = TRUE)

summary(myfit1)

# analyze the data through unpenalized generalized estimating equations

myfit2 <- MGEE(formula = formula, id = id, data = data, na.action = NULL, 
family = family, corstr = "exchangeable", Mv = NULL, 
beta_int = c(rep(0,length(beta.true))), R = NULL, scale.fix = TRUE, 
scale.value = 1, maxiter = 30, tol = 10^-3, silent = TRUE)

summary(myfit2)
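Penalized fits of this kind may shrink irrelevant coefficients very close to zero rather than exactly to zero, so a small cutoff is often used to decide which covariates were selected. The sketch below assumes a cutoff of `10^-3` and uses made-up estimates, not output from `PGEE`; `select_nonzero` is a hypothetical helper and not part of the package.

```r
# Indices of coefficients whose magnitude exceeds a cutoff.
# `select_nonzero` is a hypothetical helper, not a PGEE function.
select_nonzero <- function(est, thresh = 1e-3) {
  which(abs(est) > thresh)
}

# Made-up estimates for illustration only (NOT output from PGEE).
est <- c(x1 = 1.98, x2 = 3.02, x3 = 1.47, x4 = 2.01,
         x5 = 2e-5, x6 = -4e-6, x7 = 0, x8 = 1e-4,
         x9 = -3e-5, x10 = 7e-6)

# True coefficients used in the simulation above.
beta.true <- c(2, 3, 1.5, 2, rep(0, 6))

selected <- select_nonzero(est)      # covariates treated as nonzero
true_set <- which(beta.true != 0)    # covariates that are truly nonzero

# TRUE when the selected set matches the true set exactly.
setequal(selected, true_set)
```

With a real fit, `est` would be replaced by the estimated coefficients extracted from the fitted object.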

[Package PGEE version 1.5 Index]