stdGee {stdReg} R Documentation

Regression standardization in conditional generalized estimating equations


stdGee performs regression standardization in linear and log-linear fixed effects models, at specified values of the exposure, over the sample covariate distribution. Let Y, X, and Z be the outcome, the exposure, and a vector of covariates, respectively. It is assumed that data are clustered with a cluster indicator i. stdGee uses a fitted fixed effects model, with cluster-specific intercept a_i (see details), to estimate the standardized mean \theta(x)=E\{E(Y|i,X=x,Z)\}, where x is a specific value of X, and the outer expectation is over the marginal distribution of (a_i,Z).


stdGee(fit, data, X, x, clusterid, subsetnew)



fit: an object of class "gee", with argument cond = TRUE, as returned by the gee function in the drgee package. If arguments weights and/or subset are used when fitting the model, then the same weights and subset are used in stdGee.


data: a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in fit.


X: a string containing the name of the exposure variable X in data.


x: an optional vector containing the specific values of X at which to estimate the standardized mean. If X is binary (0/1) or a factor, then x defaults to all values of X. If X is numeric, then x defaults to the mean of X. If x is set to NA, then X is not altered. This produces an estimate of the marginal mean E(Y)=E\{E(Y|X,Z)\}.


clusterid: a mandatory string containing the name of a cluster identification variable. Must be identical to the clusterid variable used in the gee call.


subsetnew: an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model.


stdGee assumes that a fixed effects model

\eta\{E(Y|i,X,Z)\}=a_i+h(X,Z;\beta)

has been fitted. The link function \eta is assumed to be the identity link or the log link. The conditional generalized estimating equation (CGEE) estimate of \beta is used to obtain estimates of the cluster-specific means:

\hat{a}_i=\sum_{j=1}^{n_i} r_{ij}/n_i,

where

r_{ij}=Y_{ij}-h(X_{ij},Z_{ij};\hat{\beta})

if \eta is the identity link, and

r_{ij}=Y_{ij}\exp\{-h(X_{ij},Z_{ij};\hat{\beta})\}

if \eta is the log link, and (X_{ij},Z_{ij}) is the value of (X,Z) for subject j in cluster i, j=1,...,n_i, i=1,...,n. The CGEE estimate of \beta and the estimate of a_i are used to estimate the mean E(Y|i,X=x,Z):

\hat{E}(Y|i,X=x,Z)=\eta^{-1}\{\hat{a}_i+h(X=x,Z;\hat{\beta})\}.

For each x in the x argument, these estimates are averaged across all subjects (i.e. all observed values of Z and all estimated values of a_i) to produce estimates

\hat{\theta}(x)=\sum_{i=1}^n \sum_{j=1}^{n_i} \hat{E}(Y|i,X=x,Z_{ij})/N,

where N=\sum_{i=1}^n n_i. The variance for \hat{\theta}(x) is obtained by the sandwich formula.
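
The averaging step can be illustrated by hand for the identity link. The following is a minimal sketch, not the code used by stdGee: it assumes a linear predictor in X and Z, and it uses lm on cluster-centered variables as a stand-in for the CGEE fit (that substitution, and the helper names center, h and theta_hat, are assumptions of this sketch). It computes point estimates only and does not reproduce the sandwich variance.

```r
set.seed(1)
n <- 500                      # number of clusters
ni <- 2                       # subjects per cluster
id <- rep(1:n, each = ni)
ai <- rep(rnorm(n), each = ni)
Z <- rnorm(n * ni)
X <- rnorm(n * ni, mean = ai + Z)
Y <- rnorm(n * ni, mean = ai + X + Z)

# Remove cluster means so that the cluster-specific intercept drops out
center <- function(v) v - ave(v, id)
beta <- unname(coef(lm(center(Y) ~ 0 + center(X) + center(Z))))

# Linear predictor without the intercept (hypothetical helper)
h <- function(x, z) beta[1] * x + beta[2] * z

# Cluster-specific intercepts: within-cluster mean of Y - h(X, Z)
a_hat <- ave(Y - h(X, Z), id)

# Standardized mean at X = x: average over all subjects, with Z and a_hat as observed
theta_hat <- function(x) mean(a_hat + h(x, Z))
sapply(c(-1, 0, 1), theta_hat)
```

Because the predictor in this sketch is linear in X, the difference theta_hat(1) - theta_hat(0) equals the estimated coefficient of X; with nonlinear terms such as I(X^2) the standardized means would no longer differ by a single coefficient.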


An object of class "stdGee" is a list containing


call: the matched call.


input: a list containing all input arguments.


est: a vector with length equal to length(x), where element j is equal to \hat{\theta}(x[j]).


vcov: a square matrix with length(x) rows, where the element on row i and column j is the (estimated) covariance of \hat{\theta}(x[i]) and \hat{\theta}(x[j]).


The variance calculation performed by stdGee does not condition on the observed covariates \bar{Z}=(Z_{11},...,Z_{nn_i}). To see how this matters, note that

var\{\hat{\theta}(x)\}=E[var\{\hat{\theta}(x)|\bar{Z}\}]+var[E\{\hat{\theta}(x)|\bar{Z}\}].

The usual parameter \beta in a generalized linear model does not depend on \bar{Z}. Thus, E(\hat{\beta}|\bar{Z}) is independent of \bar{Z} as well (since E(\hat{\beta}|\bar{Z})=\beta), so that the term var[E\{\hat{\beta}|\bar{Z}\}] in the corresponding variance decomposition for var(\hat{\beta}) becomes equal to 0. However, \theta(x) depends on \bar{Z} through the average over the sample distribution for Z, and thus the term var[E\{\hat{\theta}(x)|\bar{Z}\}] is not 0, unless one conditions on \bar{Z}.
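
The effect of conditioning on the covariates can be checked by simulation. The sketch below is a deliberately simplified illustration, not the clustered setting of stdGee: it assumes independent observations, an ordinary linear model fitted with lm, and a hypothetical helper theta_hat0 that returns the standardized mean at X = 0. It compares the sampling variance of that estimate when Z is held fixed across replicates with the variance when Z is redrawn each time.

```r
set.seed(2)
N <- 200        # sample size per replicate
reps <- 500     # number of simulated data sets
Z_fixed <- rnorm(N)

# Standardized mean at X = 0 for one simulated data set with covariates Z
theta_hat0 <- function(Z) {
  X <- rnorm(N, mean = Z)
  Y <- rnorm(N, mean = X + Z)
  fit <- lm(Y ~ X + Z)
  mean(predict(fit, newdata = data.frame(X = 0, Z = Z)))
}

est_cond <- replicate(reps, theta_hat0(Z_fixed))    # Z held fixed
est_marg <- replicate(reps, theta_hat0(rnorm(N)))   # Z redrawn each time

v_cond <- var(est_cond)
v_marg <- var(est_marg)
c(conditional = v_cond, unconditional = v_marg)
```

Under these assumptions the unconditional variance should come out larger, since it also contains the variability of the sample average of Z.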


Arvid Sjolander.


Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.

Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.

Sjolander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics doi: 10.1093/biostatistics/kxz054



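library(stdReg)
library(drgee)  # provides gee()

# Simulate n clusters of size ni that share a cluster-level intercept ai
n <- 1000
ni <- 2
id <- rep(1:n, each=ni)
ai <- rep(rnorm(n), each=ni)
Z <- rnorm(n*ni)
X <- rnorm(n*ni, mean=ai+Z)
Y <- rnorm(n*ni, mean=ai+X+Z+0.1*X^2)
dd <- data.frame(id, Z, X, Y)
# Fit the fixed effects model; stdGee requires cond=TRUE
fit <- gee(formula=Y~X+Z+I(X^2), data=dd, clusterid="id", link="identity",
  cond=TRUE)
# Standardized means over a grid of exposure values
fit.std <- stdGee(fit=fit, data=dd, X="X", x=seq(-3,3,0.5), clusterid="id")
# Summarize as differences relative to the reference given by reference=2
print(summary(fit.std, contrast="difference", reference=2))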

[Package stdReg version 3.4.1 Index]