stdGee {stdReg} | R Documentation |
Regression standardization in conditional generalized estimating equations
Description
stdGee performs regression standardization in linear and log-linear fixed effects models, at specified values of the exposure, over the sample covariate distribution. Let Y, X, and Z be the outcome, the exposure, and a vector of covariates, respectively. It is assumed that data are clustered with a cluster indicator i. stdGee uses a fitted fixed effects model, with cluster-specific intercept a_i (see Details), to estimate the standardized mean
\theta(x)=E\{E(Y|i,X=x,Z)\},
where x is a specific value of X, and the outer expectation is over the marginal distribution of (a_i,Z).
Usage
stdGee(fit, data, X, x, clusterid, subsetnew)
Arguments
fit |
an object of class |
data |
a data frame containing the variables in the model. This should be the same
data frame as was used to fit the model in |
X |
a string containing the name of the exposure variable |
x |
an optional vector containing the specific values of |
clusterid |
a mandatory string containing the name of a cluster identification variable. Must be identical to the clusterid variable used in the gee call. |
subsetnew |
an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model. |
Details
stdGee assumes that a fixed effects model
\eta\{E(Y|i,X,Z)\}=a_i+h(X,Z;\beta)
has been fitted. The link function \eta is assumed to be the identity link or the log link. The conditional generalized estimating equation (CGEE) estimate of \beta is used to obtain estimates of the cluster-specific means:
\hat{a}_i=\sum_{j=1}^{n_i}r_{ij}/n_i,
where
r_{ij}=Y_{ij}-h(X_{ij},Z_{ij};\hat{\beta})
if \eta is the identity link, and
r_{ij}=Y_{ij}\exp\{-h(X_{ij},Z_{ij};\hat{\beta})\}
if \eta is the log link, and (X_{ij},Z_{ij}) is the value of (X,Z) for subject j in cluster i, j=1,...,n_i, i=1,...,n. The CGEE estimate of \beta and the estimate of a_i are used to estimate the mean E(Y|i,X=x,Z):
\hat{E}(Y|i,X=x,Z)=\eta^{-1}\{\hat{a}_i+h(X=x,Z;\hat{\beta})\}.
For each x in the x argument, these estimates are averaged across all subjects (i.e. all observed values of Z and all estimated values of a_i) to produce estimates
\hat{\theta}(x)=\sum_{i=1}^n \sum_{j=1}^{n_i} \hat{E}(Y|i,X=x,Z_{ij})/N,
where N=\sum_{i=1}^n n_i. The variance for \hat{\theta}(x) is obtained by the sandwich formula.
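The point estimate described above can be traced by hand for the identity link. The following base R sketch is illustrative only and is not the code used by stdGee: it assumes h(X,Z;\beta)=\beta_1 X+\beta_2 Z, and it uses ordinary within-cluster centering as a stand-in for the CGEE estimate of \beta. The names center, beta_hat, a_hat and theta_hat are made up for this illustration, and no variance is computed.

```r
set.seed(1)
n  <- 500                                # number of clusters
ni <- 2                                  # subjects per cluster
id <- rep(seq_len(n), each = ni)
a  <- rep(rnorm(n), each = ni)           # cluster-specific intercepts a_i
Z  <- rnorm(n * ni)
X  <- rnorm(n * ni, mean = a + Z)
Y  <- rnorm(n * ni, mean = a + X + Z)    # identity link, h = X + Z

# Assumption: within-cluster centering removes a_i and stands in for CGEE here.
center   <- function(v) v - ave(v, id)
beta_hat <- unname(coef(lm(center(Y) ~ 0 + center(X) + center(Z))))

# r_ij = Y_ij - h(X_ij, Z_ij; beta_hat); a_hat holds the cluster mean of r_ij,
# repeated for every subject in the cluster.
r     <- Y - (beta_hat[1] * X + beta_hat[2] * Z)
a_hat <- ave(r, id)

# theta_hat(x): average of a_hat_i + h(x, Z_ij; beta_hat) over all N subjects.
theta_hat <- function(x) mean(a_hat + beta_hat[1] * x + beta_hat[2] * Z)

x_grid <- c(-1, 0, 1)
est    <- sapply(x_grid, theta_hat)
print(cbind(x = x_grid, est = est))
```

In this simulated setup the true standardized mean is \theta(x)=x, so the printed estimates should lie close to the values in x_grid.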
Value
An object of class "stdGee"
is a list containing
call |
the matched call. |
input |
|
est |
a vector with length equal to |
vcov |
a square matrix with |
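As an illustration of how the est and vcov components can be combined, the sketch below forms difference contrasts against a reference level and their standard errors by a linear transformation of the covariance matrix. The numbers are made up and merely stand in for the est and vcov of a fitted object; the names V, ref, C, diff_est and diff_se are hypothetical, and this is not claimed to be how the package's summary method is implemented.

```r
# Made-up values standing in for the est and vcov components (three x values).
est <- c(0.2, 1.0, 2.1)
V   <- matrix(c(0.040, 0.010, 0.005,
                0.010, 0.030, 0.010,
                0.005, 0.010, 0.050), nrow = 3)

ref <- 2                                  # position of the reference level

# Contrast matrix: row k picks out est[k] - est[ref].
C <- diag(3)
C[, ref] <- C[, ref] - 1

diff_est <- as.vector(C %*% est)              # differences versus the reference
diff_se  <- sqrt(diag(C %*% V %*% t(C)))      # standard errors of the differences

print(cbind(diff_est, diff_se))
```

The row for the reference level is identically zero, since it contrasts the reference with itself.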
Note
The variance calculation performed by stdGee does not condition on the observed covariates \bar{Z}=(Z_{11},...,Z_{nn_n}). To see how this matters, note that
var\{\hat{\theta}(x)\}=E[var\{\hat{\theta}(x)|\bar{Z}\}]+var[E\{\hat{\theta}(x)|\bar{Z}\}].
The usual parameter \beta in a generalized linear model does not depend on \bar{Z}. Thus, E(\hat{\beta}|\bar{Z}) is independent of \bar{Z} as well (since E(\hat{\beta}|\bar{Z})=\beta), so that the term var[E\{\hat{\beta}|\bar{Z}\}] in the corresponding variance decomposition for var(\hat{\beta}) becomes equal to 0. However, \theta(x) depends on \bar{Z} through the average over the sample distribution for Z, and thus the term var[E\{\hat{\theta}(x)|\bar{Z}\}] is not 0, unless one conditions on \bar{Z}.
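A small simulation can make the extra variance term visible. The sketch below is a deliberate simplification and does not use stdGee: it drops the clustering, fits an ordinary linear model with lm, and compares the sampling variance of \hat{\theta}(x) when Z is held fixed across replications with the variance when Z is redrawn each time. The names theta_hat, one_draw, Z_fixed, cond and uncond are made up for this illustration.

```r
set.seed(2)
n    <- 50       # sample size per replication
reps <- 1000     # number of replications
x    <- 0        # exposure value at which to standardize

# theta_hat(x): fit Y ~ X + Z, then average the prediction at X = x over sample Z.
theta_hat <- function(X, Z, Y, x) {
  b <- coef(lm(Y ~ X + Z))
  mean(b[1] + b[2] * x + b[3] * Z)
}

# One replication for a given covariate vector Z (true model: E(Y|X,Z) = X + 2Z).
one_draw <- function(Z) {
  X <- rnorm(n)
  Y <- rnorm(n, mean = X + 2 * Z)
  theta_hat(X, Z, Y, x)
}

Z_fixed <- rnorm(n)
cond    <- replicate(reps, one_draw(Z_fixed))    # Z held fixed across replications
uncond  <- replicate(reps, one_draw(rnorm(n)))   # Z redrawn in every replication

print(c(conditional = var(cond), unconditional = var(uncond)))
```

In this setup E\{\hat{\theta}(x)|\bar{Z}\} equals x plus twice the sample mean of Z, so the second term of the decomposition is 4/n, and the unconditional variance should come out visibly larger than the conditional one.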
Author(s)
Arvid Sjolander.
References
Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.
Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.
Sjolander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics, doi: 10.1093/biostatistics/kxz054.
Examples
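library(stdReg)  # provides stdGee
library(drgee)   # provides gee

# Simulate n clusters of size ni that share a cluster-level effect ai
n <- 1000
ni <- 2
id <- rep(1:n, each=ni)
ai <- rep(rnorm(n), each=ni)
Z <- rnorm(n*ni)
X <- rnorm(n*ni, mean=ai+Z)
Y <- rnorm(n*ni, mean=ai+X+Z+0.1*X^2)
dd <- data.frame(id, Z, X, Y)

# Fit the fixed effects model by conditional GEE (cond=TRUE)
fit <- gee(formula=Y~X+Z+I(X^2), data=dd, clusterid="id", link="identity",
           cond=TRUE)

# Standardize over a grid of exposure values
fit.std <- stdGee(fit=fit, data=dd, X="X", x=seq(-3,3,0.5), clusterid="id")

# Differences relative to x=2, then a plot of the standardized means
print(summary(fit.std, contrast="difference", reference=2))
plot(fit.std)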