stdGlm {stdReg} | R Documentation |
Regression standardization in generalized linear models
Description
stdGlm performs regression standardization in generalized linear models,
at specified values of the exposure, over the sample covariate distribution.
Let Y, X, and Z be the outcome, the exposure, and a vector of covariates,
respectively. stdGlm uses a fitted generalized linear model to estimate the
standardized mean \theta(x)=E\{E(Y|X=x,Z)\}, where x is a specific value of X,
and the outer expectation is over the marginal distribution of Z.
Usage
stdGlm(fit, data, X, x, clusterid, case.control = FALSE, subsetnew)
Arguments
fit |
an object of class |
data |
a data frame containing the variables in the model. This should be the same
data frame as was used to fit the model in |
X |
a string containing the name of the exposure variable |
x |
an optional vector containing the specific values of |
clusterid |
an optional string containing the name of a cluster identification variable when data are clustered. |
case.control |
logical. Do data come from a case-control study? Defaults to FALSE. |
subsetnew |
an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model. |
Details
stdGlm assumes that a generalized linear model
\eta\{E(Y|X,Z)\}=h(X,Z;\beta)
has been fitted. The maximum likelihood estimate of \beta is used to obtain
estimates of the mean E(Y|X=x,Z):
\hat{E}(Y|X=x,Z)=\eta^{-1}\{h(X=x,Z;\hat{\beta})\}.
For each x in the x argument, these estimates are averaged across
all subjects (i.e. all observed values of Z) to produce estimates
\hat{\theta}(x)=\sum_{i=1}^n \hat{E}(Y|X=x,Z_i)/n,
where Z_i is the value of Z for subject i, i=1,...,n.
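As an illustration, the averaging step can be reproduced with base R alone. The sketch below is not the stdGlm implementation; the helper name standardize and the simulated data are assumptions made for this example. For each value x0 it sets the exposure to x0 for every subject, predicts on the response scale, and averages the predictions.

```r
set.seed(3)
n <- 1000
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rbinom(n, 1, prob = plogis(X + Z))
dd <- data.frame(Z, X, Y)
fit <- glm(Y ~ X + Z, family = binomial, data = dd)

# Hypothetical helper (not part of stdReg): plug-in estimate of theta(x)
standardize <- function(fit, data, X, x) {
  sapply(x, function(x0) {
    newdata <- data
    newdata[[X]] <- x0                  # set the exposure to x0 for every subject
    mean(predict(fit, newdata = newdata, type = "response"))  # average over observed Z_i
  })
}

theta.hat <- standardize(fit, dd, X = "X", x = c(-1, 0, 1))
theta.hat
```

The sketch returns point estimates only; it does not attempt any variance calculation.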
The variance for \hat{\theta}(x) is obtained by the sandwich formula.
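One way to apply a sandwich formula here is to treat (\beta, \theta(x)) as the joint solution of stacked estimating equations. The sketch below does this for a gaussian model with identity link only; it is an assumption about how such a calculation can be set up, not a copy of the stdGlm internals, and the names x0, D, Dx, U, A and B are introduced for illustration.

```r
set.seed(1)
n <- 500
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rnorm(n, mean = X + Z)
fit <- glm(Y ~ X + Z)                   # gaussian family, identity link

x0 <- 1                                 # exposure level of interest
b <- coef(fit)
D <- model.matrix(fit)                  # n x p design matrix
p <- ncol(D)
Dx <- D
Dx[, "X"] <- x0                         # design matrix with X set to x0 for everyone
pred <- drop(Dx %*% b)                  # E.hat(Y | X = x0, Z_i)
theta.hat <- mean(pred)

# Stacked estimating functions: score for beta, then pred_i - theta
r <- residuals(fit, type = "response")
U <- cbind(D * r, pred - theta.hat)

# "Bread" A = minus the mean derivative of U; "meat" B = mean outer product of U
A <- rbind(cbind(crossprod(D) / n, 0), c(-colMeans(Dx), 1))
B <- crossprod(U) / n
Ainv <- solve(A)
V <- Ainv %*% B %*% t(Ainv) / n         # sandwich variance of (beta.hat, theta.hat)

se.theta <- sqrt(V[p + 1, p + 1])
c(estimate = theta.hat, se = se.theta)
```

Because the last estimating function varies with Z_i, the resulting variance for \hat{\theta}(x) reflects sampling variability in the covariates as well as in \hat{\beta}.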
Value
An object of class "stdGlm" is a list containing
call |
the matched call. |
input |
|
est |
a vector with length equal to |
vcov |
a square matrix with |
Note
The variance calculation performed by stdGlm does not condition on
the observed covariates \bar{Z}=(Z_1,...,Z_n). To see how this matters, note that
var\{\hat{\theta}(x)\}=E[var\{\hat{\theta}(x)|\bar{Z}\}]+var[E\{\hat{\theta}(x)|\bar{Z}\}].
The usual parameter \beta in a generalized linear model does not depend
on \bar{Z}. Thus, E(\hat{\beta}|\bar{Z}) is independent of \bar{Z} as well
(since E(\hat{\beta}|\bar{Z})=\beta), so that the term var[E\{\hat{\beta}|\bar{Z}\}]
in the corresponding variance decomposition for var(\hat{\beta}) becomes equal to 0.
However, \theta(x) depends on \bar{Z} through the average over the sample
distribution for Z, and thus the term var[E\{\hat{\theta}(x)|\bar{Z}\}] is not 0,
unless one conditions on \bar{Z}.
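The size of the second term can be made concrete in a simple case. The sketch below assumes a linear model Y ~ X + Z with a single covariate, for which \hat{\theta}(x) = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 mean(Z_i) and hence var[E\{\hat{\theta}(x)|\bar{Z}\}] = \beta_2^2 var(Z)/n. It uses the model-based vcov(fit) for the conditional part and plug-in estimates throughout, so it illustrates the decomposition rather than reproducing the stdGlm output; the variable names are chosen for this example.

```r
set.seed(2)
n <- 400
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rnorm(n, mean = X + Z)
fit <- glm(Y ~ X + Z)                   # gaussian family, identity link
b <- coef(fit)

x0 <- 0
g <- c(1, x0, mean(Z))                  # theta.hat(x0) = sum(g * b) in this model
theta.hat <- sum(g * b)

# First term: variance of theta.hat with the observed Z_i treated as fixed
var.cond <- drop(t(g) %*% vcov(fit) %*% g)

# Second term: variance of E{theta.hat | Zbar} = b0 + b1 * x0 + b2 * mean(Z_i)
var.extra <- unname(b["Z"])^2 * var(Z) / n

var.total <- var.cond + var.extra
c(conditional = var.cond, extra = var.extra, total = var.total)
```

Conditioning on \bar{Z} amounts to dropping the second component, which gives a smaller variance whenever \beta_2 is not 0.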
Author(s)
Arvid Sjolander.
References
Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.
Sjolander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31(6), 563-574.
Sjolander A. (2018). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33(9), 847-858.
Examples
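##Example 1: continuous outcome
library(stdReg)
n <- 1000
Z <- rnorm(n)                      # covariate
X <- rnorm(n, mean=Z)              # exposure, depends on Z
Y <- rnorm(n, mean=X+Z+0.1*X^2)    # outcome, quadratic in X
dd <- data.frame(Z, X, Y)
# glm() defaults to the gaussian family with identity link
fit <- glm(formula=Y~X+Z+I(X^2), data=dd)
# standardize at exposure levels -3, -2.5, ..., 3
fit.std <- stdGlm(fit=fit, data=dd, X="X", x=seq(-3,3,0.5))
print(summary(fit.std))
plot(fit.std)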
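##Example 2: binary outcome
n <- 1000
Z <- rnorm(n)                      # covariate
X <- rnorm(n, mean=Z)              # exposure, depends on Z
# P(Y=1) = 1/(1+exp(X+Z)), i.e. plogis(-(X+Z))
Y <- rbinom(n, 1, prob=(1+exp(X+Z))^(-1))
dd <- data.frame(Z, X, Y)
# X*Z expands to X + Z + X:Z, so the model includes the X-by-Z interaction
fit <- glm(formula=Y~X+Z+X*Z, family="binomial", data=dd)
fit.std <- stdGlm(fit=fit, data=dd, X="X", x=seq(-3,3,0.5))
print(summary(fit.std))
plot(fit.std)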