drgee {drgee} | R Documentation |
Doubly Robust Generalized Estimating Equations
Description
drgee is used to estimate an exposure-outcome effect adjusted
for additional covariates. The estimation is based on regression
models for the outcome, exposure or a combination of both.
For clustered data the models may have cluster-specific intercepts.
Usage
drgee(outcome, exposure,
      oformula, eformula, iaformula = formula(~1),
      olink = c("identity", "log", "logit"),
      elink = c("identity", "log", "logit"),
      data, subset = NULL, estimation.method = c("dr", "o", "e"),
      cond = FALSE, clusterid, clusterid.vcov, rootFinder = findRoots,
      intercept = TRUE, ...)
Arguments
outcome |
The outcome as a variable or as a character string naming a variable
in the data argument. |
exposure |
The exposure as a variable or as a character string naming a variable
in the data argument. |
oformula |
An expression or formula for the outcome nuisance model. |
eformula |
An expression or formula for the exposure nuisance model. |
iaformula |
An expression or formula where the RHS should contain the variables
that "interact" with (i.e. are supposed to be multiplied with) the
exposure in the main model. "1" will always be added. Default value is no
interactions, i.e. iaformula = formula(~1). |
olink |
A character string naming the link function in the outcome nuisance
model. Has to be "identity", "log" or "logit". |
elink |
A character string naming the link function in the exposure nuisance
model. Has to be "identity", "log" or "logit". |
data |
A data frame or environment containing the variables used. If missing, variables are expected to be found in the calling environment of the calling environment. |
subset |
An optional vector defining a subset of the data to be used. |
estimation.method |
A character string naming the desired estimation method. Choose
"dr" for DR-estimation, "o" for O-estimation or "e" for E-estimation. |
cond |
A logical value indicating whether the nuisance models should have
cluster-specific intercepts. Requires a clusterid argument. |
rootFinder |
A function to solve a system of non-linear equations. Default
is findRoots. |
clusterid |
A cluster-defining variable or a character string naming a
cluster-defining variable in the data argument. |
clusterid.vcov |
A cluster-defining variable or a character string naming a
cluster-defining variable in the data argument. |
intercept |
A boolean to choose whether the nuisance parameters in doubly robust conditional logistic regression should be fitted with a model with an intercept. Only used for doubly robust conditional logistic regression. |
... |
Further arguments to be passed to the function |
Details
drgee estimates the parameter \beta in a main model
g\{E(Y|A,L)\}-g\{E(Y|A=0,L)\}=\beta^T \{A\cdot X(L)\},
where Y is the outcome of interest, A is the exposure of
interest, and L is a vector of covariates that we wish to
adjust for. X(L) is a vector valued function of L. Note that
A \cdot X(L) should be interpreted as a columnwise
multiplication and that X(L) will always contain a column of 1's.
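The columnwise multiplication can be illustrated in base R. The following is a minimal sketch and not drgee's internal code; the data frame d and its column names are made up for illustration. X(L) is built with model.matrix from a one-sided formula of the kind passed as iaformula, and each column is then multiplied by the exposure.

```r
## Hypothetical toy data: exposure a and covariate l1
d <- data.frame(a = c(0, 1, 1, 0), l1 = c(0.5, -1, 2, 3))

## X(L) from a one-sided formula; model.matrix adds the column of 1's
X <- model.matrix(~ l1, data = d)

## A . X(L): every column of X is multiplied elementwise by the exposure
AX <- d$a * X
colnames(AX) <- c("a", "a:l1")
AX
```

The first column of AX carries the main exposure effect and the second carries the exposure-covariate interaction, matching a two-dimensional \beta.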
Given a specification of an outcome nuisance model
g\{E(Y|A=0,L)\}=\gamma^T V(L) (where V(L) is a function of L),
O-estimation is performed. Alternatively, leaving g\{E(Y|A=0,L)\}
unspecified and using an exposure nuisance model
h\{E(A|L)\}=\alpha^T Z(L) (where h is a link function and Z(L)
is a function of L), E-estimation is performed. When g is logit,
the exposure nuisance model is required to be of the form
logit\{E(A|Y=0,L)\}=\alpha^T Z(L).
In this case the exposure needs to be binary.
Given both an outcome and an exposure nuisance model, DR-estimation can be
performed. DR-estimation gives a consistent estimate of the parameter
\beta when either the outcome nuisance model or the exposure nuisance
model is correctly specified, not necessarily both.
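The double robustness property can be illustrated by hand for the simplest case: identity link, a scalar \beta and no interactions. The sketch below is a simplified base R illustration under those assumptions, not the code drgee runs; all variable names and parameter values are invented. The exposure nuisance model is correct, while the outcome nuisance model omits l2.

```r
set.seed(1)
n  <- 5000
l1 <- rnorm(n)
l2 <- rnorm(n)
a  <- rbinom(n, 1, plogis(0.5 * l1 + 0.5 * l2))
y  <- rnorm(n, mean = 1.5 * a - 1 - 2 * l1 + 2 * l2)  # true beta = 1.5

## Exposure nuisance model, correctly specified
efit  <- glm(a ~ l1 + l2, family = binomial)
res.a <- a - fitted(efit)

## Outcome nuisance model, misspecified (l2 is omitted)
ofit  <- lm(y ~ a + l1)
o.hat <- coef(ofit)["(Intercept)"] + coef(ofit)["l1"] * l1

## O-type estimate: coefficient of a in the (misspecified) outcome model
beta.o <- unname(coef(ofit)["a"])
## E-type estimate: solves sum(res.a * (y - beta * a)) = 0
beta.e <- sum(res.a * y) / sum(res.a * a)
## DR-type estimate: solves sum(res.a * (y - beta * a - o.hat)) = 0
beta.dr <- sum(res.a * (y - o.hat)) / sum(res.a * a)

c(o = beta.o, e = beta.e, dr = beta.dr)
```

In this setup the E-type and DR-type estimates rely on the correct exposure model, whereas the O-type estimate inherits the confounding from the omitted l2.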
Usage is best explained through an example. Suppose that we are
interested in the parameter vector (\beta_0, \beta_1) in a main model
logit\{E(Y|A,L_1,L_2)\}-logit\{E(Y|A=0,L_1,L_2)\}=\beta_0 A + \beta_1 A \cdot L_1,
where L_1 and L_2 are the covariates that we wish to adjust for.
To adjust for L_1 and L_2, we can use an outcome nuisance model
logit\{E(Y|A=0,L_1,L_2;\gamma_0, \gamma_1)\}=\gamma_0 + \gamma_1 L_1
or an exposure nuisance model
logit\{E(A|Y=0,L_1,L_2)\}=\alpha_0+\alpha_1 L_1+\alpha_2 L_2
to calculate estimates of \beta_0 and \beta_1 in the main model.
We specify the outcome nuisance model as oformula = Y~L_1
and olink = "logit". The exposure nuisance model is specified as
eformula = A~L_1+L_2 and elink = "logit".
Since the outcome Y and the exposure A are identified as the LHS of
oformula and eformula respectively, and since the outcome link is
specified in the olink argument, the only thing left to specify for
the main model is the (multiplicative) interactions
A\cdot X(L)=A\cdot (1,L_1)^T. This is done by specifying X(L) as
iaformula = ~L_1, since 1 is always included in X(L).
We can then perform O-estimation, E-estimation or DR-estimation by
setting estimation.method to "o", "e" or "dr" respectively.
O-estimation uses only the outcome nuisance model, and E-estimation
uses only the exposure nuisance model. DR-estimation uses both
nuisance models, and gives a consistent estimate of (\beta_0,\beta_1)
if either nuisance model is correct, not necessarily both.
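The logistic example can be written out as calls to drgee. This is a sketch only: the argument names follow the Usage section, but the simulated data set logitdata and every parameter value are invented for illustration. The fits are run only if the drgee package is installed.

```r
expit <- function(x) 1 / (1 + exp(-x))

## Hypothetical simulated data with a binary exposure and a binary outcome
set.seed(2)
n   <- 2000
L_1 <- rnorm(n)
L_2 <- rnorm(n)
A   <- rbinom(n, 1, expit(0.5 * L_1 + 0.5 * L_2))
Y   <- rbinom(n, 1, expit(1 * A + 0.5 * A * L_1 - 1 + 0.5 * L_1))
logitdata <- data.frame(Y, A, L_1, L_2)

if (requireNamespace("drgee", quietly = TRUE)) {
  library(drgee)
  ## Same model specification, three estimation methods
  o.fit  <- drgee(oformula = Y ~ L_1, eformula = A ~ L_1 + L_2,
                  iaformula = ~ L_1, olink = "logit", elink = "logit",
                  data = logitdata, estimation.method = "o")
  e.fit  <- drgee(oformula = Y ~ L_1, eformula = A ~ L_1 + L_2,
                  iaformula = ~ L_1, olink = "logit", elink = "logit",
                  data = logitdata, estimation.method = "e")
  dr.fit <- drgee(oformula = Y ~ L_1, eformula = A ~ L_1 + L_2,
                  iaformula = ~ L_1, olink = "logit", elink = "logit",
                  data = logitdata, estimation.method = "dr")
  ## Estimates of (beta_0, beta_1) from each method
  print(coef(o.fit))
  print(coef(e.fit))
  print(coef(dr.fit))
}
```

Passing both formulas in every call keeps the three specifications comparable; only the nuisance model required by the chosen method is used.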
When estimation.method = "o", the RHS of eformula will be
ignored. The eformula argument can also be replaced by an exposure
argument specifying what the exposure of interest is.
When estimation.method = "e", the RHS of oformula will be
ignored. The oformula argument can also be replaced by an outcome
argument specifying what the outcome of interest is.
When cond = TRUE the nuisance models will be assumed to have
cluster-specific intercepts. These intercepts will not be estimated.
When E-estimation or DR-estimation is chosen with olink = "logit",
the exposure link will be changed to "logit". Note that this choice
of outcome link does not work for DR-estimation when cond = TRUE.
Robust variance for the estimated parameter is calculated
using the function robVcov. A cluster robust variance is calculated when
a character string naming a cluster variable is
supplied in the clusterid argument.
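For clustered data, the cond and clusterid arguments might be combined as in the sketch below. The data set clustdata, the cluster-level confounder u and all parameter values are assumptions made for illustration; only the argument names come from the Usage section. The fits are run only if the drgee package is installed.

```r
## Hypothetical clustered data: 500 clusters of size 4
set.seed(3)
n.clust <- 500
m       <- 4
id <- rep(seq_len(n.clust), each = m)
u  <- rep(rnorm(n.clust), each = m)   # unobserved cluster-level confounder
l  <- rnorm(n.clust * m)
a  <- rnorm(n.clust * m, mean = u + l)
y  <- rnorm(n.clust * m, mean = 1.5 * a + u + l)
clustdata <- data.frame(y, a, l, id)

if (requireNamespace("drgee", quietly = TRUE)) {
  library(drgee)
  ## Cluster-robust variance, no cluster-specific intercepts
  fit.rob <- drgee(oformula = y ~ l, eformula = a ~ l,
                   olink = "identity", elink = "identity",
                   data = clustdata, estimation.method = "dr",
                   clusterid = "id")
  ## Nuisance models with cluster-specific intercepts
  fit.cond <- drgee(oformula = y ~ l, eformula = a ~ l,
                    olink = "identity", elink = "identity",
                    data = clustdata, estimation.method = "dr",
                    cond = TRUE, clusterid = "id")
  print(coef(fit.rob))
  print(coef(fit.cond))
}
```

By construction u confounds the exposure-outcome relation at the cluster level, which is the situation the cond = TRUE fit is designed for; the first fit only adjusts the variance for clustering.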
For E-estimation when cond = FALSE and g is the identity
or log link, see Robins et al. (1992).
For DR-estimation when cond = TRUE and g is the identity
or log link, see Robins (1999). For DR-estimation when
g is the logit link, see Tchetgen et al. (2010).
O-estimation can also be performed using the gee function.
Value
drgee returns an object of class drgee containing:
coefficients |
Estimates of the parameters in the main model. |
vcov |
Robust variance for all main model parameters. |
coefficients.all |
Estimates of all estimated parameters. |
vcov.all |
Robust variance of all parameter estimates. |
optim.object |
An estimation object returned from the function specified
in the |
optim.object.o |
An estimation object returned from the function specified
in the |
optim.object.e |
An estimation object returned from the function specified
in the |
call |
The matched call. |
estimation.method |
The value of the input argument estimation.method. |
data |
The original data object, if given as an input argument |
oformula |
The original oformula object, if given as an input argument |
eformula |
The original eformula object, if given as an input argument |
iaformula |
The original iaformula object, if given as an input argument |
The class methods coef and vcov can be used to extract
the estimated parameters and their covariance matrix from a
drgee object. summary.drgee produces a summary of the
calculations.
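A short sketch of how the extractor methods might be used follows. The data set toydata is invented for illustration, and the Wald-type interval is a standard construction added here, not something drgee is stated to return. The fit is run only if the drgee package is installed.

```r
## Hypothetical toy data
set.seed(4)
n  <- 1000
l1 <- rnorm(n)
a  <- rbinom(n, 1, plogis(l1))
y  <- rnorm(n, mean = 1.5 * a + l1)
toydata <- data.frame(y, a, l1)

if (requireNamespace("drgee", quietly = TRUE)) {
  library(drgee)
  fit <- drgee(oformula = y ~ l1, eformula = a ~ l1,
               olink = "identity", elink = "logit",
               data = toydata, estimation.method = "dr")
  est <- coef(fit)               # main model parameter estimates
  se  <- sqrt(diag(vcov(fit)))   # robust standard errors
  ## Wald-type 95% intervals built from the two extractors
  ci <- cbind(estimate = est,
              lower = est - qnorm(0.975) * se,
              upper = est + qnorm(0.975) * se)
  print(ci)
  print(summary(fit))
}
```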
Author(s)
Johan Zetterqvist, Arvid Sjölander
References
Orsini N., Bellocco R., Sjölander A. (2013), Doubly Robust Estimation in Generalized Linear Models, Stata Journal, 13, 1, pp. 185–205
Robins J.M., Mark S.D., Newey W.K. (1992), Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders, Biometrics, 48, pp. 479–495
Robins J.M. (1999), Robust Estimation in Sequentially Ignorable Missing Data and Causal Inference Models, Proceedings of the American Statistical Association Section on Bayesian Statistical Science, pp. 6–10
Tchetgen E.J.T., Robins J.M., Rotnitzky A. (2010), On Doubly Robust Estimation in a Semiparametric Odds Ratio Model, Biometrika, 97, 1, pp. 171–180
Zetterqvist J., Vansteelandt S., Pawitan Y., Sjölander A. (2016), Doubly Robust Methods for Handling Confounding by Cluster, Biostatistics, 17, 2, pp. 264–276
See Also
gee for O-estimation, findRoots for nonlinear equation solving
and robVcov for estimation of variance.
Examples
## DR-estimation when
## the main model is
## E(Y|A,L1,L2)-E(Y|A=0,L1,L2)=beta0*A+beta1*A*L1
## and the outcome nuisance model is
## E(Y|A=0,L1,L2)=gamma0+gamma1*L1+gamma2*L2
## and the exposure nuisance model is
## E(A|L1,L2)=expit(alpha0+alpha1*L1+alpha2*L2)
library(drgee)
expit<-function(x) exp(x)/(1+exp(x))
n<-5000
## covariates to adjust for
l1<-rnorm(n, mean = 0, sd = 1)
l2<-rnorm(n, mean = 0, sd = 1)
beta0<-1.5
beta1<-1
gamma0<--1
gamma1<--2
gamma2<-2
alpha0<-1
alpha1<-5
alpha2<-3
## Exposure generated from the exposure nuisance model
a<-rbinom(n,1,expit(alpha0 + alpha1*l1 + alpha2*l2))
## Outcome generated from the main model and the
## outcome nuisance model
y<-rnorm(n,
         mean = beta0 * a + beta1 * a * l1 + gamma0 + gamma1 * l1 + gamma2 * l2,
         sd = 1)
simdata<-data.frame(y,a,l1,l2)
## outcome nuisance model misspecified and
## exposure nuisance model correctly specified
## DR-estimation
dr.est <- drgee(oformula = formula(y~l1),
                eformula = formula(a~l1+l2),
                iaformula = formula(~l1),
                olink = "identity", elink = "logit",
                data = simdata, estimation.method = "dr")
summary(dr.est)
## O-estimation
o.est <- drgee(exposure = "a", oformula = formula(y~l1),
               iaformula = formula(~l1), olink = "identity",
               data = simdata, estimation.method = "o")
summary(o.est)
## E-estimation
e.est <- drgee(outcome = "y", eformula = formula(a~l1+l2),
               iaformula = formula(~l1), elink = "logit",
               data = simdata, estimation.method = "e")
summary(e.est)