
ivglm {ivtools}

R Documentation

Instrumental variable estimation of the causal exposure effect in generalized linear models

Description

ivglm performs instrumental variable estimation of the causal exposure effect in generalized linear models with individual-level data. Below, Z, X, and Y are the instrument, the exposure, and the outcome, respectively. L is a vector of covariates that we wish to control for in the analysis; these would typically be confounders for the instrument and the outcome.

Usage

ivglm(estmethod, X, Y, fitZ.L=NULL, fitX.LZ=NULL, fitX.L=NULL, fitY.LX=NULL, 
  fitY.LZX=NULL, data, formula=~1, ctrl=FALSE, clusterid=NULL, link, vcov.fit=TRUE, 
  ...)

Arguments

`estmethod`	a string specifying the desired estimation method; either `"ts"` for two-stage estimation, or `"g"` for G-estimation.
`X`	a string specifying the name of the exposure `X` in `data`. This is not needed if `fitX.LZ` is specified.
`Y`	a string specifying the name of the outcome `Y` in `data`. This is not needed if `fitY.LX` or `fitY.LZX` is specified.
`fitZ.L`	an object of class `"glm"`, as returned by the `glm` function in the stats package. This is a fitted GLM for `E(Z|L)`. If there are no covariates, then `fitZ.L` may be specified as a model with an intercept only. This argument is not used when `estmethod="ts"`.
`fitX.LZ`	an object of class `"glm"`, as returned by the `glm` function in the stats package. This is a fitted GLM for `E(X|L,Z)`.
`fitX.L`	an object of class `"glm"`, as returned by the `glm` function in the stats package. This is a fitted GLM for `E(X|L)`. If there are no covariates, then `fitX.L` may be specified as a model with an intercept only. This argument is not used when `estmethod="ts"`.
`fitY.LX`	an object of class `"glm"`, as returned by the `glm` function in the stats package. This is a fitted GLM for `E(Y|L,X)`. This argument is not used when `estmethod="g"`.
`fitY.LZX`	an object of class `"glm"`, as returned by the `glm` function in the stats package. This is a fitted GLM for `E(Y|L,Z,X)`. This argument is not used when `estmethod="ts"`. It is also not used when `estmethod="g"` and `link="identity"` or `link="log"`.
`data`	a data frame containing the variables in the model. The covariates, instrument, exposure and outcome can have arbitrary names; they do not need to be called `L`, `Z`, `X` and `Y`.
`formula`	an object of class `"formula"`, with no left-hand side. This specifies the causal interaction terms `m(L)`; see ‘Details’. Defaults to `~1`, i.e. main effect only. This argument is not used when `estmethod="ts"`.
`ctrl`	logical. Should the control function `R=X-\hat{X}` be used when re-fitting `fitY.LX`? This argument is not used when `estmethod="g"`.
`clusterid`	an optional string containing the name of a cluster identification variable when data are clustered. Specifying `clusterid` corrects the standard errors but does not affect the estimates.
`link`	a string specifying the link function for the causal generalized linear model; see ‘Details’. Either `"identity"`, `"log"`, or `"logit"`. This argument is not used when `estmethod="ts"`.
`vcov.fit`	logical. Should the variance-covariance matrix be computed?
`...`	optional arguments passed on to the `nleqslv` function, which is used to solve the estimating equations when `estmethod="g"`. See the help pages for `nleqslv`. This argument is not used when `estmethod="ts"`.

Details

ivglm estimates the parameter \psi in the causal generalized linear model

\eta\{E(Y|L,Z,X)\}-\eta\{E(Y_0|L,Z,X)\}=m^T(L)X\psi.

Here, E(Y_0|L,Z,X) is the counterfactual mean of the outcome, had the exposure been set to 0. The link function \eta is either the identity, log or logit link, as specified by the link argument. The vector function m(L) contains interaction terms between L and X. If estmethod="ts", then these are specified implicitly through the model fitY.LX. If estmethod="g", then these are specified explicitly through the formula argument.
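
To make the role of m(L) concrete, the following base R sketch builds m(L) from a one-sided formula and evaluates m^T(L)\psi for a few covariate values. It assumes that m(L) corresponds to the design matrix of the formula, which is consistent with the term psi0*X+psi1*X*L used in the examples but is not a statement about the internals of ivglm. The data frame `dat` and the vector `psi` are hypothetical.

```r
# Hypothetical covariate values and parameter vector, for illustration only.
dat <- data.frame(L = c(-1, 0, 2))
psi <- c(0.5, 0.2)  # (main effect of X, X-by-L interaction)

# m(L) built from a one-sided formula of the kind passed to `formula`.
mL <- model.matrix(~ L, data = dat)
colnames(mL)

# Effect of a one-unit increase in X on the link scale, m^T(L) psi, per row.
effect_per_unit <- drop(mL %*% psi)
effect_per_unit
```

With formula=~1 the design matrix has a single intercept column, so \psi reduces to one main-effect parameter.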

If estmethod="ts", then two-stage estimation of \psi is performed. In this case, the model fitX.LZ is used to construct predictions \hat{X}=\hat{E}(X|L,Z). These predictions are subsequently used to re-fit the model fitY.LX, with X replaced with \hat{X}. The obtained coefficient(s) for \hat{X} in the re-fitted model is the two-stage estimator of \psi.
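
The two stages described above can be sketched by hand in base R. This is only an illustration for a linear model without covariates; the simulated data, the unmeasured confounder `U`, and all object names are assumptions made for the sketch, and it does not reproduce what ivglm does internally.

```r
set.seed(1)
n <- 5000
U <- rnorm(n)                      # unmeasured confounder of X and Y
Z <- rnorm(n)                      # instrument
X <- rnorm(n, mean = Z + U)        # exposure
Y <- rnorm(n, mean = 0.5 * X + U)  # outcome; the true psi is 0.5
dat <- data.frame(Z, X, Y)

# Stage 1: fit a model for E(X|Z) and compute the predictions Xhat.
stage1 <- glm(X ~ Z, data = dat)
dat$Xhat <- predict(stage1, type = "response")

# Stage 2: re-fit the outcome model with X replaced by Xhat.
stage2 <- glm(Y ~ Xhat, data = dat)
psi_ts <- unname(coef(stage2)["Xhat"])

# Naive regression of Y on X for comparison; it is confounded by U.
psi_naive <- unname(coef(glm(Y ~ X, data = dat))["X"])
c(two_stage = psi_ts, naive = psi_naive)
```

The standard errors reported by the second-stage fit in this sketch ignore the uncertainty in the first stage, so they should not be used for inference.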

If estmethod="g", then G-estimation of \psi is performed. In this case, the estimator is obtained as the solution to the estimating equation

H(\psi)=\sum_{i=1}^n\hat{d}(L_i,Z_i)h_i(\psi)=0.

The function h_i(\psi) is defined as

h_i(\psi)=Y_i-m^T(L_i)\psi X_i

when link="identity",

h_i(\psi)=Y_i\textrm{exp}\{-m^T(L_i)\psi X_i\}

when link="log", and

h_i(\psi)=\textrm{expit}[\textrm{logit}\{\hat{E}(Y|L_i,Z_i,X_i)\}-m^T(L_i)\psi X_i]

when link="logit". In the latter, \hat{E}(Y|L_i,Z_i,X_i) is an estimate of E(Y|L_i,Z_i,X_i) obtained from the model fitY.LZX.

The estimated function \hat{d}(L,Z) is chosen so that the true function has conditional mean 0, given L; E\{d(L,Z)|L\}=0. The specific form of \hat{d}(L,Z) is determined by the user-specified models. If fitX.LZ and fitX.L are specified, then \hat{d}(L,Z)=m(L)\{\hat{E}(X|L,Z)-\hat{E}(X|L)\}, where \hat{E}(X|L,Z) and \hat{E}(X|L) are obtained from fitX.LZ and fitX.L, respectively. If these are not specified, then \hat{d}(L,Z)=m(L)\{Z-\hat{E}(Z|L)\}, where \hat{E}(Z|L) is obtained from fitZ.L, which must then be specified.
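
For the identity link with m(L)=1, the estimating equation is linear in \psi and can be solved by hand. The base R sketch below uses \hat{d}(L,Z)=Z-\hat{E}(Z|L) and compares a numerical root with the closed-form solution. The simulated data and all object names are assumptions made for illustration; ivglm itself solves the equations with nleqslv, whereas this sketch uses uniroot.

```r
set.seed(2)
n <- 5000
L <- rnorm(n)                          # measured covariate
U <- rnorm(n)                          # unmeasured confounder of X and Y
Z <- rnorm(n, mean = L)                # instrument, depends on L only
X <- rnorm(n, mean = Z + U)            # exposure
Y <- rnorm(n, mean = 0.5 * X + L + U)  # outcome; the true psi is 0.5

# d-hat(L,Z) = Z - E-hat(Z|L), taking m(L) = 1.
fitZ.L <- glm(Z ~ L)
d_hat <- Z - fitted(fitZ.L)

# Estimating function H(psi) = sum_i d_i * (Y_i - psi * X_i).
H <- function(psi) sum(d_hat * (Y - psi * X))

# Numerical root and closed-form solution of H(psi) = 0.
psi_root <- uniroot(H, interval = c(-10, 10), tol = 1e-10)$root
psi_closed <- sum(d_hat * Y) / sum(d_hat * X)
c(root = psi_root, closed_form = psi_closed)
```

For the log and logit links H(\psi) is nonlinear in \psi, so no closed form is available and a numerical solver is needed.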

Value

ivglm returns an object of class "ivglm", which inherits from class "ivmod". An object of class "ivglm" is a list containing

`call`	the matched call.
`input`	a list containing all input arguments.
`est`	a vector containing the estimate of `\psi`.
`vcov`	the variance-covariance matrix for the estimate of `\psi`, obtained with the sandwich formula.
`estfunall`	a matrix of all subject-specific contributions to the estimating functions used in the estimation process. One row for each subject, one column for each parameter. If `estmethod="ts"`, then the first columns correspond to the parameters estimated by `fitX.LZ`, and the last columns correspond to the parameters estimated by the re-fitted model `fitY.LX`. If `estmethod="g"`, then the first columns correspond to `\psi`, and the remaining columns correspond to the parameters estimated by `fitZ.L`, `fitX.LZ`, `fitX.L` and `fitY.LZX`, whichever were used in the estimation process.
`d.estfun`	the Jacobian matrix of `colMeans(estfunall)`.
`converged`	logical. Was a solution found to the estimating equations?
`fitY.LX`	the re-fitted model `fitY.LX` used in the estimation process when `estmethod="ts"`. This element is NULL when `estmethod="g"`.
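
The way a sandwich variance is assembled from per-subject estimating function contributions and the Jacobian of their column means can be sketched in base R. An ordinary linear regression stands in for the IV estimating equations here, and the objects `estfun` and `d_estfun` are hypothetical analogues of `estfunall` and `d.estfun`; details specific to ivglm, such as the clustering correction, are not reproduced.

```r
set.seed(3)
n <- 200
x <- rnorm(n)
y <- rnorm(n, mean = 1 + 2 * x, sd = 1 + abs(x))  # heteroscedastic errors

fit <- lm(y ~ x)
Xmat <- model.matrix(fit)
e <- residuals(fit)

# Per-subject contributions x_i * (y_i - x_i' beta), one row per subject.
estfun <- Xmat * e

# Jacobian of colMeans(estfun) with respect to beta.
d_estfun <- -crossprod(Xmat) / n

# Sandwich formula: A^{-1} B A^{-T} / n, with B the mean outer product.
B <- crossprod(estfun) / n
A_inv <- solve(d_estfun)
vcov_sandwich <- A_inv %*% B %*% t(A_inv) / n
sqrt(diag(vcov_sandwich))
```

For this stand-in model the result coincides with the heteroscedasticity-robust (HC0) variance of the least squares estimator.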

Note

ivglm allows for weights. However, these are defined implicitly through the input models. Thus, when models are used as input to ivglm, these models have to be fitted with the same weights. When estmethod="g" the weights are taken from fitX.LZ, if specified by the user. If fitX.LZ is not specified then the weights are taken from fitZ.L. Hence, if weights are used, then either fitX.LZ or fitZ.L must be specified.
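
A minimal sketch of fitting the input models with a common set of weights is given below. The weight column `w` and the simulated data are hypothetical; the point is only that both fits receive the same `weights` argument.

```r
set.seed(4)
n <- 500
dat <- data.frame(Z = rbinom(n, 1, 0.5))
dat$X <- rnorm(n, mean = dat$Z)
dat$Y <- rnorm(n, mean = 0.5 * dat$X)
dat$w <- runif(n, 0.5, 2)  # hypothetical weights

# Both input models are fitted with the same weights.
fitX.LZ <- glm(X ~ Z, data = dat, weights = w)
fitY.LX <- glm(Y ~ X, data = dat, weights = w)

# Check that the prior weights stored in the two fits agree.
same_weights <- isTRUE(all.equal(weights(fitX.LZ, type = "prior"),
                                 weights(fitY.LX, type = "prior")))
same_weights
```

The two fitted models could then be supplied to ivglm in the same way as in the two-stage examples.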

Author(s)

Arvid Sjolander.

References

Bowden J., Vansteelandt S. (2011). Mendelian randomization analysis of case-control data using structural mean models. Statistics in Medicine 30(6), 678-694.

Sjolander A., Martinussen T. (2019). Instrumental variable estimation with the R package ivtools. Epidemiologic Methods 8(1), 1-20.

Vansteelandt S., Bowden J., Babanezhad M., Goetghebeur E. (2011). On instrumental variables estimation of causal odds ratios. Statistical Science 26(3), 403-422.

Examples


set.seed(9)

##Note: the parameter values in the examples below are chosen to make 
##Y0 independent of Z, which is necessary for Z to be a valid instrument.

n <- 1000
psi0 <- 0.5
psi1 <- 0.2

##---Example 1: linear model and interaction between X and L---

L <- rnorm(n)
Z <- rnorm(n, mean=L)
X <- rnorm(n, mean=Z)
m0 <- X-Z+L 
Y <- rnorm(n, mean=psi0*X+psi1*X*L+m0)
data <- data.frame(L, Z, X, Y)

#two-stage estimation
fitX.LZ <- glm(formula=X~Z, data=data)
fitY.LX <- glm(formula=Y~X+L+X*L, data=data)
fitIV <- ivglm(estmethod="ts", fitX.LZ=fitX.LZ, fitY.LX=fitY.LX, data=data, 
  ctrl=TRUE) 
summary(fitIV)

#G-estimation with model for Z
fitZ.L <- glm(formula=Z~L, data=data)
fitIV <- ivglm(estmethod="g", X="X", Y="Y", fitZ.L=fitZ.L, data=data, 
  formula=~L, link="identity")
summary(fitIV)

#G-estimation with model for X
fitX.LZ <- glm(formula=X~L+Z, data=data)
fitX.L <- glm(formula=X~L, data=data)
fitIV <- ivglm(estmethod="g", Y="Y", fitX.LZ=fitX.LZ, fitX.L=fitX.L, data=data, 
  formula=~L, link="identity")
summary(fitIV)

##---Example 2: logistic model and no covariates--- 

Z <- rbinom(n, 1, 0.5)
X <- rbinom(n, 1, 0.7*Z+0.2*(1-Z)) 
m0 <- plogis(1+0.8*X-0.39*Z)
Y <- rbinom(n, 1, plogis(psi0*X+log(m0/(1-m0)))) 
data <- data.frame(Z, X, Y)

#two-stage estimation
fitX.LZ <- glm(formula=X~Z, family="binomial", data=data)
fitY.LX <- glm(formula=Y~X, family="binomial", data=data)
fitIV <- ivglm(estmethod="ts", fitX.LZ=fitX.LZ, fitY.LX=fitY.LX, data=data, 
  ctrl=TRUE) 
summary(fitIV)

#G-estimation with model for Z
fitZ.L <- glm(formula=Z~1, data=data)
fitY.LZX <- glm(formula=Y~X+Z+X*Z, family="binomial", data=data)
fitIV <- ivglm(estmethod="g", X="X", fitZ.L=fitZ.L, fitY.LZX=fitY.LZX, 
  data=data, link="logit")
summary(fitIV)

[Package ivtools version 2.3.0 Index]