AFglm {AF} | R Documentation |
Attributable fraction estimation based on a logistic regression model from a glm
object (commonly used for cross-sectional or case-control sampling designs).
Description
AFglm
estimates the model-based adjusted attributable fraction for data from a logistic regression model in the form of a glm
object. This model is commonly used for data from a cross-sectional or non-matched case-control sampling design.
Usage
AFglm(object, data, exposure, clusterid, case.control = FALSE)
Arguments
object |
a fitted logistic regression model object of class "glm". |
data |
an optional data frame, list or environment (or object coercible by |
exposure |
the name of the exposure variable as a string. The exposure must be binary (0/1) where unexposed is coded as 0. |
clusterid |
the name of the cluster identifier variable as a string, if data are clustered. Cluster robust standard errors will be calculated. |
case.control |
can be set to |
Details
AFglm
estimates the attributable fraction for a binary outcome Y
under the hypothetical scenario where a binary exposure X
is eliminated from the population.
The estimate is adjusted for confounders Z
by logistic regression using the glm function.
The estimation strategy differs between cross-sectional and case-control sampling designs, even if the underlying logistic regression model is the same.
For cross-sectional sampling designs the AF can be defined as
AF=1-\frac{Pr(Y_0=1)}{Pr(Y=1)}
where Pr(Y_0=1)
denotes the counterfactual probability of the outcome had
the exposure been eliminated from the population, and Pr(Y = 1)
denotes the factual probability of the outcome.
If Z
is sufficient for confounding control, then Pr(Y_0=1)
can be expressed as
E_Z\{Pr(Y=1\mid{X=0,Z})\}.
The function uses logistic regression to estimate Pr(Y=1\mid{X=0,Z})
, and the marginal sample distribution of Z
to approximate the outer expectation (Sjölander and Vansteelandt, 2011).
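The standardization described above can be sketched by hand with base R. This is only an illustration of the point estimate, not the package's internal code: the simulated data and the names dat, dat0, p0_i, P0_hat, P_hat and AF_hat are assumptions made for this example, and no variance is computed.

```r
# Illustrative sketch: cross-sectional AF by standardization (point estimate only)
set.seed(1)
expit <- function(x) 1 / (1 + exp(-x))
n <- 2000
Z <- rnorm(n)
X <- rbinom(n, size = 1, prob = expit(Z))
Y <- rbinom(n, size = 1, prob = expit(-1 + X + Z))
dat <- data.frame(Y, X, Z)

# Logistic regression of the outcome on exposure, confounder and their interaction
fit <- glm(Y ~ X * Z, family = binomial, data = dat)

# Estimate Pr(Y = 1 | X = 0, Z) for every individual by setting the exposure to 0
dat0 <- transform(dat, X = 0)
p0_i <- predict(fit, newdata = dat0, type = "response")

# Average over the sample distribution of Z to approximate the outer expectation
P0_hat <- mean(p0_i)
# Factual proportion of cases
P_hat <- mean(dat$Y)
# AF = 1 - Pr(Y_0 = 1) / Pr(Y = 1)
AF_hat <- 1 - P0_hat / P_hat
AF_hat
```

For comparable data and the same model formula, the value should be close to AF.est as returned by AFglm, which additionally reports a variance estimate.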
For case-control sampling designs the outcome prevalence is fixed by the sampling design, so absolute probabilities (P.est
and P0.est
) cannot be estimated.
Instead, adjusted log odds ratios (log.or
) are estimated for each individual.
This is done by setting case.control
to TRUE
. It is then assumed that the outcome is rare so that the risk ratio can be approximated by the odds ratio.
For case-control sampling designs the AF can be defined as (Bruzzi et al., 1985)
AF = 1 - \frac{Pr(Y_0=1)}{Pr(Y = 1)}
where Pr(Y_0=1)
denotes the counterfactual probability of the outcome had
the exposure been eliminated from the population. If Z
is sufficient for confounding control then the probability Pr(Y_0=1)
can be expressed as
Pr(Y_0=1)=E_Z\{Pr(Y=1\mid{X}=0,Z)\}.
Using Bayes' theorem this implies that the AF can be expressed as
AF = 1-\frac{E_Z\{Pr(Y=1\mid X=0,Z)\}}{Pr(Y=1)}=1-E_Z\{RR^{-X}(Z)\mid{Y = 1}\}
where RR(Z)
is the risk ratio
\frac{Pr(Y=1\mid{X=1,Z})}{Pr(Y=1\mid{X=0,Z})}.
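The Bayes' theorem step can be spelled out as follows. This is a sketch under the assumption that Pr(Y=1\mid X=x,Z=z) > 0 for all (x,z); the symbols f(x,z) and f(x,z\mid Y=1), denoting the joint density of (X,Z) in the population and among cases, are introduced here for illustration only. Bayes' theorem gives f(x,z) = f(x,z\mid Y=1)Pr(Y=1)/Pr(Y=1\mid X=x,Z=z), so that

```latex
\begin{align*}
E_Z\{Pr(Y=1\mid X=0,Z)\}
 &= \sum_{x\in\{0,1\}}\int Pr(Y=1\mid X=0,Z=z)\, f(x,z)\, dz \\
 &= Pr(Y=1)\sum_{x\in\{0,1\}}\int
    \frac{Pr(Y=1\mid X=0,Z=z)}{Pr(Y=1\mid X=x,Z=z)}\, f(x,z\mid Y=1)\, dz \\
 &= Pr(Y=1)\, E\{RR^{-X}(Z)\mid Y=1\}.
\end{align*}
```

The last equality holds because the ratio inside the integral equals 1 when x = 0 and RR^{-1}(z) when x = 1. Dividing by Pr(Y=1) gives the stated expression; the conditional expectation is taken over the joint distribution of (X, Z) among the cases.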
Moreover, the risk ratio can be approximated by the odds ratio if the outcome is rare. Thus,
AF \approx 1 - E_Z\{OR^{-X}(Z)\mid{Y = 1}\}.
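The approximation above can be sketched with base R by averaging OR^{-X}(Z) over the sampled cases. This is only an illustration of the point estimate under the rare-outcome assumption, not the package's internal code: the simulated population and the names pop, cc, log_or, is_case and AF_cc are assumptions made for this example.

```r
# Illustrative sketch: case-control AF via the odds ratio approximation
set.seed(2)
expit <- function(x) 1 / (1 + exp(-x))
NN <- 200000
Z <- rnorm(NN)
X <- rbinom(NN, size = 1, prob = expit(Z))
Y <- rbinom(NN, size = 1, prob = expit(-6 + X + Z))  # rare outcome
pop <- data.frame(Y, X, Z)

# Draw equally many cases and controls from the population
n_cc <- 300
cases <- sample(which(pop$Y == 1), n_cc)
controls <- sample(which(pop$Y == 0), n_cc)
cc <- pop[c(cases, controls), ]

fit <- glm(Y ~ X * Z, family = binomial, data = cc)

# Individual log odds ratio: linear predictor at X = 1 minus linear predictor at X = 0
lp1 <- predict(fit, newdata = transform(cc, X = 1), type = "link")
lp0 <- predict(fit, newdata = transform(cc, X = 0), type = "link")
log_or <- lp1 - lp0

# AF is approximately 1 - E{OR^(-X)(Z) | Y = 1}, averaged over the sampled cases
is_case <- cc$Y == 1
AF_cc <- 1 - mean(exp(-cc$X[is_case] * log_or[is_case]))
AF_cc
```

Unexposed cases contribute 1 to the average and exposed cases contribute the inverse of their individual odds ratio, so the estimate depends only on odds ratios, which remain estimable under case-control sampling.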
If clusterid
is supplied, then a clustered sandwich formula is used in all variance calculations.
Value
AF.est |
estimated attributable fraction. |
AF.var |
estimated variance of AF.est. |
P.est |
estimated factual proportion of cases; Pr(Y=1). |
P.var |
estimated variance of P.est. |
P0.est |
estimated counterfactual proportion of cases if the exposure were eliminated; Pr(Y_0=1). |
P0.var |
estimated variance of P0.est. |
log.or |
a vector of the estimated log odds ratio for every individual. |
Author(s)
Elisabeth Dahlqwist, Arvid Sjölander
References
Bruzzi, P., Green, S. B., Byar, D., Brinton, L. A., and Schairer, C. (1985). Estimating the population attributable risk for multiple risk factors using case-control data. American Journal of Epidemiology 122, 904-914.
Greenland, S. and Drescher, K. (1993). Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 49, 865-872.
Sjölander, A. and Vansteelandt, S. (2011). Doubly robust estimation of attributable fractions. Biostatistics 12, 112-121.
See Also
glm
is used for fitting the logistic regression model. For conditional logistic regression (commonly used for data from a matched case-control sampling design), see AFclogit.
Examples
# Simulate a cross-sectional sample
expit <- function(x) 1 / (1 + exp( - x))
n <- 1000
Z <- rnorm(n = n)
X <- rbinom(n = n, size = 1, prob = expit(Z))
Y <- rbinom(n = n, size = 1, prob = expit(Z + X))
# Example 1: non-clustered data from a cross-sectional sampling design
data <- data.frame(Y, X, Z)
# Fit a glm object
fit <- glm(formula = Y ~ X + Z + X * Z, family = binomial, data = data)
# Estimate the attributable fraction from the fitted logistic regression
AFglm_est <- AFglm(object = fit, data = data, exposure = "X")
summary(AFglm_est)
# Example 2: clustered data from a cross-sectional sampling design
# Duplicate observations in order to create clustered data
id <- rep(1:n, 2)
data <- data.frame(id = id, Y = c(Y, Y), X = c(X, X), Z = c(Z, Z))
# Fit a glm object
fit <- glm(formula = Y ~ X + Z + X * Z, family = binomial, data = data)
# Estimate the attributable fraction from the fitted logistic regression
AFglm_clust <- AFglm(object = fit, data = data,
                     exposure = "X", clusterid = "id")
summary(AFglm_clust)
# Example 3: non-matched case-control
# Simulate a sample from a non-matched case-control sampling design
# Make the outcome a rare event by setting the intercept to -6
expit <- function(x) 1 / (1 + exp( - x))
NN <- 1000000
n <- 500
intercept <- -6
Z <- rnorm(n = NN)
X <- rbinom(n = NN, size = 1, prob = expit(Z))
Y <- rbinom(n = NN, size = 1, prob = expit(intercept + X + Z))
population <- data.frame(Z, X, Y)
Case <- which(population$Y == 1)
Control <- which(population$Y == 0)
# Sample cases and controls from the population
case <- sample(Case, n)
control <- sample(Control, n)
data <- population[c(case, control), ]
# Fit a glm object
fit <- glm(formula = Y ~ X + Z + X * Z, family = binomial, data = data)
# Estimate the attributable fraction from the fitted logistic regression
AFglm_est_cc <- AFglm(object = fit, data = data, exposure = "X", case.control = TRUE)
summary(AFglm_est_cc)