approxdist {SAMBA} | R Documentation |
Estimate parameters in the disease model approximating the observed data distribution
Description
approxdist estimates parameters in the disease model given a previously estimated marginal sensitivity. The estimation is based on approximating the distribution of D* given Z.
Usage
approxdist(Dstar, Z, c_marg, weights = NULL)
Arguments
Dstar |
Numeric vector containing observed disease status, coded as 0/1 |
Z |
Numeric matrix of covariates in the disease model |
c_marg |
Marginal sensitivity, P(D* = 1 | D = 1, S = 1) |
weights |
Optional numeric vector of patient-specific weights used for selection bias adjustment. Default is NULL |
Details
We are interested in modeling the relationship between binary disease status D and covariates Z using a logistic regression model. However, D may be misclassified, and our observed data may not represent the population of interest well. In this setting, we estimate the disease model parameters using the following modeling framework.
Notation:
- D
Binary disease status of interest.
- D*
Observed binary disease status. Potentially a misclassified version of D. We assume D = 0 implies D* = 0.
- S
Indicator for whether patient from population of interest is included in the analytical dataset.
- Z
Covariates in disease model of interest.
- W
Covariates in model for patient inclusion in analytical dataset (selection model).
- X
Covariates in model for probability of observing disease given patient has disease (sensitivity model).
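One consequence of the assumption above that D = 0 implies D* = 0 (no false positives) can be written out explicitly. The first two lines follow from the law of total probability. The final line, which replaces the covariate-specific sensitivity with the marginal sensitivity c_marg, is an illustrative reading of how D* given Z might be approximated, not necessarily the exact approximation approxdist uses.

```latex
\begin{align*}
P(D^* = 1 \mid Z, S = 1)
  &= P(D^* = 1 \mid D = 1, Z, S = 1)\, P(D = 1 \mid Z, S = 1) \\
  &\quad + \underbrace{P(D^* = 1 \mid D = 0, Z, S = 1)}_{=\,0}\, P(D = 0 \mid Z, S = 1) \\
  &= P(D^* = 1 \mid D = 1, Z, S = 1)\, P(D = 1 \mid Z, S = 1) \\
  &\approx c_{\mathrm{marg}}\, P(D = 1 \mid Z, S = 1)
\end{align*}
```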
Model Structure:
- Disease Model
logit(P(D=1|Z)) = theta_0 + theta_Z Z
- Selection Model
P(S=1|W,D)
- Sensitivity Model
logit(P(D* = 1| D = 1, S = 1, X)) = beta_0 + beta_X X
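The three models above can be sketched as a small base-R simulation. This is a hypothetical illustration only: the sample size, the coefficient values, and the object names (sim, sens_prob) are assumptions made for this sketch and do not come from the package, apart from the selection coefficients, which mirror those used in the Examples section.

```r
# Hypothetical simulation of the modeling framework; all values are illustrative.
set.seed(1)
expit <- function(x) 1 / (1 + exp(-x))
n <- 5000

Z <- rnorm(n)  # covariate in the disease model
W <- rnorm(n)  # covariate in the selection model
X <- rnorm(n)  # covariate in the sensitivity model

# Disease model: logit(P(D = 1 | Z)) = theta_0 + theta_Z * Z
D <- rbinom(n, 1, expit(-2 + 0.5 * Z))

# Selection model: P(S = 1 | W, D), here assumed to be logistic
S <- rbinom(n, 1, expit(-0.6 + 1 * D + 0.5 * W))

# Sensitivity model: logit(P(D* = 1 | D = 1, S = 1, X)) = beta_0 + beta_X * X
sens_prob <- expit(-0.5 + 1 * X)

# Multiplying by D enforces the assumption that D = 0 implies D* = 0
Dstar <- rbinom(n, 1, D * sens_prob)

# The analytical dataset contains only the selected patients (S = 1)
sim <- data.frame(D = D, Dstar = Dstar, Z = Z, W = W, X = X)[S == 1, ]
```

In this sketch the true D is kept in sim only so the simulation can be checked; in practice only D*, Z, W, and X would be observed for selected patients.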
Value
A list with two elements: (1) 'param', a vector of parameter estimates for the disease model (log odds ratios for Z), and (2) 'variance', a vector of variance estimates for the disease model parameters. The intercept is not included in the results.
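As a minimal sketch of how the returned list might be used, the code below turns 'param' and 'variance' into Wald-type 95% confidence intervals on the log odds ratio and odds ratio scales. The object fit is a hypothetical stand-in with made-up values, not real approxdist output, and the use of normal-approximation (Wald) intervals is an assumption of this sketch, not something this help page specifies.

```r
# Hypothetical stand-in for the list returned by approxdist(); values are made up.
fit <- list(param = c(Z = 0.48), variance = c(Z = 0.0025))

# Standard errors are the square roots of the variance estimates
se <- sqrt(fit$variance)

# Wald-type 95% interval on the log odds ratio scale (an assumption of this sketch)
z_crit <- qnorm(0.975)
ci <- cbind(logOR = fit$param,
            lower = fit$param - z_crit * se,
            upper = fit$param + z_crit * se)

# Exponentiate to obtain odds ratios and their interval limits
or <- exp(ci)
colnames(or) <- c("OR", "lower", "upper")
```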
References
Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification. Lauren J Beesley and Bhramar Mukherjee. medRxiv 2019.12.26.19015859.
Examples
library(SAMBA)
# These examples are generated from the vignette. See it for more details.
# Generate IPW weights from the true model
expit <- function(x) exp(x) / (1 + exp(x))
prob.WD <- expit(-0.6 + 1 * samba.df$D + 0.5 * samba.df$W)
weights <- nrow(samba.df) * (1 / prob.WD) / (sum(1 / prob.WD))
# Estimate sensitivity by using inverse probability of selection weights
# and P(D=1)
sens <- sensitivity(samba.df$Dstar, samba.df$X, prev = mean(samba.df$D),
                    weights = weights)
approx1 <- approxdist(samba.df$Dstar, samba.df$Z, sens$c_marg,
                      weights = weights)