R: Targeted Maximum Likelihood Estimation of Parameter of MSM

tmleMSM {tmle}

R Documentation

Targeted Maximum Likelihood Estimation of Parameter of MSM

Description

Targeted maximum likelihood estimation of the parameter of a marginal structural model (MSM) for binary point treatment effects. The tmleMSM function is minimally called with arguments (Y,A,W, MSM), where Y is a continuous or binary outcome variable, A is a binary treatment variable, (A=1 for treatment, A=0 for control), and W is a matrix or dataframe of baseline covariates. MSM is a valid regression formula for regressing Y on any combination of A, V, W, T, where V defines strata and T represents the time at which repeated measures on subjects are made. Missingness in the outcome is accounted for in the estimation procedure if missingness indicator Delta is 0 for some observations. Repeated measures can be identified using the id argument. Observation weigths (sampling weights) may optionally be provided

Usage

tmleMSM(Y, A, W, V, T = rep(1,length(Y)), Delta = rep(1, length(Y)), MSM, 
        v = NULL, Q = NULL, Qform = NULL, Qbounds = c(-Inf, Inf), 
        Q.SL.library = c("SL.glm", "tmle.SL.dbarts2", "SL.glmnet"), 
        cvQinit = TRUE, hAV = NULL, hAVform = NULL, g1W = NULL, 
        gform = NULL, pDelta1 = NULL, g.Deltaform = NULL, 
	g.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"),
	g.Delta.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"),
	ub = sqrt(sum(Delta))* log(sum(Delta)) / 5, family = "gaussian", 
	fluctuation = "logistic", alpha  = 0.995, id = 1:length(Y), 
	V.Q = 10, V.g = 10, V.Delta = 10, inference = TRUE, verbose = FALSE, 
	Q.discreteSL = FALSE, g.discreteSL = FALSE, alpha.sig = 0.05, obsWeights = NULL)

Arguments

`Y`	continuous or binary outcome variable
`A`	binary treatment indicator, `1` - treatment, `0` - control
`W`	vector, matrix, or dataframe containing baseline covariates. Factors are not currently allowed.
`V`	vector, matrix, or dataframe of covariates used to define strata
`T`	optional time for repeated measures data
`Delta`	indicator of missing outcome or treatment assignment. `1` - observed, `0` - missing
`MSM`	MSM of interest, specified as valid right hand side of a regression formula (see examples)
`v`	optional value defining the strata of interest (`V=v`) for stratified estimation of MSM parameter
`Q`	optional `n \times 2` matrix of initial values for `Q` portion of the likelihood, `(E(Y\|A=0,W), E(Y\|A=1,W))`
`Qform`	optional regression formula for estimation of `E(Y\|A, W)`, suitable for call to `glm`
`Qbounds`	vector of upper and lower bounds on `Y` and predicted values for initial `Q`
`Q.SL.library`	optional vector of prediction algorithms to use for `SuperLearner` estimation of initial `Q`
`cvQinit`	logical, if `TRUE`, estimates cross-validated predicted values using discrete super learning, default=`TRUE`
`hAV`	optional `n \times 2` matrix used in numerator of weights for updating covariate and the influence curve. If unspecified, defaults to conditional probabilities `P(A=1\|V)` or `P(A=1\|T)`, for repeated measures data. For unstabilized weights, pass in an `n \times 2` matrix of all 1s
`hAVform`	optionalregression formula of the form `A~V+T`, if specified this overrides the call to `SuperLearner`
`g1W`	optional vector of conditional treatment assingment probabilities, `P(A=1\|W)`
`gform`	optional regression formula of the form `A~W`, if specified this overrides the call to `SuperLearner`
`pDelta1`	optional `n \times 2` matrix of conditional probabilities for missingness mechanism,`P(Delta=1\|A=0,V,W,T), P(Delta=1\|A=1,V,W,T)`.
`g.Deltaform`	optional regression formula of the form `Delta~A+W`, if specified this overrides the call to `SuperLearner`
`g.SL.library`	optional vector of prediction algorithms to use for `SuperLearner` estimation of `g1W`
`g.Delta.SL.library`	optional vector of prediction algorithms to use for `SuperLearner` estimation of`pDelta1`
`ub`	upper bound on inverse probability weights. See `Details` section for more information
`family`	family specification for working regression models, generally ‘gaussian’ for continuous outcomes (default), ‘binomial’ for binary outcomes
`fluctuation`	‘logistic’ (default), or ‘linear’
`alpha`	used to keep predicted initial values bounded away from (0,1) for logistic fluctuation
`id`	optional subject identifier
`V.Q`	number of cross-validation folds for Super Learner estimation of Q
`V.g`	number of cross-validation folds for Super Learner estimation of g
`V.Delta`	number of cross-validation folds for Super Learner estimation of g_Delta
`inference`	if `TRUE`, variance-covariance matrix, standard errors, pvalues, and 95% confidence intervals are calculated. Setting to FALSE saves a little time when bootstrapping.
`verbose`	status messages printed if set to `TRUE` (default=`FALSE`)
`Q.discreteSL`	If true, use discrete SL to estimate Q, otherwise ensembleSL by default. Ignored when SL is not used.
`g.discreteSL`	If true, use discrete SL to estimate each component of g, otherwise ensembleSL by default. Ignored when SL is not used.
`alpha.sig`	significance level for constructing `1-alpha.sig` confidence intervals
`obsWeights`	optional weights for biased sampling and two-stage designs.

Details

ub bounds the IC by bounding the factor h(A,V)/[g(A,V,W)P(Delta=1|A,V,W)] between 0 and ub, default value based on sample size.

Value

`psi`	MSM parameter estimate
`sigma`	variance covariance matrix
`se`	standard errors extracted from sigma
`pvalue`	two-sided p-value
`lb`	lower bound on 95% confidence interval
`ub`	upper bound on 95% confidence interval
`epsilon`	fitted value of epsilon used to target initial `Q`
`psi.Qinit`	MSM parameter estimate based on untargeted initial `Q`
`Qstar`	targeted estimate of `Q`, an `n \times 2` matrix with predicted values for `Q(0,W), Q(1,W)` using the updated fit
`Qinit`	initial estimate of `Q`. `Qinit$coef` are the coefficients for a `glm` model for `Q`, if applicable. `Qinit$Q` is an `n \times 2` matrix, where `n` is the number of observations. Columns contain predicted values for `Q(0,W),Q(1,W)` using the initial fit. `Qinit$type` is method for estimating `Q`
`g`	treatment mechanism estimate. A list with three items: `g$g1W` contains estimates of `P(A=1\|W)` for each observation, `g$coef` the coefficients for the model for `g` when `glm` used, `g$type` estimation procedure
`g.AV`	estimate for h(A,V) or h(A,T). A list with three items: `g.AV$g1W` an `n \times 2` matrix containing values of `P(A=0\|V,T), P(A=1\|V,T)` for each observation, `g.AV$coef` the coefficients for the model for `g` when `glm` used, `g.AV$type` estimation procedure
`g_Delta`	missingness mechanism estimate. A list with three items: `g_Delta$g1W` an `n \times 2` matrix containing values of `P(Delta=1\|A,V,W,T)` for each observation, `g_Delta$coef` the coefficients for the model for `g` when `glm` used, `g_Delta$type` estimation procedure

Author(s)

Susan Gruber sgruber@cal.berkeley.edu, in collaboration with Mark van der Laan.

References

1. Gruber, S. and van der Laan, M.J. (2012), tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(13), 1-35. https://www.jstatsoft.org/v51/i13/

2. Rosenblum, M. and van der Laan, M.J. (2010), Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model. The International Journal of Biostatistics,6(2), 2010.

3. Gruber, S., Phillips, R.V., Lee, H., van der Laan, M.J. Data-Adaptive Selection of the Propensity Score Truncation Level for Inverse Probability Weighted and Targeted Maximum Likelihood Estimators of Marginal Point Treatment Effects. American Journal of Epidemiology 2022; 191(9), 1640-1651.

Examples

library(tmle)
# Example 1. Estimating MSM parameter with correctly specified regression formulas
# MSM: psi0 + psi1*A + psi2*V + psi3*A*V  (saturated)
# true parameter value: psi = (0, 1, -2, 0.5) 
# generate data
  set.seed(100)
  n <- 1000
  W <- matrix(rnorm(n*3), ncol = 3)
  colnames(W) <- c("W1", "W2", "W3")
  V <- rbinom(n, 1, 0.5)
  A <- rbinom(n, 1, 0.5)
  Y <- rbinom(n, 1, plogis(A - 2*V + 0.5*A*V))
  result.ex1 <- tmleMSM(Y, A, W, V, MSM = "A*V", Qform = "Y~.", gform = "A~1", 
                        hAVform = "A~1", family = "binomial")
  print(result.ex1)
## Not run: 

# Example 2. Biased sampling from example 1 population
# (observations having V = 1 twice as likely to be included in the dataset
  retain.ex2 <- sample(1:n, size = n/2, p = c(1/3 + 1/3*V))
  wt.ex2 <- 1/(1/3 + 1/3*V)
  result.ex2 <- tmleMSM(Y[retain.ex2], A[retain.ex2], W[retain.ex2,], 
			V[retain.ex2], MSM = "A*V", Qform = "Y~.", gform = "A~1", 
                        hAVform = "A~1", family = "binomial",
			obsWeight = wt.ex2[retain.ex2])
  print(result.ex2)

# Example 3. Repeated measures data, two observations per id
# (e.g., crossover study design)
# MSM: psi0 + psi1*A + psi2*V + psi3*V^2 + psi4*T
# true parameter value: psi = (-2, 1, 0, -2, 0 )
# generate data in wide format (id,  W1, Y(t),  W2(t), V(t), A(t)) 
   set.seed(10)
   n <- 250
   id <- rep(1:n)
   W1   <- rbinom(n, 1, 0.5)
   W2.1 <- rnorm(n)
   W2.2 <- rnorm(n)  
   V.1   <- rnorm(n)  
   V.2   <- rnorm(n)
   A.1 <- rbinom(n, 1, plogis(0.5 + 0.3 * W2.1))
   A.2 <- 1-A.1
   Y.1  <- -2 + A.1 - 2*V.1^2 + W2.1 + rnorm(n)
   Y.2  <- -2 + A.2 - 2*V.2^2 + W2.2 + rnorm(n)
   d <- data.frame(id, W1, W2=W2.1, W2.2, V=V.1, V.2, A=A.1, A.2, Y=Y.1, Y.2)

# change dataset from wide to long format
   longd <- reshape(d, 
          varying = cbind(c(3, 5, 7, 9), c(4, 6, 8, 10)),
          idvar = "id",
          direction = "long",
          timevar = "T",
          new.row.names = NULL,
          sep = "")
# misspecified model for initial Q, partial misspecification for g. 
# V set to 2 for Q and g to save time, not recommended at this sample size
   result.ex3 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, 
          T = longd$T, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2", 
	id = longd$id, V.Q=2, V.g=2)
   print(result.ex3)


# Example 4:  Introduce 20
# V set to 2 for Q and g to save time, not recommended at this sample size
  Delta <- rbinom(nrow(longd), 1, 0.8)
  result.ex4 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, T=longd$T,
          Delta = Delta, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2",
	  g.Deltaform = "Delta ~ 1", id=longd$id, verbose = TRUE, V.Q=2, V.g=2)
  print(result.ex4)


## End(Not run)

[Package tmle version 2.0.1.1 Index]