tmleMSM {tmle}R Documentation

Targeted Maximum Likelihood Estimation of Parameter of MSM

Description

Targeted maximum likelihood estimation of the parameter of a marginal structural model (MSM) for binary point treatment effects. The tmleMSM function is minimally called with arguments (Y,A,W, MSM), where Y is a continuous or binary outcome variable, A is a binary treatment variable, (A=1 for treatment, A=0 for control), and W is a matrix or dataframe of baseline covariates. MSM is a valid regression formula for regressing Y on any combination of A, V, W, T, where V defines strata and T represents the time at which repeated measures on subjects are made. Missingness in the outcome is accounted for in the estimation procedure if missingness indicator Delta is 0 for some observations. Repeated measures can be identified using the id argument. Observation weigths (sampling weights) may optionally be provided

Usage

tmleMSM(Y, A, W, V, T = rep(1,length(Y)), Delta = rep(1, length(Y)), MSM, 
        v = NULL, Q = NULL, Qform = NULL, Qbounds = c(-Inf, Inf), 
        Q.SL.library = c("SL.glm", "tmle.SL.dbarts2", "SL.glmnet"), 
        cvQinit = TRUE, hAV = NULL, hAVform = NULL, g1W = NULL, 
        gform = NULL, pDelta1 = NULL, g.Deltaform = NULL, 
	g.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"),
	g.Delta.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"),
	ub = sqrt(sum(Delta))* log(sum(Delta)) / 5, family = "gaussian", 
	fluctuation = "logistic", alpha  = 0.995, id = 1:length(Y), 
	V.Q = 10, V.g = 10, V.Delta = 10, inference = TRUE, verbose = FALSE, 
	Q.discreteSL = FALSE, g.discreteSL = FALSE, alpha.sig = 0.05, obsWeights = NULL)

Arguments

Y

continuous or binary outcome variable

A

binary treatment indicator, 1 - treatment, 0 - control

W

vector, matrix, or dataframe containing baseline covariates. Factors are not currently allowed.

V

vector, matrix, or dataframe of covariates used to define strata

T

optional time for repeated measures data

Delta

indicator of missing outcome or treatment assignment. 1 - observed, 0 - missing

MSM

MSM of interest, specified as valid right hand side of a regression formula (see examples)

v

optional value defining the strata of interest (V=v) for stratified estimation of MSM parameter

Q

optional n \times 2 matrix of initial values for Q portion of the likelihood, (E(Y|A=0,W), E(Y|A=1,W))

Qform

optional regression formula for estimation of E(Y|A, W), suitable for call to glm

Qbounds

vector of upper and lower bounds on Y and predicted values for initial Q

Q.SL.library

optional vector of prediction algorithms to use for SuperLearner estimation of initial Q

cvQinit

logical, if TRUE, estimates cross-validated predicted values using discrete super learning, default=TRUE

hAV

optional n \times 2 matrix used in numerator of weights for updating covariate and the influence curve. If unspecified, defaults to conditional probabilities P(A=1|V) or P(A=1|T), for repeated measures data. For unstabilized weights, pass in an n \times 2 matrix of all 1s

hAVform

optionalregression formula of the form A~V+T, if specified this overrides the call to SuperLearner

g1W

optional vector of conditional treatment assingment probabilities, P(A=1|W)

gform

optional regression formula of the form A~W, if specified this overrides the call to SuperLearner

pDelta1

optional n \times 2 matrix of conditional probabilities for missingness mechanism,P(Delta=1|A=0,V,W,T), P(Delta=1|A=1,V,W,T).

g.Deltaform

optional regression formula of the form Delta~A+W, if specified this overrides the call to SuperLearner

g.SL.library

optional vector of prediction algorithms to use for SuperLearner estimation of g1W

g.Delta.SL.library

optional vector of prediction algorithms to use for SuperLearner estimation ofpDelta1

ub

upper bound on inverse probability weights. See Details section for more information

family

family specification for working regression models, generally ‘gaussian’ for continuous outcomes (default), ‘binomial’ for binary outcomes

fluctuation

‘logistic’ (default), or ‘linear’

alpha

used to keep predicted initial values bounded away from (0,1) for logistic fluctuation

id

optional subject identifier

V.Q

number of cross-validation folds for Super Learner estimation of Q

V.g

number of cross-validation folds for Super Learner estimation of g

V.Delta

number of cross-validation folds for Super Learner estimation of g_Delta

inference

if TRUE, variance-covariance matrix, standard errors, pvalues, and 95% confidence intervals are calculated. Setting to FALSE saves a little time when bootstrapping.

verbose

status messages printed if set to TRUE (default=FALSE)

Q.discreteSL

If true, use discrete SL to estimate Q, otherwise ensembleSL by default. Ignored when SL is not used.

g.discreteSL

If true, use discrete SL to estimate each component of g, otherwise ensembleSL by default. Ignored when SL is not used.

alpha.sig

significance level for constructing 1-alpha.sig confidence intervals

obsWeights

optional weights for biased sampling and two-stage designs.

Details

ub bounds the IC by bounding the factor h(A,V)/[g(A,V,W)P(Delta=1|A,V,W)] between 0 and ub, default value based on sample size.

Value

psi

MSM parameter estimate

sigma

variance covariance matrix

se

standard errors extracted from sigma

pvalue

two-sided p-value

lb

lower bound on 95% confidence interval

ub

upper bound on 95% confidence interval

epsilon

fitted value of epsilon used to target initial Q

psi.Qinit

MSM parameter estimate based on untargeted initial Q

Qstar

targeted estimate of Q, an n \times 2 matrix with predicted values for Q(0,W), Q(1,W) using the updated fit

Qinit

initial estimate of Q. Qinit$coef are the coefficients for a glm model for Q, if applicable. Qinit$Q is an n \times 2 matrix, where n is the number of observations. Columns contain predicted values for Q(0,W),Q(1,W) using the initial fit. Qinit$type is method for estimating Q

g

treatment mechanism estimate. A list with three items: g$g1W contains estimates of P(A=1|W) for each observation, g$coef the coefficients for the model for g when glm used, g$type estimation procedure

g.AV

estimate for h(A,V) or h(A,T). A list with three items: g.AV$g1W an n \times 2 matrix containing values of P(A=0|V,T), P(A=1|V,T) for each observation, g.AV$coef the coefficients for the model for g when glm used, g.AV$type estimation procedure

g_Delta

missingness mechanism estimate. A list with three items: g_Delta$g1W an n \times 2 matrix containing values of P(Delta=1|A,V,W,T) for each observation, g_Delta$coef the coefficients for the model for g when glm used, g_Delta$type estimation procedure

Author(s)

Susan Gruber sgruber@cal.berkeley.edu, in collaboration with Mark van der Laan.

References

1. Gruber, S. and van der Laan, M.J. (2012), tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(13), 1-35. https://www.jstatsoft.org/v51/i13/

2. Rosenblum, M. and van der Laan, M.J. (2010), Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model. The International Journal of Biostatistics,6(2), 2010.

3. Gruber, S., Phillips, R.V., Lee, H., van der Laan, M.J. Data-Adaptive Selection of the Propensity Score Truncation Level for Inverse Probability Weighted and Targeted Maximum Likelihood Estimators of Marginal Point Treatment Effects. American Journal of Epidemiology 2022; 191(9), 1640-1651.

See Also

summary.tmleMSM, estimateQ, estimateG, calcSigma, tmle

Examples

library(tmle)
# Example 1. Estimating MSM parameter with correctly specified regression formulas
# MSM: psi0 + psi1*A + psi2*V + psi3*A*V  (saturated)
# true parameter value: psi = (0, 1, -2, 0.5) 
# generate data
  set.seed(100)
  n <- 1000
  W <- matrix(rnorm(n*3), ncol = 3)
  colnames(W) <- c("W1", "W2", "W3")
  V <- rbinom(n, 1, 0.5)
  A <- rbinom(n, 1, 0.5)
  Y <- rbinom(n, 1, plogis(A - 2*V + 0.5*A*V))
  result.ex1 <- tmleMSM(Y, A, W, V, MSM = "A*V", Qform = "Y~.", gform = "A~1", 
                        hAVform = "A~1", family = "binomial")
  print(result.ex1)
## Not run: 

# Example 2. Biased sampling from example 1 population
# (observations having V = 1 twice as likely to be included in the dataset
  retain.ex2 <- sample(1:n, size = n/2, p = c(1/3 + 1/3*V))
  wt.ex2 <- 1/(1/3 + 1/3*V)
  result.ex2 <- tmleMSM(Y[retain.ex2], A[retain.ex2], W[retain.ex2,], 
			V[retain.ex2], MSM = "A*V", Qform = "Y~.", gform = "A~1", 
                        hAVform = "A~1", family = "binomial",
			obsWeight = wt.ex2[retain.ex2])
  print(result.ex2)

# Example 3. Repeated measures data, two observations per id
# (e.g., crossover study design)
# MSM: psi0 + psi1*A + psi2*V + psi3*V^2 + psi4*T
# true parameter value: psi = (-2, 1, 0, -2, 0 )
# generate data in wide format (id,  W1, Y(t),  W2(t), V(t), A(t)) 
   set.seed(10)
   n <- 250
   id <- rep(1:n)
   W1   <- rbinom(n, 1, 0.5)
   W2.1 <- rnorm(n)
   W2.2 <- rnorm(n)  
   V.1   <- rnorm(n)  
   V.2   <- rnorm(n)
   A.1 <- rbinom(n, 1, plogis(0.5 + 0.3 * W2.1))
   A.2 <- 1-A.1
   Y.1  <- -2 + A.1 - 2*V.1^2 + W2.1 + rnorm(n)
   Y.2  <- -2 + A.2 - 2*V.2^2 + W2.2 + rnorm(n)
   d <- data.frame(id, W1, W2=W2.1, W2.2, V=V.1, V.2, A=A.1, A.2, Y=Y.1, Y.2)

# change dataset from wide to long format
   longd <- reshape(d, 
          varying = cbind(c(3, 5, 7, 9), c(4, 6, 8, 10)),
          idvar = "id",
          direction = "long",
          timevar = "T",
          new.row.names = NULL,
          sep = "")
# misspecified model for initial Q, partial misspecification for g. 
# V set to 2 for Q and g to save time, not recommended at this sample size
   result.ex3 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, 
          T = longd$T, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2", 
	id = longd$id, V.Q=2, V.g=2)
   print(result.ex3)


# Example 4:  Introduce 20
# V set to 2 for Q and g to save time, not recommended at this sample size
  Delta <- rbinom(nrow(longd), 1, 0.8)
  result.ex4 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, T=longd$T,
          Delta = Delta, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2",
	  g.Deltaform = "Delta ~ 1", id=longd$id, verbose = TRUE, V.Q=2, V.g=2)
  print(result.ex4)


## End(Not run)

[Package tmle version 2.0.1 Index]