ctmleDiscrete {ctmle}    R Documentation

Discrete Collaborative Targeted Minimum-loss based Estimation


This function computes the discrete Collaborative Targeted Minimum-loss based Estimator (C-TMLE) for variable selection. It implements the greedy C-TMLE algorithm (Gruber and van der Laan 2010) and the scalable C-TMLE algorithm (Ju, Gruber, Lendle, et al. 2016) with a user-specified order.


ctmleDiscrete(Y, A, W, Wg = W, Q = NULL, preOrder = FALSE, order = NULL,
  patience = FALSE, Qbounds = NULL, cvQinit = FALSE, Qform = NULL,
  SL.library = NULL, alpha = 0.995, family = "gaussian", gbound = 0.025,
  like_type = "RSS", fluctuation = "logistic", verbose = FALSE,
  detailed = FALSE, PEN = FALSE, V = 5, folds = NULL,
  stopFactor = 10^6)



Y continuous or binary outcome variable

A binary treatment indicator, 1 for treatment, 0 for control

W vector, matrix, or data frame containing baseline covariates for Q bar

Wg vector, matrix, or data frame containing baseline covariates for the propensity score model (defaults to W if not supplied by the user)

Q n by 2 matrix of initial values for Q0W and Q1W in columns 1 and 2, respectively. The current version does not support SL for automatic initial estimation of Q bar

preOrder boolean indicator for whether to use the scalable C-TMLE algorithm

order the user-specified order of the covariates. Only used when preOrder = TRUE. If not supplied, covariates are ordered from W_1 to W_p

patience a number; stop early when the score in the CV function has not improved after this many covariates. Only used when preOrder = TRUE

Qbounds bounds on the initial Y and on the predicted values for Q

cvQinit if TRUE, cross-validate the initial values for Q to avoid overfitting

Qform optional regression formula for estimating the initial Q

SL.library optional vector of prediction algorithms for data-adaptive estimation of Q; defaults to glm and glmnet

alpha used to keep predicted initial values bounded away from 0 and 1 for the logistic fluctuation; defaults to 0.995

family family specification for working regression models, generally 'gaussian' for continuous outcomes (default) or 'binomial' for binary outcomes

gbound bound on P(A=1|W); defaults to 0.025

like_type 'RSS' or 'loglike'. The metric to use for forward selection and cross-validation

fluctuation 'logistic' (default) or 'linear', for the targeting step

verbose print status messages if TRUE

detailed boolean. If TRUE, return more detailed results

PEN boolean. If TRUE, the penalized loss is used in the cross-validation step

V number of folds. Only used if folds is not specified

folds the list of indices for the cross-validation step. We recommend that the CV splits in C-TMLE match those in gn_candidate_cv

stopFactor numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction stops
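
The folds and order arguments are plain R objects that the caller builds before the call. The following is a minimal sketch in base R, assuming that folds is a list of length V whose elements hold the validation-set row indices of each fold (this page only says "list of indices", so verify against your installed version). The values of N, V, and p and the name cov_order are illustrative.

```r
set.seed(123)   # reproducible fold assignment
N <- 1000       # sample size (illustrative)
V <- 5          # number of folds
p <- 10         # number of covariates (illustrative)

# Randomly partition the row indices 1..N into V roughly equal folds.
# Assumption: each list element holds the validation indices of one fold.
folds <- split(sample(N), rep(seq_len(V), length.out = N))

# A user-specified covariate order for preOrder = TRUE:
# here simply the reverse of the default W_1, ..., W_p ordering.
cov_order <- rev(seq_len(p))

# Sanity checks: the folds are disjoint and cover every row exactly once.
stopifnot(length(folds) == V,
          identical(sort(unlist(folds, use.names = FALSE)), seq_len(N)))
```

These objects would then be passed as folds = folds and order = cov_order (the latter together with preOrder = TRUE).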


best_k the index of the estimate selected by cross-validation

est estimate of psi_0

CI IC-based 95% confidence interval

pvalue p-value for the null hypothesis that Psi = 0

likelihood sum of squared residuals based on the selected estimator evaluated on all observations, or the logistic log-likelihood if like_type != 'RSS'

varIC empirical variance of the influence curve adjusted for estimation of g

varDstar empirical variance of the influence curve

var.psi variance of the estimate

varIC.cv cross-validated variance of the influence curve

penlikelihood.cv penalized cross-validated likelihood

cv.res all cross-validation results for each fold
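
The inferential components est, var.psi, CI, and pvalue are related in the usual Wald fashion. The following is a minimal sketch, assuming (as in standard influence-curve-based inference, not confirmed by this page) that the interval and p-value come from a normal approximation with variance var.psi. The numbers are made up for illustration and do not come from a fitted object.

```r
# Hypothetical values standing in for fit$est and fit$var.psi
est     <- 2.1
var_psi <- 0.04

se <- sqrt(var_psi)
z  <- qnorm(0.975)                    # about 1.96 for a 95% interval

ci     <- c(est - z * se, est + z * se)
pvalue <- 2 * pnorm(-abs(est / se))   # two-sided test of psi = 0

print(round(ci, 3))
print(pvalue)
```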


## Not run: 
N <- 1000
p <- 10
Wmat <- matrix(rnorm(N * p), ncol = p)
beta1 <- 4+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
beta0 <- 2+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
tauW <- 2
tau <- 2
gcoef <- matrix(c(-1,-1,rep(-(3/((p)-2)),(p)-2)),ncol=1)
Wm <- as.matrix(Wmat)
g <- 1/(1+exp(Wm%*%gcoef))
A <- rbinom(N, 1, prob = g)
sigma <- 1
epsilon <- rnorm(N, 0, sigma)
Y <- beta0 + tauW*A + epsilon

# Initial estimate of Q
Q <- cbind(rep(mean(Y[A == 0]), N), rep(mean(Y[A == 1]), N))

# User-supplied initial estimate
time_greedy <- system.time(
  ctmle_discrete_fit1 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat), Q = Q,
                                       preOrder = FALSE)
)

# If there is no input Q, then the initial Q would be estimated by SL with SL.library
ctmle_discrete_fit2 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat),
                                    preOrder = FALSE, detailed = TRUE)

# Scalable C-TMLE with the pre-order option; the order is user-specified.
# If 'order' is not specified, the order W1 to Wp is used.
time_preorder <- system.time(
  ctmle_discrete_fit3 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat), Q = Q,
                                       preOrder = TRUE,
                                       order = rev(1:p), detailed = TRUE)
)

# Compare the running times
time_greedy
time_preorder

## End(Not run)
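
The examples above use the arm-specific sample means as the initial Q. A regression-based initial estimate can be supplied in the same n by 2 layout. The following is a minimal sketch in base R, assuming a main-terms linear model is adequate for the outcome. The simulated data and the names dat, fit_lm, and Q_lm are illustrative and not part of the package.

```r
set.seed(1)
N <- 500
p <- 4
Wmat <- matrix(rnorm(N * p), ncol = p)
W <- data.frame(Wmat)                        # columns X1, ..., X4
A <- rbinom(N, 1, plogis(0.5 * Wmat[, 1]))   # treatment depends on X1
Y <- 2 + 2 * Wmat[, 1] + 2 * A + rnorm(N)    # true treatment effect is 2

# Fit a main-terms outcome regression on treatment and covariates
dat <- cbind(Y = Y, A = A, W)
fit_lm <- lm(Y ~ ., data = dat)

# Predict every subject's outcome under control (A = 0) and treatment (A = 1)
Q_lm <- cbind(
  Q0W = predict(fit_lm, newdata = cbind(A = 0, W)),
  Q1W = predict(fit_lm, newdata = cbind(A = 1, W))
)

dim(Q_lm)   # N rows, 2 columns: Q0W first, Q1W second
```

A matrix built this way could then be passed as Q = Q_lm in place of the constant-mean Q.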

[Package ctmle version 0.1.2]