ctmleDiscrete {ctmle}    R Documentation

Discrete Collaborative Targeted Minimum-loss based Estimation

Description

This function computes the discrete Collaborative Targeted Minimum-loss based Estimator for variable selection. It includes the greedy C-TMLE algorithm (Gruber and van der Laan 2010) and the scalable C-TMLE algorithm (Ju, Gruber, Lendle, et al. 2016) with a user-specified order.

Usage

ctmleDiscrete(Y, A, W, Wg = W, Q = NULL, preOrder = FALSE, order = NULL,
  patience = FALSE, Qbounds = NULL, cvQinit = FALSE, Qform = NULL,
  SL.library = NULL, alpha = 0.995, family = "gaussian", gbound = 0.025,
  like_type = "RSS", fluctuation = "logistic", verbose = FALSE,
  detailed = FALSE, PEN = FALSE, V = 5, folds = NULL,
  stopFactor = 10^6)

Arguments

Y

continuous or binary outcome variable

A

binary treatment indicator, 1 for treatment, 0 for control

W

vector, matrix, or dataframe containing baseline covariates for Q bar

Wg

vector, matrix, or dataframe containing baseline covariates for propensity score model (defaults to W if not supplied by user)

Q

n by 2 matrix of initial values for Q0W, Q1W in columns 1 and 2, respectively. Current version does not support SL for automatic initial estimation of Q bar

preOrder

boolean indicator for using scalable C-TMLE algorithm or not

order

the user-specified order of covariates. Only used when preOrder = TRUE. If not supplied by the user, covariates are automatically ordered from W_1 to W_p

patience

the number of covariates that may be added without improving the score in the CV function before the procedure stops early. Used only when preOrder = TRUE

Qbounds

bound on initial Y and predicted values for Q.

cvQinit

if TRUE, cross-validate initial values for Q to avoid overfitting

Qform

optional regression formula for estimating initial Q

SL.library

optional vector of prediction algorithms for data-adaptive estimation of Q; defaults to glm and glmnet

alpha

used to keep predicted initial values bounded away from 0 and 1 for logistic fluctuation; defaults to 0.995

family

family specification for working regression models, generally 'gaussian' for continuous outcomes (default), 'binomial' for binary outcomes

gbound

bound on P(A=1|W), defaults to 0.025

like_type

'RSS' or 'loglike'. The metric to use for forward selection and cross-validation

fluctuation

'logistic' (default) or 'linear', for targeting step

verbose

print status messages if TRUE

detailed

boolean. If TRUE, return more detailed results

PEN

boolean. If TRUE, a penalized loss is used in the cross-validation step

V

Number of folds. Only used if folds is not specified

folds

The list of indices for the cross-validation step. We recommend that the CV splits in C-TMLE match those in gn_candidate_cv

stopFactor

Numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction of candidates stops
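
The folds and gbound arguments can be illustrated with a short base-R sketch. It rests on two assumptions that this page does not state explicitly: that each element of folds holds the row indices of one validation fold, and that gbound truncates the estimated P(A=1|W) to the interval [gbound, 1 - gbound]. The variable names and values below are hypothetical and are not part of the package.

```r
set.seed(1)
N <- 20   # hypothetical sample size
V <- 5    # number of folds

# Assumption: each list element holds the row indices of one validation fold.
folds <- split(sample(seq_len(N)), rep(seq_len(V), length.out = N))

# Assumption: propensity scores are truncated to [gbound, 1 - gbound].
gbound <- 0.025
g_raw <- c(0.001, 0.2, 0.5, 0.99)   # hypothetical estimates of P(A=1|W)
g_bounded <- pmin(pmax(g_raw, gbound), 1 - gbound)
```

A list built this way could then be passed as folds = folds, in which case V is not used.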

Value

best_k the index of the estimate selected by cross-validation

est estimate of psi_0

CI IC-based 95% confidence interval

pvalue p-value for the null hypothesis that Psi = 0

likelihood sum of squared residuals based on the selected estimator evaluated on all observations, or the logistic log-likelihood if like_type != 'RSS'

varIC empirical variance of the influence curve adjusted for estimation of g

varDstar empirical variance of the influence curve

var.psi variance of the estimate

varIC.cv cross-validated variance of the influence curve

penlikelihood.cv penalized cross-validated likelihood

cv.res all cross-validation results for each fold
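
As a rough guide to how est, var.psi, CI, and pvalue relate, the sketch below computes a normal-approximation (Wald) interval and two-sided p-value from a point estimate and its variance. This assumes the returned CI and pvalue follow the usual influence-curve-based construction, which this page does not spell out. The numbers are hypothetical and do not come from a fitted object.

```r
# Hypothetical values standing in for the est and var.psi components.
est <- 2.1
var_psi <- 0.04

se <- sqrt(var_psi)

# 95% Wald interval: estimate plus or minus the 0.975 normal quantile times the SE.
ci <- est + c(-1, 1) * qnorm(0.975) * se

# Two-sided p-value for the null hypothesis that the parameter equals 0.
pvalue <- 2 * pnorm(-abs(est / se))
```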

Examples

## Not run: 
N <- 1000
p <- 10
Wmat <- matrix(rnorm(N * p), ncol = p)
beta1 <- 4+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
beta0 <- 2+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
tauW <- 2
tau <- 2
gcoef <- matrix(c(-1,-1,rep(-(3/((p)-2)),(p)-2)),ncol=1)
Wm <- as.matrix(Wmat)
g <- 1/(1+exp(Wm%*%gcoef))
A <- rbinom(N, 1, prob = g)
sigma <- 1
epsilon <-rnorm(N,0,sigma)
Y  <- beta0 + tauW*A + epsilon

# Initial estimate of Q
Q <- cbind(rep(mean(Y[A == 0]), N), rep(mean(Y[A == 1]), N))

# User-supplied initial estimate
time_greedy <- system.time(
ctmle_discrete_fit1 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat), Q = Q,
                                    preOrder = FALSE)
)

# If no Q is supplied, the initial Q is estimated by SL with SL.library
ctmle_discrete_fit2 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat),
                                    preOrder = FALSE, detailed = TRUE)

# Scalable C-TMLE with the pre-order option; the order is user-specified.
# If 'order' is not specified, covariates are taken in order from W1 to Wp.
time_preorder <- system.time(
ctmle_discrete_fit3 <- ctmleDiscrete(Y = Y, A = A, W = data.frame(Wmat), Q = Q,
                                    preOrder = TRUE,
                                    order = rev(1:p), detailed = TRUE)
)

# Compare the running time
time_greedy
time_preorder

## End(Not run)

[Package ctmle version 0.1.2 Index]