ctmleGlmnet {ctmle}    R Documentation

Collaborative Targeted Maximum Likelihood Estimation for hyper-parameter tuning of LASSO

Description

This function computes the Collaborative Targeted Maximum Likelihood Estimation (C-TMLE) estimate, using C-TMLE to tune the LASSO regularization hyper-parameter of the propensity score model.

Usage

ctmleGlmnet(Y, A, W, Wg = W, Q, lambdas = NULL, ctmletype, V = 5,
  folds = NULL, alpha = 0.995, family = "gaussian", gbound = 0.025,
  like_type = "RSS", fluctuation = "logistic", verbose = FALSE,
  detailed = FALSE, PEN = FALSE, g1W = NULL, g1WPrev = NULL,
  stopFactor = 10^6)

Arguments

Y

continuous or binary outcome variable

A

binary treatment indicator, 1 for treatment, 0 for control

W

vector, matrix, or data frame containing baseline covariates for Q bar

Wg

vector, matrix, or data frame containing baseline covariates for the propensity score model (defaults to W if not supplied by the user)

Q

n by 2 matrix of initial values for Q0W and Q1W in columns 1 and 2, respectively. The current version does not support SuperLearner (SL) for automatic initial estimation of Q bar, so Q must be supplied by the user

lambdas

numeric vector of lambdas (regularization parameters) for glmnet estimation of the propensity score, in decreasing order. We recommend that the first lambda be selected by external cross-validation.

ctmletype

1, 2 or 3. Type of general C-TMLE. Type 1 uses cross-validation to select the best gn, Type 3 directly solves for extra clever covariates, and Type 2 uses both cross-validation and an extra clever covariate. See more details in !!!

V

Number of folds. Only used if folds is not specified

folds

The list of indices for the cross-validation step. We recommend that the CV splits used in C-TMLE match those used in gn_candidate_cv

alpha

used to keep predicted initial values bounded away from 0 and 1 for logistic fluctuation, defaults to 0.995

family

family specification for working regression models, generally 'gaussian' for continuous outcomes (default), 'binomial' for binary outcomes

gbound

bound on P(A=1|W), defaults to 0.025

like_type

'RSS' or 'loglike'. The metric to use for forward selection and cross-validation

fluctuation

'logistic' (default) or 'linear', for targeting step

verbose

print status messages if TRUE

detailed

boolean. If TRUE, return more detailed results

PEN

boolean. If TRUE, a penalized loss is used in the cross-validation step

g1W

Only used when ctmletype is 3. A user-supplied propensity score estimate.

g1WPrev

Only used when ctmletype is 3. A user-supplied propensity score estimate that differs from g1W by a small fluctuation.

stopFactor

Numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction stops
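
The arguments above can be prepared with base R alone. The following is a minimal sketch, not code from the ctmle package: the helper name bound_probs is hypothetical, the symmetric truncation to [gbound, 1 - gbound] is an assumption about how gbound is applied, and the exact format ctmleGlmnet expects for folds (here, a list of validation-index vectors) is also an assumption.

```r
# Hypothetical helper (not part of ctmle): clip probabilities to [lower, upper].
bound_probs <- function(p, lower, upper) pmin(pmax(p, lower), upper)

set.seed(1)
n <- 20
V <- 5

# Randomly assign each observation to one of V folds of equal size,
# then collect the indices belonging to each fold in a list.
fold_id <- sample(rep(seq_len(V), length.out = n))
folds <- split(seq_len(n), fold_id)

# Assumed behaviour of gbound: truncate P(A=1|W) to [gbound, 1 - gbound].
gbound <- 0.025
g_raw <- c(0.001, 0.3, 0.999)
g_bounded <- bound_probs(g_raw, gbound, 1 - gbound)

# lambdas must be supplied in decreasing order.
lambdas <- sort(c(0.01, 0.1, 0.05), decreasing = TRUE)
```

Building folds once and reusing the same list for every step that cross-validates is one way to keep the splits aligned, as recommended for the folds argument.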

Value

best_k index of the estimate selected by cross-validation

est estimate of psi_0

CI IC-based 95% confidence interval for the parameter estimate

pvalue p-value for the null hypothesis that Psi = 0

likelihood sum of squared residuals based on the selected estimator evaluated on all observations, or the logistic log-likelihood if like_type != 'RSS'

varIC empirical variance of the influence curve adjusted for estimation of g

varDstar empirical variance of the influence curve

var.psi variance of the estimate

varIC.cv cross-validated variance of the influence curve

penlikelihood.cv penalized cross-validated likelihood

cv.res all cross-validation results for each fold
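
As a rough illustration of how est, var.psi, CI and pvalue usually relate for influence-curve-based inference, the sketch below uses the standard normal-approximation (Wald) construction. This is an assumption about the general approach, not a transcription of the package's internals, and all numbers and variable names are made up for illustration.

```r
set.seed(42)
n <- 500

# Hypothetical influence-curve values and point estimate (illustration only).
ic <- rnorm(n, mean = 0, sd = 2)
est <- 1.8

var_ic <- var(ic)        # empirical variance of the influence curve
var_psi <- var_ic / n    # assumed: variance of the estimate is var(IC) / n
se <- sqrt(var_psi)

# Wald-type 95% confidence interval and two-sided p-value for H0: Psi = 0.
ci <- est + c(-1, 1) * qnorm(0.975) * se
pvalue <- 2 * pnorm(-abs(est / se))
```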

Examples

## Not run: 
library(ctmle)
library(glmnet)
set.seed(123)
N <- 1000
p <- 10
Wmat <- matrix(rnorm(N * p), ncol = p)
beta1 <- 4+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
beta0 <- 2+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
tau <- 2
gcoef <- matrix(c(-1,-1,rep(0,(p)-2)),ncol=1)
Wm <- as.matrix(Wmat)
g <- 1/(1+exp(Wm%*%gcoef / 3))
A <- rbinom(N, 1, prob = g)
sigma <- 1
epsilon <-rnorm(N,0,sigma)
Y  <- beta0 + tau * A + epsilon
# ctmleGlmnet requires a user-specified Q
W_tmp <- data.frame(Wm[, 1:3])
treated <- W_tmp[which(A == 1), ]
untreated <- W_tmp[which(A == 0), ]
Y1 <- Y[which(A == 1)]
Y0 <- Y[which(A == 0)]
# Initial Q-estimate: separate linear fits in each arm, predicted for all observations
beta1hat <- predict(lm(Y1 ~ ., data = treated), newdata = W_tmp)
beta0hat <- predict(lm(Y0 ~ ., data = untreated), newdata = W_tmp)
Q <- matrix(c(beta0hat, beta1hat), ncol = 2)
W <- Wm
glmnet_fit <- cv.glmnet(y = A, x = Wm,
                       family = 'binomial', nlambda = 40)
# Keep lambda.min and all smaller lambdas (glmnet stores lambdas in decreasing order)
start <- which(glmnet_fit$lambda == glmnet_fit$lambda.min)
end <- length(glmnet_fit$lambda)
lambdas <- glmnet_fit$lambda[start:end]
ctmle_fit1 <- ctmleGlmnet(Y=Y, A=A,
                         W=data.frame(W=W),
                         Q = Q, lambdas = lambdas,
                         ctmletype=1, alpha=.995,
                         family="gaussian",
                         gbound=0.025, like_type="loglike",
                         fluctuation="logistic",
                         verbose=FALSE,
                         detailed=FALSE, PEN=FALSE,
                         V=5, stopFactor=10^6)

## End(Not run)

[Package ctmle version 0.1.2 Index]