R: Collaborative Targeted Maximum Likelihood Estimation for...

ctmleGlmnet {ctmle}

R Documentation

Collaborative Targeted Maximum Likelihood Estimation for hyper-parameter tuning of LASSO

Description

This function computes the Collaborative Targeted Maximum Likelihood Estimation (C-TMLE) estimate, using C-TMLE to tune the LASSO hyper-parameter (lambda) of the propensity score model.

Usage

ctmleGlmnet(Y, A, W, Wg = W, Q, lambdas = NULL, ctmletype, V = 5,
  folds = NULL, alpha = 0.995, family = "gaussian", gbound = 0.025,
  like_type = "RSS", fluctuation = "logistic", verbose = FALSE,
  detailed = FALSE, PEN = FALSE, g1W = NULL, g1WPrev = NULL,
  stopFactor = 10^6)

Arguments

`Y`	continuous or binary outcome variable
`A`	binary treatment indicator, 1 for treatment, 0 for control
`W`	vector, matrix, or dataframe containing baseline covariates for Q bar
`Wg`	vector, matrix, or dataframe containing baseline covariates for propensity score model (defaults to W if not supplied by user)
`Q`	n by 2 matrix of initial values for Q0W, Q1W in columns 1 and 2, respectively. Current version does not support SL for automatic initial estimation of Q bar
`lambdas`	numeric vector of lambdas (regularization parameters) for glmnet estimation of the propensity score, in decreasing order. We recommend that the first lambda be selected by external cross-validation.
`ctmletype`	1, 2 or 3. Type of general C-TMLE. Type 1 uses cross-validation to select best gn, Type 3 directly solves extra clever covariates, and Type 2 uses both cross-validation and extra covariate. See more details in !!!
`V`	Number of folds. Only used if folds is not specified
`folds`	The list of indices for the cross-validation step. We recommend that the cv-splits in C-TMLE match those in gn_candidate_cv
`alpha`	used to keep predicted initial values bounded away from 0 and 1 for logistic fluctuation, 0.995 (default)
`family`	family specification for working regression models, generally 'gaussian' for continuous outcomes (default), 'binomial' for binary outcomes
`gbound`	bound on P(A=1|W), defaults to 0.025
`like_type`	'RSS' or 'loglike'. The metric to use for forward selection and cross-validation
`fluctuation`	'logistic' (default) or 'linear', for targeting step
`verbose`	print status messages if TRUE
`detailed`	boolean. If TRUE, return more detailed results
`PEN`	boolean. If TRUE, penalized loss is used in the cross-validation step
`g1W`	Only used when ctmletype is 3. A user-supplied propensity score estimate.
`g1WPrev`	Only used when ctmletype is 3. A user-supplied propensity score estimate, with a small fluctuation compared to g1W.
`stopFactor`	Numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction stops
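
The following base R sketch illustrates how the `folds`, `gbound`, and `alpha` arguments might be interpreted. It rests on assumptions that are not confirmed by this page: that `folds` is a list of V integer vectors of validation-row indices, that `gbound` truncates the propensity score to [gbound, 1 - gbound], and that `alpha` keeps initial predictions within [1 - alpha, alpha]. The helper `bound()` is hypothetical and is not part of the package interface documented here.

```r
set.seed(1)
N <- 1000
V <- 5

# Assumed format: a list of V integer vectors, each holding the row indices
# of one validation fold. Check the package source before relying on this.
folds <- split(sample(seq_len(N)), rep(seq_len(V), length.out = N))

# Hypothetical helper: truncate values to the interval [lower, upper].
bound <- function(x, lower, upper) pmin(pmax(x, lower), upper)

# Assumption: gbound = 0.025 truncates propensity scores to [0.025, 0.975].
g_raw <- c(0.001, 0.30, 0.999)
g_bounded <- bound(g_raw, 0.025, 1 - 0.025)

# Assumption: alpha = 0.995 keeps (rescaled) initial predictions in [0.005, 0.995].
q_raw <- c(0, 0.5, 1)
q_bounded <- bound(q_raw, 1 - 0.995, 0.995)
```

Passing the same `folds` object to every cross-validated step is one way to follow the recommendation that the splits match across steps.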

Value

best_k the index of the estimate selected by cross-validation

est estimate of psi_0

CI IC-based 95% confidence interval

pvalue p-value for the null hypothesis that Psi = 0

likelihood sum of squared residuals, based on the selected estimator evaluated on all observations, or the logistic log-likelihood if like_type != 'RSS'

varIC empirical variance of the influence curve adjusted for estimation of g

varDstar empirical variance of the influence curve

var.psi variance of the estimate

varIC.cv cross-validated variance of the influence curve

penlikelihood.cv penalized cross-validated likelihood

cv.res all cross-validation results for each fold
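
As a rough illustration of how `est`, `var.psi`, `CI`, and `pvalue` relate, the sketch below builds a Wald-type interval and two-sided p-value in base R. It assumes that `var.psi` is already on the scale of the estimate (so its square root is the standard error). The numbers are hypothetical stand-ins, not output from the package, and the package's own computation may differ in detail.

```r
# Hypothetical values standing in for fit$est and fit$var.psi.
est <- 2.05
var_psi <- 0.004

se <- sqrt(var_psi)   # standard error under the stated assumption
z <- qnorm(0.975)     # two-sided 95% normal quantile

# Wald-type 95% confidence interval and p-value for H0: Psi = 0.
ci <- c(lower = est - z * se, upper = est + z * se)
pvalue <- 2 * pnorm(-abs(est / se))
```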

Examples

## Not run: 
library(ctmle)
library(glmnet)  # provides cv.glmnet
set.seed(123)
N <- 1000
p <- 10
Wmat <- matrix(rnorm(N * p), ncol = p)
beta1 <- 4+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
beta0 <- 2+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
tau <- 2
gcoef <- matrix(c(-1,-1,rep(0,(p)-2)),ncol=1)
Wm <- as.matrix(Wmat)
g <- 1/(1+exp(Wm%*%gcoef / 3))
A <- rbinom(N, 1, prob = g)
sigma <- 1
epsilon <-rnorm(N,0,sigma)
Y  <- beta0 + tau * A + epsilon
# ctmleGlmnet requires a user-specified initial estimate Q
W_tmp <- data.frame(Wm[,1:3])
treated<- W_tmp[which(A==1),]
untreated<-W_tmp[which(A==0),]
Y1<-Y[which(A==1)]
Y0<-Y[which(A==0)]
# Initial Q-estimate
beta1hat <- predict(lm(Y1~.,data=treated),newdata=W_tmp)
beta0hat <- predict(lm(Y0~., data=untreated),newdata=W_tmp)
Q <- matrix(c(beta0hat,beta1hat),ncol=2)
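W <- Wm
# Cross-validate the LASSO propensity score model to choose the starting lambda
glmnet_fit <- cv.glmnet(y = A, x = Wm,
                       family = 'binomial', nlambda = 40)
# Keep lambda.min and every smaller lambda (the sequence is in decreasing order)
start <- which(glmnet_fit$lambda == glmnet_fit$lambda.min)
end <- length(glmnet_fit$lambda)
lambdas <- glmnet_fit$lambda[start:end]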
ctmle_fit1 <- ctmleGlmnet(Y=Y, A=A,
                         W=data.frame(W=W),
                         Q = Q, lambdas = lambdas,
                         ctmletype=1, alpha=.995,
                         family="gaussian",
                         gbound=0.025,like_type="loglik" ,
                         fluctuation="logistic",
                         verbose=FALSE,
                         detailed=FALSE, PEN=FALSE,
                         V=5, stopFactor=10^6)

## End(Not run)
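
The simulation in the example sets the true treatment effect to `tau = 2`, which gives a benchmark for any fitted estimate. The base R sketch below regenerates the same data-generating process without the ctmle or glmnet packages and compares a naive difference in means with a covariate-adjusted linear regression. It is only a sanity check: the adjusted regression is a stand-in comparison and not the C-TMLE procedure itself.

```r
set.seed(123)
N <- 1000
p <- 10
tau <- 2  # true treatment effect in the simulation

Wmat <- matrix(rnorm(N * p), ncol = p)
beta0 <- 2 + 2 * Wmat[, 1] + 2 * Wmat[, 2] + 2 * Wmat[, 5] +
  2 * Wmat[, 6] + 2 * Wmat[, 8]

# Treatment probability depends on W1 and W2, which also affect Y (confounding).
gcoef <- matrix(c(-1, -1, rep(0, p - 2)), ncol = 1)
g <- as.vector(1 / (1 + exp(Wmat %*% gcoef / 3)))
A <- rbinom(N, 1, prob = g)
Y <- beta0 + tau * A + rnorm(N, 0, 1)

# Naive contrast ignores the confounders; the regression adjusts for all of W.
naive <- mean(Y[A == 1]) - mean(Y[A == 0])
adjusted <- unname(coef(lm(Y ~ A + Wmat))["A"])
```

Comparing `naive` and `adjusted` with `tau` shows how much of the raw contrast is attributable to confounding by the covariates.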

[Package ctmle version 0.1.2 Index]