ctmleGeneral {ctmle} | R Documentation |
General Template for Collaborative Targeted Maximum Likelihood Estimation
Description
This function computes the Collaborative Targeted Maximum Likelihood Estimator.
Usage
ctmleGeneral(Y, A, W, Wg = W, Q, ctmletype, gn_candidates,
gn_candidates_cv = NULL, alpha = 0.995, family = "gaussian",
gbound = 0.025, like_type = "RSS", fluctuation = "logistic",
verbose = FALSE, detailed = FALSE, PEN = FALSE, g1W = NULL,
g1WPrev = NULL, V = 5, folds = NULL, stopFactor = 10^6)
Arguments
Y |
continuous or binary outcome variable |
A |
binary treatment indicator, 1 for treatment, 0 for control |
W |
vector, matrix, or dataframe containing baseline covariates for Q bar |
Wg |
vector, matrix, or dataframe containing baseline covariates for propensity score model (defaults to W if not supplied by user) |
Q |
n by 2 matrix of initial values for Q0W, Q1W in columns 1 and 2, respectively. Current version does not support SL for automatic initial estimation of Q bar |
ctmletype |
1 or 3. Type of general C-TMLE. Type 1 uses cross-validation to select best gn, while Type 3 directly solves extra clever covariates. |
gn_candidates |
matrix or dataframe, each column stand for a estimate of propensity score. Estimate in the column with larger index should have smaller empirical loss |
gn_candidates_cv |
matrix or dataframe, each column stand for a the cross-validated estimate. For example, the (i,j)-th element is the predicted propensity score by j-th estimator, for i-th observation, when it is in the validation set |
alpha |
used to keep predicted initial values bounded away from (0,1) for logistic fluctuation, 0.995 (default) |
family |
family specification for working regression models, generally 'gaussian' for continuous outcomes (default), 'binomial' for binary outcomes |
gbound |
bound on P(A=1|W), defaults to 0.025 |
like_type |
'RSS' or 'loglike'. The metric to use for forward selection and cross-validation |
fluctuation |
'logistic' (default) or 'linear', for targeting step |
verbose |
print status messages if TRUE |
detailed |
boolean number. If it is TRUE, return more detailed results |
PEN |
boolean. If true, penalized loss is used in cross-validation step |
g1W |
Only used when type is 3. a user-supplied propensity score estimate. |
g1WPrev |
Only used when type is 3. a user-supplied propensity score estimate, with small fluctuation compared to g1W. |
V |
Number of folds. Only used if folds is not specified |
folds |
The list of indices for cross-validation step. We recommend the cv-splits in C-TMLE matchs that in gn_candidate_cv |
stopFactor |
Numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction would stop |
Value
best_k the index of estimate that selected by cross-validation
est estimate of psi_0
CI IC-based 95
pvalue pvalue for the null hypothesis that Psi = 0
likelihood sum of squared residuals, based on selected estimator evaluated on all obs or, logistic loglikelihood if like_type != "RSS"
varIC empirical variance of the influence curve adjusted for estimation of g
varDstar empirical variance of the influence curve
var.psi variance of the estimate
varIC.cv cross-validated variance of the influence curve
penlikelihood.cv penalized cross-validatedlikelihood
cv.res all cross-validation results for each fold
Examples
N <- 1000
p = 100
V = 5
Wmat <- matrix(rnorm(N * p), ncol = p)
gcoef <- matrix(c(-1,-1,rep(-(3/((p)-2)),(p)-2)),ncol=1)
W <- as.data.frame(Wmat)
g <- 1/(1+exp(Wmat%*%gcoef / 3))
A <- rbinom(N, 1, prob = g)
# Build potential outcome pairs, and the observed outcome Y
beta1 <- 4+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
beta0 <- 2+2*Wmat[,1]+2*Wmat[,2]+2*Wmat[,5]+2*Wmat[,6]+2*Wmat[,8]
tau = 2
sigma <- 1
epsilon <-rnorm(N,0,sigma)
Y <- beta0 + tau * A + epsilon
# Initial estimate of Q
Q <- cbind(rep(mean(Y[A == 1]), N), rep(mean(Y[A == 0]), N))
folds <-by(sample(1:N,N), rep(1:V, length=N), list)
lasso_fit <- cv.glmnet(x = as.matrix(W), y = A, alpha = 1, nlambda = 100, nfolds = 10)
lasso_lambdas <- lasso_fit$lambda[lasso_fit$lambda <= lasso_fit$lambda.min][1:5]
# Build template for glmnet
SL.glmnet_new <- function (Y, X, newX, family, obsWeights, id, alpha = 1,
nlambda = 100, lambda = 0,...)
{
# browser()
if (!is.matrix(X)) {
X <- model.matrix(~-1 + ., X)
newX <- model.matrix(~-1 + ., newX)
}
fit <- glmnet::glmnet(x = X, y = Y,
lambda = lambda,
family = family$family, alpha = alpha)
pred <- predict(fit, newx = newX, type = "response")
fit <- list(object = fit)
class(fit) <- "SL.glmnet"
out <- list(pred = pred, fit = fit)
return(out)
}
# Use a sequence of LASSO estimators to build gn sequence:
SL.cv1lasso <- function (... , alpha = 1, lambda = lasso_lambdas[1]){
SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}
SL.cv2lasso <- function (... , alpha = 1, lambda = lasso_lambdas[2]){
SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}
SL.cv3lasso <- function (... , alpha = 1, lambda = lasso_lambdas[3]){
SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}
SL.cv4lasso <- function (... , alpha = 1, lambda = lasso_lambdas[4]){
SL.glmnet_new(... , alpha = alpha, lambda = lambda)
}
SL.library = c('SL.cv1lasso', 'SL.cv2lasso', 'SL.cv3lasso', 'SL.cv4lasso', 'SL.glm')
# Build the sequence. See more details in the function build_gn_seq, and package SuperLearner
gn_seq <- build_gn_seq(A = A, W = W, SL.library = SL.library, folds = folds)
# Use the output of build_gn_seq for ctmle general templates
ctmle_fit <- ctmleGeneral(Y = Y, A = A, W = W, Q = Q, ctmletype = 1,
gn_candidates = gn_seq$gn_candidates,
gn_candidates_cv = gn_seq$gn_candidates_cv,
folds = gn_seq$folds, V = length(folds))