fxnCC_LogReg {MetaIntegration} | R Documentation |
Constraint maximum likelihood (CML) method for logistic regression (binary outcome Y)
Description
Constraint maximum likelihood (CML) method for logistic regression (binary outcome Y)
Usage
fxnCC_LogReg(
p,
q,
YInt,
XInt,
BInt,
betaHatExt,
gammaHatInt,
n,
tol,
maxIter,
factor
)
Arguments
p |
total number of X covariates including the intercept (i.e. p=ncol(X)+1) |
q |
total number of covariates including the intercept (i.e. q=ncol(X)+ncol(B)+1) |
YInt |
Outcome vector |
XInt |
X covariates that are used in the external models - Do not include intercept |
BInt |
Newly added B covariates that are not included in the external models |
betaHatExt |
External parameter estimates of the reduced model |
gammaHatInt |
Full model parameter estimates using the internal data only |
n |
internal data sample size |
tol |
convergence criteria e.g. 1e-6 |
maxIter |
Maximum number of iterations to reach convergence e.g. 400 |
factor |
the step-halving factor between 0 and 1, if factor=1 then newton-raphson method; decrease if algorithm cannot converge given the maximum iterations |
Value
gammaHat, in the order (intercept, XInt, BInt)
References
Chatterjee, N., Chen, Y.-H., P.Maas and Carroll, R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. Journal of the American Statistical Association 111, 107-117.
Examples
# Full model: Y|X, B
# Reduced model: Y|X
# X,B follows normal distribution with mean zero, variance one and correlation 0.3
# Y|X, B follows N(-1-0.5X+0.5B, 1)
set.seed(2333)
n = 1000
data.n = data.frame(matrix(ncol = 3, nrow = n))
colnames(data.n) = c('Y', 'X', 'B')
data.n[,c('X', 'B')] = MASS::mvrnorm(n, rep(0,2), diag(0.7,2)+0.3)
data.n$Y = rbinom(n, 1, expit(-1 - 0.5*data.n$X + 0.5*data.n$B))
# Generate the beta estimates from the external reduced model:
# generate a data of size m from the full model first, then fit the reduced regression
# to obtain the beta estiamtes and the corresponsing estimated variance
m = 30000
data.m = data.frame(matrix(ncol = 3, nrow = m))
names(data.m) = c('Y', 'X', 'B')
data.m[,c('X', 'B')] = MASS::mvrnorm(m, rep(0,2), diag(0.7,2)+0.3)
data.m$Y = rbinom(m, 1, expit(-1 - 0.5*data.m$X + 0.5*data.m$B))
#fit Y|X to obtain the external beta estimates, save the beta estiamtes and the
# corresponsing estimated variance
fit.E = glm(Y ~ X, data = data.m, family = binomial(link='logit'))
beta.E = coef(fit.E)
names(beta.E) = c('int', 'X')
V.E = vcov(fit.E)
#get full model estimate from direct regression using the internal data only
fit.gamma.I = glm(Y ~ X + B, data = data.n, family = binomial(link='logit'))
gamma.I = coef(fit.gamma.I)
#Get CML estimates
gamma.CML = fxnCC_LogReg(p=2,
q=3,
YInt=data.n$Y,
XInt=data.n$X,
BInt=data.n[,'B'],
betaHatExt=beta.E,
gammaHatInt=gamma.I,
n=nrow(data.n),
tol=1e-8,
maxIter=400,
factor=1)[["gammaHat"]]