R: Constraint maximum likelihood (CML) method for logistic...

fxnCC_LogReg {MetaIntegration}

R Documentation

Constraint maximum likelihood (CML) method for logistic regression (binary outcome Y)

Description

Constraint maximum likelihood (CML) method for logistic regression (binary outcome Y)

Usage

fxnCC_LogReg(
  p,
  q,
  YInt,
  XInt,
  BInt,
  betaHatExt,
  gammaHatInt,
  n,
  tol,
  maxIter,
  factor
)

Arguments

`p`	total number of X covariates including the intercept (i.e. p=ncol(X)+1)
`q`	total number of covariates including the intercept (i.e. q=ncol(X)+ncol(B)+1)
`YInt`	Outcome vector
`XInt`	X covariates that are used in the external models - Do not include intercept
`BInt`	Newly added B covariates that are not included in the external models
`betaHatExt`	External parameter estimates of the reduced model
`gammaHatInt`	Full model parameter estimates using the internal data only
`n`	internal data sample size
`tol`	convergence criteria e.g. 1e-6
`maxIter`	Maximum number of iterations to reach convergence e.g. 400
`factor`	the step-halving factor between 0 and 1, if factor=1 then newton-raphson method; decrease if algorithm cannot converge given the maximum iterations

Value

gammaHat, in the order (intercept, XInt, BInt)

References

Chatterjee, N., Chen, Y.-H., P.Maas and Carroll, R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. Journal of the American Statistical Association 111, 107-117.

Examples

# Full model: Y|X, B
# Reduced model: Y|X
# X,B follows normal distribution with mean zero, variance one and correlation 0.3
# Y|X, B follows N(-1-0.5X+0.5B, 1)
set.seed(2333)
n = 1000
data.n = data.frame(matrix(ncol = 3, nrow = n))
colnames(data.n) = c('Y', 'X', 'B')
data.n[,c('X', 'B')] = MASS::mvrnorm(n, rep(0,2), diag(0.7,2)+0.3)
data.n$Y = rbinom(n, 1, expit(-1 - 0.5*data.n$X + 0.5*data.n$B))

# Generate the beta estimates from the external reduced model:
# generate a data of size m from the full model first, then fit the reduced regression 
# to obtain the beta estiamtes and the corresponsing estimated variance 
m = 30000
data.m = data.frame(matrix(ncol = 3, nrow = m))
names(data.m) = c('Y', 'X', 'B')
data.m[,c('X', 'B')] = MASS::mvrnorm(m, rep(0,2), diag(0.7,2)+0.3)
data.m$Y = rbinom(m, 1, expit(-1 - 0.5*data.m$X + 0.5*data.m$B))

#fit Y|X to obtain the external beta estimates, save the beta estiamtes and the 
# corresponsing estimated variance 
fit.E = glm(Y ~ X, data = data.m, family = binomial(link='logit'))
beta.E = coef(fit.E)
names(beta.E) = c('int', 'X')
V.E = vcov(fit.E)

#get full model estimate from direct regression using the internal data only
fit.gamma.I = glm(Y ~ X + B, data = data.n, family = binomial(link='logit'))
gamma.I = coef(fit.gamma.I)

#Get CML estimates
gamma.CML = fxnCC_LogReg(p=2, 
                         q=3, 
                         YInt=data.n$Y, 
                         XInt=data.n$X, 
                         BInt=data.n[,'B'], 
                         betaHatExt=beta.E, 
                         gammaHatInt=gamma.I, 
                         n=nrow(data.n), 
                         tol=1e-8, 
                         maxIter=400,
                         factor=1)[["gammaHat"]]

[Package MetaIntegration version 0.1.2 Index]