asympVar_LinReg {MetaIntegration} | R Documentation |
Asymptotic variance-covariance matrix for gamma_Int and gamma_CML for linear regression (continuous outcome Y)
Description
Asymptotic variance-covariance matrix for gamma_Int and gamma_CML for linear regression (continuous outcome Y)
Usage
asympVar_LinReg(
k,
p,
q,
YInt,
XInt,
BInt,
gammaHatInt,
betaHatExt_list,
CovExt_list,
rho,
ExUncertainty
)
Arguments
k |
number of external models |
p |
total number of X covariates including the intercept (i.e. p=ncol(X)+1) |
q |
total number of covariates including the intercept (i.e. q=ncol(X)+ncol(B)+1) |
YInt |
Outcome vector |
XInt |
X covariates that are used in the external models. Do not include the intercept |
BInt |
Newly added B covariates that are not included in the external models |
gammaHatInt |
Internal parameter estimates of the full model using the internal data |
betaHatExt_list |
a list of k items; each item is a named vector of the external parameter estimates (beta). Every element must be named after its covariate, and the names must be consistent with those used in the full model |
CovExt_list |
a list of k items, each item is the variance-covariance matrix of the external parameter estimates (beta) of the reduced model |
rho |
a list of k items; each item is the sample size ratio n/m (the internal sample size n over the external sample size m) |
ExUncertainty |
logical indicator; if TRUE, the external model uncertainty is accounted for in the algorithm; if FALSE, it is ignored |
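As a rough illustration of the rho argument, the list of sample size ratios could be built from the internal sample size and a named vector of external sample sizes. This is only a sketch: the names m_ext, Ext1 and Ext2 are illustrative and not part of the package.

```r
n <- 1000                                # internal sample size
m_ext <- c(Ext1 = 30000, Ext2 = 30000)   # external sample sizes (illustrative values)

# One ratio n/m per external model; lapply keeps the names of m_ext
rho <- lapply(m_ext, function(m) n / m)
```

The resulting object is a named list with one entry per external model, which matches the "list of k items" shape described for rho.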
Value
a list containing:
"asyV.I" Variance of gamma_I (the direct regression parameter estimates using the internal data only)
"asyV.CML" Variance of gamma_CML (the CML estimates (Chatterjee et al. 2016))
"asyCov.CML" Covariance between two different CML estimates, gamma_CMLi and gamma_CMLj
"asyCov.CML.I" Covariance between gamma_I and gamma_CML
"ExtraTerm" the extra variance when ExUncertainty == TRUE (i.e. the external uncertainty is considered in the algorithm)
References
Chatterjee, N., Chen, Y.-H., Maas, P. and Carroll, R. J. (2016). Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. Journal of the American Statistical Association 111, 107-117.
Gu, T., Taylor, J.M.G. and Mukherjee, B. (2020). An ensemble meta-prediction framework to integrate multiple regression models into a current study. Manuscript in preparation.
Examples
# Full model: Y|X1, X2, B
# Reduced model 1: Y|X1 of sample size m1
# Reduced model 2: Y|X2 of sample size m2
# (X1, X2, B) follow a multivariate normal distribution with mean zero, variance one and pairwise correlation 0.3
# Y|X1, X2, B follows N(-1-0.5*X1-0.5*X2+0.5*B, 1)
set.seed(2333)
n = 1000
data.n = data.frame(matrix(ncol = 4, nrow = n))
colnames(data.n) = c('Y', 'X1', 'X2', 'B')
data.n[,c('X1', 'X2', 'B')] = MASS::mvrnorm(n, rep(0,3), diag(0.7,3)+0.3)
data.n$Y = rnorm(n, -1 - 0.5*data.n$X1 - 0.5*data.n$X2 + 0.5*data.n$B, 1)
# Generate the beta estimates from the external reduced models:
# first generate a data set of size m from the full model, then fit the reduced regressions
# to obtain the beta estimates and the corresponding estimated variances
m = m1 = m2 = 30000
data.m = data.frame(matrix(ncol = 4, nrow = m))
names(data.m) = c('Y', 'X1', 'X2', 'B')
data.m[,c('X1', 'X2', 'B')] = MASS::mvrnorm(m, rep(0,3), diag(0.7,3)+0.3)
data.m$Y = rnorm(m, -1 - 0.5*data.m$X1 - 0.5*data.m$X2 + 0.5*data.m$B, 1)
# Fit Y|X to obtain the external beta estimates; save the beta estimates and the
# corresponding estimated variances
fit.E1 = lm(Y ~ X1, data = data.m)
fit.E2 = lm(Y ~ X2, data = data.m)
beta.E1 = coef(fit.E1)
beta.E2 = coef(fit.E2)
names(beta.E1) = c('int', 'X1')
names(beta.E2) = c('int', 'X2')
V.E1 = vcov(fit.E1)
V.E2 = vcov(fit.E2)
#Save all the external model information into lists for later use
betaHatExt_list = list(Ext1 = beta.E1, Ext2 = beta.E2)
CovExt_list = list(Ext1 = V.E1, Ext2 = V.E2)
rho = list(Ext1 = n/m1, Ext2 = n/m2)
#get full model estimate from direct regression using the internal data only
fit.gamma.I = lm(Y ~ X1 + X2 + B, data = data.n)
gamma.I = coef(fit.gamma.I)
#Get CML estimates using internal data and the beta estimates from the external
# model 1 and 2, respectively
gamma.CML1 = fxnCC_LinReg(p=2, q=4, YInt=data.n$Y, XInt=data.n$X1,
BInt=cbind(data.n$X2, data.n$B), betaHatExt=beta.E1,
gammaHatInt=gamma.I, n=nrow(data.n), tol=1e-8,
maxIter=400,factor=1)[["gammaHat"]]
gamma.CML2 = fxnCC_LinReg(p=2, q=4, YInt=data.n$Y, XInt=data.n$X2,
BInt=cbind(data.n$X1, data.n$B), betaHatExt=beta.E2,
gammaHatInt=gamma.I, n=nrow(data.n), tol=1e-8,
maxIter=400, factor=1)[["gammaHat"]]
# It's important to reorder gamma.CML2 so that it follows the same order
# (intercept, X1, X2, B) as gamma.I and gamma.CML1
gamma.CML2 = c(gamma.CML2[1], gamma.CML2[3], gamma.CML2[2], gamma.CML2[4])
# Get the variance-covariance matrix of c(gamma.I, gamma.CML1, gamma.CML2)
asy.CML = asympVar_LinReg(k=2,
p=2,
q=4,
YInt=data.n$Y,
XInt=data.n[,c('X1','X2')],
# covariates that appear in at least one external model
BInt=data.n$B, # covariates that are not used in any of the external models
gammaHatInt=gamma.I,
betaHatExt_list=betaHatExt_list,
CovExt_list=CovExt_list,
rho=rho,
ExUncertainty=TRUE)
asyV.I = asy.CML[["asyV.I"]] #variance of gamma.I
asyV.CML1 = asy.CML[["asyV.CML"]][[1]] #variance of gamma.CML1
asyV.CML2 = asy.CML[["asyV.CML"]][[2]] #variance of gamma.CML2
asyCov.CML1.I = asy.CML[["asyCov.CML.I"]][[1]] #covariance of gamma.CML1 and gamma.I
asyCov.CML2.I = asy.CML[["asyCov.CML.I"]][[2]] #covariance of gamma.CML2 and gamma.I
asyCov.CML12 = asy.CML[["asyCov.CML"]][["12"]] #covariance of gamma.CML1 and gamma.CML2
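The pieces extracted above can be combined into one joint variance-covariance matrix for a stacked estimator. The sketch below uses small made-up 2 x 2 blocks rather than real output, and it assumes the cross block is oriented as Cov(gamma_I, gamma_CML); if the package stores the transpose, the two off-diagonal blocks should be swapped. All object names are illustrative.

```r
# Made-up 2 x 2 blocks standing in for asyV.I, asyV.CML[[1]] and asyCov.CML.I[[1]]
V_I     <- matrix(c(2.0, 0.3, 0.3, 1.5), nrow = 2)
V_CML   <- matrix(c(1.0, 0.2, 0.2, 0.8), nrow = 2)
C_I_CML <- matrix(c(0.9, 0.10, 0.15, 0.7), nrow = 2)  # assumed Cov(gamma_I, gamma_CML)

# Joint matrix for c(gamma_I, gamma_CML): variances on the diagonal blocks,
# the cross-covariance and its transpose on the off-diagonal blocks
V_joint <- rbind(cbind(V_I, C_I_CML),
                 cbind(t(C_I_CML), V_CML))

# A valid variance-covariance matrix must be symmetric
stopifnot(isSymmetric(V_joint))
```

The same pattern extends to three estimators by adding a third block row and column, with the covariance between the two CML estimates in the corresponding off-diagonal position.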