linear2ph {sleev}    R Documentation
Sieve maximum likelihood estimator (SMLE) for two-phase linear regression problems
Description
Performs efficient semiparametric estimation for general two-phase measurement error models when there are errors in both the outcome and covariates.
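The error-prone measurements are observed for every subject in the first phase, while the validated measurements are available only for the subjects selected into the second phase. As a hedged sketch (the column names are illustrative, not required by the function), the input data are expected to look like this:
# Illustrative layout: Y_tilde and X_tilde (error-prone) are observed for all
# subjects; the validated Y and X are NA for subjects outside the second-phase
# subsample; bs1, bs2, ... are B-spline basis columns built from X_tilde.
layout_example = data.frame(
  Y_tilde = c(1.21, -0.40, 0.83),
  X_tilde = c(0.52,  1.10, -0.31),
  Y       = c(1.05,    NA,    NA),
  X       = c(0.47,    NA,    NA),
  bs1     = c(0.70,  0.10,  0.90),
  bs2     = c(0.30,  0.90,  0.10)
)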
Usage
linear2ph(
Y_unval = NULL,
Y = NULL,
X_unval = NULL,
X = NULL,
Z = NULL,
Bspline = NULL,
data = NULL,
hn_scale = 1,
noSE = FALSE,
TOL = 1e-04,
MAX_ITER = 1000,
verbose = FALSE
)
Arguments
Y_unval
Column name of the error-prone or unvalidated continuous outcome. Subjects with missing values of Y_unval are omitted from the analysis. This argument is required.
Y
Column name that stores the validated value of Y_unval in the second phase. Subjects with missing values of Y are considered as those not selected into the second phase. This argument is required.
X_unval
Specifies the columns of the error-prone covariates. Subjects with missing values of X_unval are omitted from the analysis. This argument is required.
X
Specifies the columns that store the validated values of X_unval in the second phase. Subjects with missing values of X are considered as those not selected into the second phase. This argument is required.
Z
Specifies the columns of the accurately measured covariates. Subjects with missing values of Z are omitted from the analysis. This argument is optional.
Bspline
Specifies the columns of the B-spline basis. Subjects with missing values of Bspline are omitted from the analysis. This argument is required.
data
Specifies the name of the dataset. This argument is required.
hn_scale
Specifies the scale of the perturbation constant in the variance estimation. For example, if hn_scale = 0.5, then the perturbation constant is 0.5*n^(-1/2), where n is the first-phase sample size. The default value is 1. This argument is optional.
noSE
If TRUE, then the variances of the parameter estimators will not be estimated. The default value is FALSE. This argument is optional.
TOL
Specifies the convergence criterion in the EM algorithm. The default value is 1e-04. This argument is optional.
MAX_ITER
Maximum number of iterations in the EM algorithm. The default number is 1000. This argument is optional.
verbose
If TRUE, then show details of the analysis. The default value is FALSE. This argument is optional.
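The worked example below omits the Z argument, so the following is a hedged template (not taken from the package) showing how an accurately measured covariate could be included; the data frame df and the column names Y_tilde, X_tilde, Z1, and bs1, ..., bs20 are illustrative assumptions.
# Hedged template: adjusting for an error-free covariate Z1 in addition to the
# error-prone outcome and covariate (df is assumed to contain all of these
# columns, including the B-spline basis columns bs1, ..., bs20).
# res_z = linear2ph(Y = "Y", X = "X", Y_unval = "Y_tilde", X_unval = "X_tilde",
#                   Z = "Z1", Bspline = paste0("bs", 1:20), data = df,
#                   hn_scale = 0.1)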
Value
coefficients
Stores the analysis results.
sigma
Stores the residual standard error.
covariance
Stores the covariance matrix of the regression coefficient estimates.
converge
In parameter estimation, if the EM algorithm converges, then converge = TRUE. Otherwise, FALSE.
converge_cov
In variance estimation, if the EM algorithm converges, then converge_cov = TRUE. Otherwise, FALSE.
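As a hedged sketch of how these components might be used, where res denotes the object returned by linear2ph() as in the Examples below:
# Check convergence before using the estimates, then extract the documented
# components of the returned object.
# if (res$converge) {
#   res$coefficients            # regression coefficient estimates
#   res$sigma                   # residual standard error
#   sqrt(diag(res$covariance))  # standard errors from the covariance matrix
# }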
References
Tao, R., Mercaldo, N. D., Haneuse, S., Maronge, J. M., Rathouz, P. J., Heagerty, P. J., & Schildcrout, J. S. (2021). Two-wave two-phase outcome-dependent sampling designs, with applications to longitudinal binary data. Statistics in Medicine, 40(8), 1863–1876. https://doi.org/10.1002/sim.8876
See Also
cv_linear2ph() to calculate the average predicted log likelihood of this function.
Examples
rho = -.3
p = 0.3
hn_scale = 1
nsieve = 20
n = 100
n2 = 40
alpha = 0.3
beta = 0.4
set.seed(12345)
### generate data
simX = rnorm(n)
epsilon = rnorm(n)
simY = alpha+beta*simX+epsilon
### add correlated errors to the outcome and covariate for a random subset of subjects
error = MASS::mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, rho, rho, 1), nrow=2))
simS = rbinom(n, 1, p)
simU = simS*error[,2]
simW = simS*error[,1]
simY_tilde = simY+simW
simX_tilde = simX+simU
### select the second-phase (validation) subsample; validated values are NA otherwise
id_phase2 = sample(n, n2)
simY[-id_phase2] = NA
simX[-id_phase2] = NA
# # histogram basis
# Bspline = matrix(NA, nrow=n, ncol=nsieve)
# cut_x_tilde = cut(simX_tilde, breaks=quantile(simX_tilde, probs=seq(0, 1, 1/nsieve)),
# include.lowest = TRUE)
# for (i in 1:nsieve) {
# Bspline[,i] = as.numeric(cut_x_tilde == names(table(cut_x_tilde))[i])
# }
# colnames(Bspline) = paste("bs", 1:nsieve, sep="")
# # histogram basis
# # linear basis
# Bspline = splines::bs(simX_tilde, df=nsieve, degree=1,
# Boundary.knots=range(simX_tilde), intercept=TRUE)
# colnames(Bspline) = paste("bs", 1:nsieve, sep="")
# # linear basis
# # quadratic basis
# Bspline = splines::bs(simX_tilde, df=nsieve, degree=2,
# Boundary.knots=range(simX_tilde), intercept=TRUE)
# colnames(Bspline) = paste("bs", 1:nsieve, sep="")
# # quadratic basis
# cubic basis
Bspline = splines::bs(simX_tilde, df=nsieve, degree=3,
Boundary.knots=range(simX_tilde), intercept=TRUE)
colnames(Bspline) = paste("bs", 1:nsieve, sep="")
# cubic basis
data = data.frame(Y_tilde=simY_tilde, X_tilde=simX_tilde, Y=simY, X=simX, Bspline)
res = linear2ph(Y="Y", X="X", Y_unval="Y_tilde", X_unval="X_tilde",
Bspline=colnames(Bspline), data=data, hn_scale=0.1)
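As a hedged follow-up to the example (not part of the package documentation), one might inspect the fit and compare it with a naive regression of the error-prone outcome on the error-prone covariate, which ignores the measurement error:
### inspect the SMLE fit and compare with the naive error-prone regression
if (res$converge) {
  print(res$coefficients)   # SMLE estimates of the intercept and slope
  print(res$sigma)          # estimated residual standard error
}
naive_fit = lm(simY_tilde ~ simX_tilde)  # ignores errors in outcome and covariate
summary(naive_fit)$coefficients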