linear2ph {sleev}    R Documentation

Sieve maximum likelihood estimator (SMLE) for two-phase linear regression problems

Description

Performs efficient semiparametric estimation for general two-phase measurement error models when there are errors in both the outcome and covariates.

Usage

linear2ph(
  Y_unval = NULL,
  Y = NULL,
  X_unval = NULL,
  X = NULL,
  Z = NULL,
  Bspline = NULL,
  data = NULL,
  hn_scale = 1,
  noSE = FALSE,
  TOL = 1e-04,
  MAX_ITER = 1000,
  verbose = FALSE
)

Arguments

Y_unval

Column name of the error-prone or unvalidated continuous outcome. Subjects with missing values of Y_unval are omitted from the analysis. This argument is required.

Y

Column name that stores the validated value of Y_unval in the second phase. Subjects with missing values of Y are considered as those not selected in the second phase. This argument is required.

X_unval

Specifies the columns of the error-prone covariates. Subjects with missing values of X_unval are omitted from the analysis. This argument is required.

X

Specifies the columns that store the validated values of X_unval in the second phase. Subjects with missing values of X are considered as those not selected in the second phase. This argument is required.
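
As an illustration of the missing-value convention described for Y and X, the sketch below derives a phase-two indicator from a toy data frame. The object names (dat, in_phase2) and the values are made up for this example, and treating a subject as validated only when both Y and X are observed is an assumption of the sketch, not a statement about the package internals.

```r
# Toy data: NA in Y and X marks subjects not selected for phase two.
dat <- data.frame(
  Y_tilde = c(1.2, 0.4, -0.3, 2.1, 0.9),  # error-prone outcome (all subjects)
  X_tilde = c(0.5, -1.0, 0.2, 1.3, 0.0),  # error-prone covariate (all subjects)
  Y = c(1.0, NA, -0.3, NA, NA),           # validated outcome (phase two only)
  X = c(0.4, NA, 0.2, NA, NA)             # validated covariate (phase two only)
)

# Assumption: a subject counts as validated when both Y and X are observed.
in_phase2 <- !is.na(dat$Y) & !is.na(dat$X)

n  <- nrow(dat)       # phase-one sample size
n2 <- sum(in_phase2)  # phase-two sample size
```

Tabulating in_phase2 before fitting is a quick way to confirm that the validation subset has the size you expect.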

Z

Specifies the columns of the accurately measured covariates. Subjects with missing values of Z are omitted from the analysis. This argument is optional.

Bspline

Specifies the columns of the B-spline basis. Subjects with missing values of Bspline are omitted from the analysis. This argument is required.
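
A minimal sketch of preparing the basis columns before they are named in Bspline, assuming a single error-prone covariate and a cubic basis built with splines::bs(). The object names (x_tilde, B, dat, bspline_cols) and the choice of 8 basis functions are illustrative only.

```r
set.seed(1)
x_tilde <- rnorm(50)  # stand-in for the error-prone covariate
nsieve  <- 8          # number of basis functions (illustrative choice)

# Cubic B-spline basis evaluated at the error-prone covariate.
B <- splines::bs(x_tilde, df = nsieve, degree = 3,
                 Boundary.knots = range(x_tilde), intercept = TRUE)
colnames(B) <- paste0("bs", seq_len(nsieve))

# Store the basis next to the other variables; its column names are
# what would be passed as the Bspline argument.
dat <- data.frame(X_tilde = x_tilde, B)
bspline_cols <- colnames(B)

# Subjects with missing basis values are omitted, so check for NAs first.
basis_complete <- !anyNA(dat[, bspline_cols])

# With intercept = TRUE the basis functions sum to one for every subject.
row_sums <- rowSums(B)
```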

data

Specifies the dataset (a data frame) that contains the columns named in the other arguments. This argument is required.

hn_scale

Specifies the scale of the perturbation constant in the variance estimation. For example, if hn_scale = 0.5, then the perturbation constant is 0.5n^{-1/2}, where n is the first-phase sample size. The default value is 1. This argument is optional.
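
The arithmetic in the description can be reproduced directly. This sketch only evaluates the formula hn_scale * n^{-1/2} stated above; the variable name hn is chosen for the example.

```r
n        <- 100  # first-phase sample size
hn_scale <- 0.5  # scale supplied by the user

# Perturbation constant used in the variance estimation: hn_scale * n^(-1/2)
hn <- hn_scale * n^(-1/2)
hn  # 0.05
```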

noSE

If TRUE, then the variances of the parameter estimators will not be estimated. The default value is FALSE. This argument is optional.

TOL

Specifies the convergence criterion in the EM algorithm. The default value is 1E-4. This argument is optional.

MAX_ITER

Maximum number of iterations in the EM algorithm. The default number is 1000. This argument is optional.

verbose

If TRUE, then details of the analysis are shown. The default value is FALSE. This argument is optional.

Value

coefficients

Stores the analysis results for the regression coefficients.

sigma

Stores the residual standard error.

covariance

Stores the covariance matrix of the regression coefficient estimates.

converge

TRUE if the EM algorithm converged in parameter estimation, and FALSE otherwise.

converge_cov

TRUE if the EM algorithm converged in variance estimation, and FALSE otherwise.
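
Because the two convergence flags are reported separately, it can help to check both before reading the estimates. The helper below is hypothetical (check_fit is not part of sleev) and is demonstrated on a mock list that only mimics the converge and converge_cov components documented above.

```r
# Hypothetical helper: summarises the two convergence flags of a fitted object.
check_fit <- function(res) {
  if (!isTRUE(res$converge)) {
    return("parameter estimation did not converge")
  }
  if (!isTRUE(res$converge_cov)) {
    return("estimates converged, but variance estimation did not")
  }
  "converged"
}

# Mock result for illustration only; a real fit would come from linear2ph().
mock <- list(converge = TRUE, converge_cov = FALSE)
status <- check_fit(mock)
```

Using isTRUE() means a missing or NA flag is treated as not converged instead of raising an error.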

References

Tao, R., Mercaldo, N. D., Haneuse, S., Maronge, J. M., Rathouz, P. J., Heagerty, P. J., & Schildcrout, J. S. (2021). Two-wave two-phase outcome-dependent sampling designs, with applications to longitudinal binary data. Statistics in Medicine, 40(8), 1863–1876. https://doi.org/10.1002/sim.8876

See Also

cv_linear2ph() to calculate the average predicted log likelihood of this function by cross-validation.

Examples

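 ### simulation settings
 rho = -.3        # correlation between the errors in Y and X
 p = 0.3          # probability that a subject's measurements are error-prone
 hn_scale = 1     # not used below: the call to linear2ph() sets hn_scale=0.1
 nsieve = 20      # number of B-spline basis functions

 n = 100          # phase-one sample size
 n2 = 40          # phase-two (validation) sample size
 alpha = 0.3      # true intercept
 beta = 0.4       # true slope
 set.seed(12345)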

 ### generate data
 simX = rnorm(n)
 epsilon = rnorm(n)
 simY = alpha+beta*simX+epsilon
 error = MASS::mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, rho, rho, 1), nrow=2))
 
 simS = rbinom(n, 1, p)
 simU = simS*error[,2]
 simW = simS*error[,1]
 simY_tilde = simY+simW
 simX_tilde = simX+simU
 
 id_phase2 = sample(n, n2)
 
 simY[-id_phase2] = NA
 simX[-id_phase2] = NA
 
 # # histogram basis
 # Bspline = matrix(NA, nrow=n, ncol=nsieve)
 # cut_x_tilde = cut(simX_tilde, breaks=quantile(simX_tilde, probs=seq(0, 1, 1/nsieve)), 
 #   include.lowest = TRUE)
 # for (i in 1:nsieve) {
 #     Bspline[,i] = as.numeric(cut_x_tilde == names(table(cut_x_tilde))[i])
 # }
 # colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # # histogram basis
 
 # # linear basis
 # Bspline = splines::bs(simX_tilde, df=nsieve, degree=1,
 #   Boundary.knots=range(simX_tilde), intercept=TRUE)
 # colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # # linear basis
 
 # # quadratic basis
 # Bspline = splines::bs(simX_tilde, df=nsieve, degree=2, 
 #   Boundary.knots=range(simX_tilde), intercept=TRUE)
 # colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # # quadratic basis
 
 # cubic basis
 Bspline = splines::bs(simX_tilde, df=nsieve, degree=3, 
   Boundary.knots=range(simX_tilde), intercept=TRUE)
 colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # cubic basis
 
 data = data.frame(Y_tilde=simY_tilde, X_tilde=simX_tilde, Y=simY, X=simX, Bspline)

 res = linear2ph(Y="Y", X="X", Y_unval="Y_tilde", X_unval="X_tilde", 
   Bspline=colnames(Bspline), data=data, hn_scale=0.1)

[Package sleev version 1.0.3 Index]