linear2ph {sleev}

R Documentation

Sieve maximum likelihood estimator (SMLE) for two-phase linear regression problems

Description

Performs efficient semiparametric estimation for general two-phase measurement error models when there are errors in both the outcome and covariates.

Usage

linear2ph(
  Y_unval = NULL,
  Y = NULL,
  X_unval = NULL,
  X = NULL,
  Z = NULL,
  Bspline = NULL,
  data = NULL,
  hn_scale = 1,
  noSE = FALSE,
  TOL = 1e-04,
  MAX_ITER = 1000,
  verbose = FALSE
)

Arguments

`Y_unval`	Column name of the error-prone or unvalidated continuous outcome. Subjects with missing values of `Y_unval` are omitted from the analysis. This argument is required.
`Y`	Column name that stores the validated value of `Y_unval` in the second phase. Subjects with missing values of `Y` are treated as not selected for the second phase. This argument is required.
`X_unval`	Specifies the columns of the error-prone covariates. Subjects with missing values of `X_unval` are omitted from the analysis. This argument is required.
`X`	Specifies the columns that store the validated values of `X_unval` in the second phase. Subjects with missing values of `X` are treated as not selected for the second phase. This argument is required.
`Z`	Specifies the columns of the accurately measured covariates. Subjects with missing values of `Z` are omitted from the analysis. This argument is optional.
`Bspline`	Specifies the columns of the B-spline basis. Subjects with missing values of `Bspline` are omitted from the analysis. This argument is required.
`data`	Specifies the name of the dataset. This argument is required.
`hn_scale`	Specifies the scale of the perturbation constant in the variance estimation. For example, if `hn_scale = 0.5`, then the perturbation constant is `0.5n^{-1/2}`, where `n` is the first-phase sample size. The default value is `1`. This argument is optional.
`noSE`	If `TRUE`, then the variances of the parameter estimators will not be estimated. The default value is `FALSE`. This argument is optional.
`TOL`	Specifies the convergence criterion in the EM algorithm. The default value is `1e-04`. This argument is optional.
`MAX_ITER`	Maximum number of iterations in the EM algorithm. The default value is `1000`. This argument is optional.
`verbose`	If `TRUE`, then show details of the analysis. The default value is `FALSE`. This argument is optional.
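
The missing-value convention and the perturbation constant described above can be illustrated with a small sketch. The data frame `toy` and its values are made up for illustration, and the sketch assumes that a subject counts as selected in the second phase only when both `Y` and `X` are observed; this mirrors the argument descriptions but is not taken from the package source.

```r
# Toy data (hypothetical): NA in Y or X marks a subject not selected in phase two.
toy <- data.frame(
  Y_tilde = c(1.2, 0.4, -0.3, 2.1, 0.9),
  X_tilde = c(0.5, -1.0, 0.2, 1.3, 0.0),
  Y       = c(1.0, NA, -0.2, NA, NA),
  X       = c(0.4, NA, 0.1, NA, NA)
)

# Assumption: phase-two membership means both validated columns are observed.
in_phase2 <- complete.cases(toy[, c("Y", "X")])

n  <- nrow(toy)       # first-phase sample size
n2 <- sum(in_phase2)  # second-phase sample size

# Perturbation constant as described for hn_scale: hn_scale * n^(-1/2).
hn_scale <- 0.5
hn <- hn_scale * n^(-1/2)
```

With `n = 100`, as in the Examples section, the same formula gives `0.5 * 100^(-1/2) = 0.05`.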

Value

`coefficients`	Stores the analysis results.
`sigma`	Stores the residual standard error.
`covariance`	Stores the covariance matrix of the regression coefficient estimates.
`converge`	`TRUE` if the EM algorithm converged in parameter estimation, and `FALSE` otherwise.
`converge_cov`	`TRUE` if the EM algorithm converged in variance estimation, and `FALSE` otherwise.
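
The convergence flags listed above are worth checking before the estimates are used. The sketch below is one way to do that. It works on a hand-built list, `mock_res`, that stands in for the return value of `linear2ph`; the numbers in it, the shape given to `coefficients`, and the helper name `report_fit` are all assumptions made for illustration, not package output or package API.

```r
# Hypothetical stand-in for a linear2ph() result; values are illustrative only.
mock_res <- list(
  coefficients = c(Intercept = 0.3, X = 0.4),
  sigma        = 1,
  covariance   = diag(2),
  converge     = TRUE,
  converge_cov = FALSE
)

# Hypothetical helper: stop if parameter estimation did not converge,
# warn if variance estimation did not, otherwise return the estimates.
report_fit <- function(res) {
  if (!isTRUE(res$converge)) {
    stop("EM algorithm did not converge; consider increasing MAX_ITER.")
  }
  if (!isTRUE(res$converge_cov)) {
    warning("Variance estimation did not converge; standard errors may be unreliable.")
  }
  res$coefficients
}

# The mock result converged in estimation but not in variance estimation,
# so the helper warns (suppressed here) and still returns the estimates.
est <- suppressWarnings(report_fit(mock_res))
```

When `noSE = TRUE` no variances are estimated, so the warning branch of such a helper may need to be skipped in that case.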

References

Tao, R., Mercaldo, N. D., Haneuse, S., Maronge, J. M., Rathouz, P. J., Heagerty, P. J., & Schildcrout, J. S. (2021). Two-wave two-phase outcome-dependent sampling designs, with applications to longitudinal binary data. Statistics in Medicine, 40(8), 1863–1876. https://doi.org/10.1002/sim.8876

Examples

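 rho = -0.3      # correlation between the outcome and covariate errors
 p = 0.3         # probability that a subject's records contain errors
 hn_scale = 1    # scale of the perturbation constant
 nsieve = 20     # number of sieve (B-spline) basis functions

 n = 100         # phase-one sample size
 n2 = 40         # phase-two (validation) sample size
 alpha = 0.3     # true intercept
 beta = 0.4      # true slope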
 set.seed(12345)

 ### generate data
 simX = rnorm(n)
 epsilon = rnorm(n)
 simY = alpha+beta*simX+epsilon
 error = MASS::mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, rho, rho, 1), nrow=2))
 
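 # simS indicates whether a subject's records contain errors
 simS = rbinom(n, 1, p)
 simU = simS*error[,2]        # error in the covariate
 simW = simS*error[,1]        # error in the outcome
 simY_tilde = simY+simW       # error-prone outcome
 simX_tilde = simX+simU       # error-prone covariate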
 
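 # select the phase-two subjects by simple random sampling
 id_phase2 = sample(n, n2)
 
 # validated values are observed only for phase-two subjects
 simY[-id_phase2] = NA
 simX[-id_phase2] = NA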
 
 # # histogram basis
 # Bspline = matrix(NA, nrow=n, ncol=nsieve)
 # cut_x_tilde = cut(simX_tilde, breaks=quantile(simX_tilde, probs=seq(0, 1, 1/nsieve)), 
 #   include.lowest = TRUE)
 # for (i in 1:nsieve) {
 #     Bspline[,i] = as.numeric(cut_x_tilde == names(table(cut_x_tilde))[i])
 # }
 # colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # # histogram basis
 
 # # linear basis
 # Bspline = splines::bs(simX_tilde, df=nsieve, degree=1,
 #   Boundary.knots=range(simX_tilde), intercept=TRUE)
 # colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # # linear basis
 
 # # quadratic basis
 # Bspline = splines::bs(simX_tilde, df=nsieve, degree=2, 
 #   Boundary.knots=range(simX_tilde), intercept=TRUE)
 # colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # # quadratic basis
 
 # cubic basis
 Bspline = splines::bs(simX_tilde, df=nsieve, degree=3, 
   Boundary.knots=range(simX_tilde), intercept=TRUE)
 colnames(Bspline) = paste("bs", 1:nsieve, sep="")
 # cubic basis
 
 data = data.frame(Y_tilde=simY_tilde, X_tilde=simX_tilde, Y=simY, X=simX, Bspline)

 # fit the SMLE (assumes the sleev package is attached, e.g. via library(sleev))
 res = linear2ph(Y="Y", X="X", Y_unval="Y_tilde", X_unval="X_tilde",
   Bspline=colnames(Bspline), data=data, hn_scale=0.1)
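
A basis built as in the example can be sanity-checked before it is passed to `linear2ph`. The sketch below rebuilds a cubic basis on freshly simulated values and checks two things: the number of columns matches `nsieve`, and each row sums to one, which holds for a B-spline basis with `intercept=TRUE` evaluated inside the boundary knots. The variable `x_tilde` and the seed are assumptions for illustration; whether `linear2ph` itself requires the rows to sum to one is not stated on this page.

```r
set.seed(1)
x_tilde <- rnorm(100)  # stand-in for the error-prone covariate
nsieve <- 20           # number of basis functions

# Cubic B-spline basis with an intercept, as in the example above.
B <- splines::bs(x_tilde, df = nsieve, degree = 3,
                 Boundary.knots = range(x_tilde), intercept = TRUE)
colnames(B) <- paste("bs", 1:nsieve, sep = "")

# Check the dimensions and that every row sums to one.
n_cols   <- ncol(B)
row_sums <- unname(rowSums(B))
sums_to_one <- isTRUE(all.equal(row_sums, rep(1, length(x_tilde))))
```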

[Package sleev version 1.0.3 Index]