Performs cross-validation to calculate the average predicted log likelihood for the linear2ph function. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood.


cv_linear2ph(
  Y_unval = NULL,
  Y = NULL,
  X_unval = NULL,
  X = NULL,
  Z = NULL,
  Bspline = NULL,
  data = NULL,
  nfolds = 5,
  MAX_ITER = 2000,
  TOL = 1e-04,
  verbose = FALSE
)



Y_unval: Specifies the column of the error-prone outcome that is continuous. Subjects with missing values of Y_unval are omitted from the analysis. This argument is required.


Y: Specifies the column that stores the validated value of Y_unval in the second phase. Subjects with missing values of Y are considered as those not selected in the second phase. This argument is required.


X_unval: Specifies the columns of the error-prone covariates. Subjects with missing values of X_unval are omitted from the analysis. This argument is required.


X: Specifies the columns that store the validated values of X_unval in the second phase. Subjects with missing values of X are considered as those not selected in the second phase. This argument is required.


Z: Specifies the columns of the accurately measured covariates. Subjects with missing values of Z are omitted from the analysis. This argument is optional.


Bspline: Specifies the columns of the B-spline basis. Subjects with missing values of Bspline are omitted from the analysis. This argument is required.


data: Specifies the name of the dataset. This argument is required.


nfolds: Specifies the number of cross-validation folds. The default value is 5. Although nfolds can be as large as the sample size (leave-one-out cross-validation), it is not recommended for large datasets. The smallest value allowable is 3.


MAX_ITER: Specifies the maximum number of iterations in the EM algorithm. The default number is 2000. This argument is optional.


TOL: Specifies the convergence criterion in the EM algorithm. The default value is 1E-4. This argument is optional.


verbose: If TRUE, then show details of the analysis. The default value is FALSE.
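
The missing-value convention for Y and X and the role of nfolds can be illustrated with a small base-R sketch. It is only an illustration under stated assumptions: the toy data frame `toy`, the helper `assign_folds`, and the random, roughly balanced fold assignment are hypothetical and are not taken from the package, whose internal fold construction may differ.

```r
set.seed(1)

# Hypothetical toy data: Y and X are observed only for validated (phase-two) subjects.
toy <- data.frame(
  Y_tilde = rnorm(12),
  X_tilde = rnorm(12),
  Y = NA_real_,
  X = NA_real_
)
validated <- c(2, 5, 7, 11)
toy$Y[validated] <- toy$Y_tilde[validated]
toy$X[validated] <- toy$X_tilde[validated]

# Subjects with non-missing Y and X are treated as selected in the second phase.
in_phase2 <- !is.na(toy$Y) & !is.na(toy$X)

# Hypothetical helper (an assumption, not the package's code):
# randomly assign n subjects to nfolds folds of roughly equal size.
assign_folds <- function(n, nfolds) {
  stopifnot(nfolds >= 3, nfolds <= n)
  sample(rep(seq_len(nfolds), length.out = n))
}

fold_id <- assign_folds(nrow(toy), nfolds = 3)

# Cross-tabulate fold membership against phase-two status.
table(fold_id, in_phase2)
```

With few validated subjects, a purely random assignment like this one can leave a fold with very little phase-two data, which is one reason to keep nfolds moderate.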



Stores the average predicted log likelihood.


Stores the predicted log likelihood in each fold.


Stores the convergence status of the EM algorithm in each run.


  rho = 0.3
  p = 0.3
  n = 100
  n2 = 40
  alpha = 0.3
  beta = 0.4
  ### generate data
  simX = rnorm(n)
  epsilon = rnorm(n)
  simY = alpha+beta*simX+epsilon
  # correlated measurement errors, present only for subjects with simS = 1
  error = MASS::mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, rho, rho, 1), nrow=2))
  simS = rbinom(n, 1, p)
  simU = simS*error[,2]
  simW = simS*error[,1]
  simY_tilde = simY+simW
  simX_tilde = simX+simU
  # validated values are observed only for the n2 phase-two subjects
  id_phase2 = sample(n, n2)
  simY[-id_phase2] = NA
  simX[-id_phase2] = NA
  ### cross-validate cubic B-spline bases of different sizes
  nsieves = c(5, 10)
  pred_loglike = rep(NA, length(nsieves))
  for (i in seq_along(nsieves)) {
      nsieve = nsieves[i]
      # cubic basis
      Bspline = splines::bs(simX_tilde, df=nsieve, degree=3, 
        Boundary.knots=range(simX_tilde), intercept=TRUE)
      colnames(Bspline) = paste("bs", 1:nsieve, sep="")
      # combine the outcomes, covariates, and basis into one dataset
      data = data.frame(Y_tilde=simY_tilde, X_tilde=simX_tilde, Y=simY, X=simX, Bspline)
      res = cv_linear2ph(Y="Y", X="X", Y_unval="Y_tilde", X_unval="X_tilde", 
        Bspline=colnames(Bspline), data=data, nfolds = 5)
      pred_loglike[i] = res$avg_pred_loglik
  }
  # average predicted log likelihood for each basis size
  data.frame(nsieves, pred_loglike)
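
Once pred_loglike has been filled in, the basis size with the largest average predicted log likelihood can be picked with which.max. The sketch below is self-contained and uses placeholder numbers in place of real output; the values are hypothetical and serve only to demonstrate the selection step.

```r
# Placeholder values standing in for the loop's output (hypothetical, not real results).
nsieves <- c(5, 10)
pred_loglike <- c(-1.52, -1.47)

# Choose the number of B-spline basis functions with the largest
# average predicted log likelihood; which.max() skips NA entries.
best <- which.max(pred_loglike)
best_nsieve <- nsieves[best]
best_nsieve
```

The selected value would then be passed as df to splines::bs when building the basis for the final fit.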

