cv_linear2ph {sleev}    R Documentation

Cross-validation of the average predicted log likelihood for linear2ph

Description

Performs cross-validation to calculate the average predicted log likelihood for the linear2ph function. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood.
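
As a minimal sketch of the selection step, suppose each candidate basis size has already been scored by cross-validation. The scores below are made-up placeholders, not output from cv_linear2ph, and the name cv_scores is hypothetical. Because log likelihoods are often negative, "largest" means the value closest to positive infinity, not the largest magnitude.

```r
# Hypothetical scores (placeholders, not real cv_linear2ph output).
cv_scores <- data.frame(
  nsieves = c(5, 10, 15),
  avg_pred_loglike = c(-152.4, -149.8, -151.1)
)

# Pick the basis size with the largest average predicted log likelihood.
best <- cv_scores$nsieves[which.max(cv_scores$avg_pred_loglike)]
best
```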

Usage

cv_linear2ph(
  Y_unval = NULL,
  Y = NULL,
  X_unval = NULL,
  X = NULL,
  Z = NULL,
  Bspline = NULL,
  data = NULL,
  nfolds = 5,
  MAX_ITER = 2000,
  TOL = 1e-04,
  verbose = FALSE
)

Arguments

Y_unval

Specifies the column of the continuous error-prone outcome. Subjects with missing values of Y_unval are omitted from the analysis. This argument is required.

Y

Specifies the column that stores the validated value of Y_unval in the second phase. Subjects with missing values of Y are treated as not selected for the second phase. This argument is required.

X_unval

Specifies the columns of the error-prone covariates. Subjects with missing values of X_unval are omitted from the analysis. This argument is required.

X

Specifies the columns that store the validated values of X_unval in the second phase. Subjects with missing values of X are treated as not selected for the second phase. This argument is required.

Z

Specifies the columns of the accurately measured covariates. Subjects with missing values of Z are omitted from the analysis. This argument is optional.

Bspline

Specifies the columns of the B-spline basis. Subjects with missing values of Bspline are omitted from the analysis. This argument is required.

data

Specifies the dataset (a data frame) that contains the columns named in the other arguments. This argument is required.

nfolds

Specifies the number of cross-validation folds. The default value is 5. Although nfolds can be as large as the sample size (leave-one-out cross-validation), such large values are not recommended for large datasets. The smallest allowable value is 3.

MAX_ITER

Specifies the maximum number of iterations in the EM algorithm. The default value is 2000. This argument is optional.

TOL

Specifies the convergence criterion in the EM algorithm. The default value is 1e-04. This argument is optional.

verbose

If TRUE, prints details of the analysis. The default value is FALSE.
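
The sketch below illustrates two of the inputs above using only base R and the splines package. It builds a cubic B-spline basis whose column names could be passed through Bspline, and it shows one simple way to split n subjects into nfolds groups of roughly equal size. The fold assignment illustrates k-fold partitioning in general; it is an assumption, not a description of how cv_linear2ph forms its folds internally. The names x_tilde, basis, and fold_id are hypothetical.

```r
set.seed(1)
n <- 100
x_tilde <- rnorm(n)  # stand-in for an error-prone covariate

# Cubic B-spline basis with 5 columns; with intercept = TRUE each row sums to 1.
basis <- splines::bs(x_tilde, df = 5, degree = 3,
                     Boundary.knots = range(x_tilde), intercept = TRUE)
colnames(basis) <- paste0("bs", seq_len(ncol(basis)))

# Generic k-fold partition (illustration only, not the package's internal scheme):
# repeat the fold labels to length n, then shuffle them.
nfolds <- 5
fold_id <- sample(rep(seq_len(nfolds), length.out = n))
table(fold_id)
```

With nfolds equal to n, every subject would form its own fold, which is the leave-one-out case mentioned above.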

Value

avg_pred_loglike

Stores the average predicted log likelihood.

pred_loglike

Stores the predicted log likelihood in each fold.

converge

Stores the convergence status of the EM algorithm in each run.
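
One way to use these components together is to check convergence before comparing average predicted log likelihoods. The sketch below uses a mock list with the documented component names. The values are placeholders, the helper usable_score is hypothetical, and treating converge as a logical vector with one entry per fold is an assumption about the return type.

```r
# Mock result with the documented component names (placeholder values).
res <- list(
  avg_pred_loglike = -1.5,
  pred_loglike = c(-1.4, -1.6, -1.5, -1.7, -1.3),
  converge = c(TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Hypothetical helper: return the average only when every fold converged,
# otherwise return NA so the basis is not selected by accident.
usable_score <- function(res) {
  if (all(res$converge)) res$avg_pred_loglike else NA_real_
}
usable_score(res)
```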

Examples

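  set.seed(1)    # fix the seed so the example is reproducible
  rho = 0.3      # correlation between the outcome and covariate errors
  p = 0.3        # probability that a subject's measurements contain error
  n = 100        # phase-one sample size
  n2 = 40        # phase-two (validation) sample size
  alpha = 0.3    # true intercept
  beta = 0.4     # true slope

  ### generate data
  simX = rnorm(n)
  epsilon = rnorm(n)
  simY = alpha+beta*simX+epsilon
  # bivariate normal errors: column 1 is added to Y, column 2 to X
  error = MASS::mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, rho, rho, 1), nrow=2))

  # errors are added only for subjects with simS == 1
  simS = rbinom(n, 1, p)
  simU = simS*error[,2]
  simW = simS*error[,1]
  simY_tilde = simY+simW
  simX_tilde = simX+simU

  # select the phase-two subjects; validated values are missing for everyone else
  id_phase2 = sample(n, n2)

  simY[-id_phase2] = NA
  simX[-id_phase2] = NA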
   
  # cubic B-spline basis for each candidate number of sieves
  nsieves = c(5, 10)
  pred_loglike = rep(NA, length(nsieves))
  for (i in seq_along(nsieves)) {
      nsieve = nsieves[i]
      Bspline = splines::bs(simX_tilde, df=nsieve, degree=3, 
        Boundary.knots=range(simX_tilde), intercept=TRUE)
      colnames(Bspline) = paste("bs", 1:nsieve, sep="")

      # assemble the analysis dataset
      data = data.frame(Y_tilde=simY_tilde, X_tilde=simX_tilde, Y=simY, X=simX, Bspline)
     
      res = cv_linear2ph(Y="Y", X="X", Y_unval="Y_tilde", X_unval="X_tilde", 
        Bspline=colnames(Bspline), data=data, nfolds = 5)
      pred_loglike[i] = res$avg_pred_loglike
  }
   
  data.frame(nsieves, pred_loglike)


[Package sleev version 1.0.3 Index]