cv_linear2ph {sleev} | R Documentation |
Performs cross-validation to calculate the average predicted log likelihood for the linear2ph
function. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood.
Description
Performs cross-validation to calculate the average predicted log likelihood for the linear2ph
function. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood.
Usage
cv_linear2ph(
Y_unval = NULL,
Y = NULL,
X_unval = NULL,
X = NULL,
Z = NULL,
Bspline = NULL,
data = NULL,
nfolds = 5,
MAX_ITER = 2000,
TOL = 1e-04,
verbose = FALSE
)
Arguments
Y_unval |
Specifies the column of the error-prone outcome that is continuous. Subjects with missing values of |
Y |
Specifies the column that stores the validated value of Y_unval. |
X_unval |
Specifies the columns of the error-prone covariates. Subjects with missing values of |
X |
Specifies the columns that store the validated values of X_unval. |
Z |
Specifies the columns of the accurately measured covariates. Subjects with missing values of |
Bspline |
Specifies the columns of the B-spline basis. Subjects with missing values of |
data |
Specifies the name of the dataset. This argument is required. |
nfolds |
Specifies the number of cross-validation folds. The default value is 5. |
MAX_ITER |
Specifies the maximum number of iterations in the EM algorithm. The default number is 2000. |
TOL |
Specifies the convergence criterion in the EM algorithm. The default value is 1e-04. |
verbose |
If |
Value
avg_pred_loglike |
Stores the average predicted log likelihood. |
pred_loglike |
Stores the predicted log likelihood in each fold. |
converge |
Stores the convergence status of the EM algorithm in each run. |
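Before comparing bases, it can help to confirm that the EM algorithm converged in every fold. The sketch below assumes that converge is a logical vector with one entry per fold; this page does not state its type, so treat that as an assumption. The helper failed_folds and the mock_res list are hypothetical and hand-written for illustration; they are not part of sleev and not output of cv_linear2ph.

```r
# Hypothetical helper (not part of sleev): return the indices of folds whose
# EM run did not report convergence. Assumes `converge` is a logical vector
# with one entry per fold; NA entries are treated as "not converged".
failed_folds = function(res) {
  which(!(res$converge %in% TRUE))
}

# Hand-written mock with the three components named in the Value section.
# The numbers are placeholders, not results from cv_linear2ph.
mock_res = list(
  avg_pred_loglike = -1.3,
  pred_loglike = c(-1.2, -1.4, -1.3, -1.1, -1.5),
  converge = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

bad = failed_folds(mock_res)
if (length(bad) > 0) {
  message("EM did not converge in fold(s): ", paste(bad, collapse = ", "))
}
```

If any fold is flagged, one option is to rerun with a larger MAX_ITER before relying on the average for that basis.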
Examples
rho = 0.3
p = 0.3
n = 100
n2 = 40
alpha = 0.3
beta = 0.4
### generate data
simX = rnorm(n)
epsilon = rnorm(n)
simY = alpha+beta*simX+epsilon
error = MASS::mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, rho, rho, 1), nrow=2))
simS = rbinom(n, 1, p)
simU = simS*error[,2]
simW = simS*error[,1]
simY_tilde = simY+simW
simX_tilde = simX+simU
# validated Y and X are observed only for the n2 subjects selected in phase two
id_phase2 = sample(n, n2)
simY[-id_phase2] = NA
simX[-id_phase2] = NA
# candidate numbers of cubic B-spline basis functions
nsieves = c(5, 10)
pred_loglike = rep(NA, length(nsieves))
for (i in seq_along(nsieves)) {
  nsieve = nsieves[i]
  # cubic B-spline basis of X_tilde with nsieve basis functions
  Bspline = splines::bs(simX_tilde, df=nsieve, degree=3,
                        Boundary.knots=range(simX_tilde), intercept=TRUE)
  colnames(Bspline) = paste("bs", 1:nsieve, sep="")
  # assemble the analysis dataset
  data = data.frame(Y_tilde=simY_tilde, X_tilde=simX_tilde, Y=simY, X=simX, Bspline)
  # 5-fold cross-validation for this basis
  res = cv_linear2ph(Y="Y", X="X", Y_unval="Y_tilde", X_unval="X_tilde",
                     Bspline=colnames(Bspline), data=data, nfolds = 5)
  pred_loglike[i] = res$avg_pred_loglike
}
# average predicted log likelihood for each candidate basis size
data.frame(nsieves, pred_loglike)
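The Description says the function can be used to select the basis with the largest average predicted log likelihood. A minimal sketch of that last step follows. The helper select_nsieve is hypothetical (not part of sleev), and the numbers passed to it are placeholders rather than output of cv_linear2ph; in practice the nsieves and pred_loglike vectors built in the loop would be passed instead.

```r
# Hypothetical helper (not part of sleev): pick the basis size whose average
# predicted log likelihood is largest, skipping candidates with NA values.
select_nsieve = function(nsieves, pred_loglike) {
  stopifnot(length(nsieves) == length(pred_loglike))
  ok = !is.na(pred_loglike)
  if (!any(ok)) stop("no candidate has a usable predicted log likelihood")
  nsieves[ok][which.max(pred_loglike[ok])]
}

# Placeholder values for illustration only; not results from cv_linear2ph.
example_nsieves = c(5, 10)
example_pred_loglike = c(-1.5, -1.2)
best_nsieve = select_nsieve(example_nsieves, example_pred_loglike)
print(best_nsieve)
```

Because which.max returns the first maximum, ties go to the earlier candidate, which favors the smaller basis when nsieves is sorted in increasing order.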