mmcif_data {mmcif}    R Documentation

Sets up an Object to Compute the Log Composite Likelihood

Description

Sets up the R and C++ objects that are needed to evaluate the log composite likelihood. This reduces to a log likelihood when only clusters of size one or two are used.

Usage

mmcif_data(
  formula,
  data,
  cause,
  time,
  cluster_id,
  max_time,
  spline_df = 3L,
  left_trunc = NULL,
  ghq_data = NULL,
  strata = NULL,
  knots = NULL,
  boundary_quantiles = c(0.025, 0.975)
)

Arguments

formula

formula for covariates in the risk and trajectories.

data

data.frame with the covariate and outcome information.

cause

an integer vector with the cause of each outcome. If there are n_causes distinct causes, then the vector should have values in 1:(n_causes + 1), with n_causes + 1 indicating censoring.
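As an illustration of this coding, the sketch below recodes a made-up status vector in which 0 marks censoring (an assumption about the input data, not a requirement of the package) so that censoring becomes n_causes + 1.

```r
# hypothetical input: 0 = censored, 1 and 2 = the two causes
status <- c(0L, 1L, 2L, 0L, 1L)
n_causes <- 2L

# censoring is moved to the last level, n_causes + 1
cause <- ifelse(status == 0L, n_causes + 1L, status)

# every value must now lie in 1:(n_causes + 1)
stopifnot(all(cause %in% seq_len(n_causes + 1L)))
```

The same recoding appears in the Examples section, where a status of 0 is replaced by 3L.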

time

a numeric vector with the observed times.

cluster_id

an integer vector with the cluster id of each individual.

max_time

the maximum time after which there are no observed events. It is denoted by τ in the original article (Cederkvist et al., 2019).

spline_df

degrees of freedom to use for each spline in the cumulative incidence functions.

left_trunc

numeric vector with left-truncation times. NULL implies that no individuals are left-truncated.

ghq_data

the default Gauss-Hermite quadrature nodes and weights to use. It should be a list with two elements called "node" and "weight". A default is provided if NULL is passed.
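A minimal sketch of how such a list could be built in base R is shown here. It uses the Golub-Welsch eigenvalue method and assumes the convention with weight function exp(-x^2); whether this is the convention the package expects is an assumption, and gauss_hermite is a hypothetical helper that is not part of the package.

```r
# hypothetical helper: n-point Gauss-Hermite rule (n >= 2) for the weight
# function exp(-x^2), computed with the Golub-Welsch method
gauss_hermite <- function(n) {
  # symmetric tridiagonal Jacobi matrix with zero diagonal
  off_diag <- sqrt(seq_len(n - 1L) / 2)
  jacobi <- matrix(0, n, n)
  jacobi[cbind(seq_len(n - 1L), 2:n)] <- off_diag
  jacobi[cbind(2:n, seq_len(n - 1L))] <- off_diag

  # nodes are the eigenvalues; weights come from the first component of
  # each normalized eigenvector, scaled by the integral of exp(-x^2)
  eig <- eigen(jacobi, symmetric = TRUE)
  ord <- order(eig$values)
  list(node = eig$values[ord],
       weight = sqrt(pi) * eig$vectors[1, ord]^2)
}

# a list with the two element names described above
ghq_data <- gauss_hermite(5L)
```

More nodes give a more accurate approximation at a higher computational cost.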

strata

an integer vector or a factor vector with the strata of each individual. NULL implies that there are no strata.

knots

a list of lists with knots for the splines. The inner lists need to have elements called "knots" and "boundary_knots", which are passed to a function like ns. NULL yields defaults based on the quantiles of the observed event times. Note that the knots need to be on the atanh((time - max_time / 2) / (max_time / 2)) scale.
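The transformation and the expected list shape can be sketched as follows. The event times are made up, to_atanh_scale is a hypothetical helper, and the use of one inner list per cause is an assumption about the layout rather than something stated above.

```r
max_time <- 90
# made-up observed event times, all strictly between 0 and max_time
event_time <- c(12, 35, 48, 60, 71, 83)

# hypothetical helper: maps (0, max_time) to the real line
to_atanh_scale <- function(time, max_time)
  atanh((time - max_time / 2) / (max_time / 2))

time_trans <- to_atanh_scale(event_time, max_time)

# knots placed at quantiles of the transformed event times
one_cause <- list(
  knots = unname(quantile(time_trans, 0.5)),
  boundary_knots = unname(quantile(time_trans, c(0.025, 0.975))))

# assumed layout: one inner list per cause (two causes here)
knots <- list(one_cause, one_cause)
```

Times equal to 0 or max_time map to -Inf and Inf, so knots have to be based on times strictly inside that interval.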

boundary_quantiles

numeric vector of length two with the boundary quantile probabilities beyond which the natural cubic splines for the time transformations are restricted to be linear. Only relevant if knots is NULL, as boundary knots are otherwise supplied through knots.

Value

An object of class mmcif which is needed for the other functions in the package.

References

Cederkvist, L., Holst, K. K., Andersen, K. K., & Scheike, T. H. (2019). Modeling the cumulative incidence function of multivariate competing risks data allowing for within-cluster dependence of risk and timing. Biostatistics, 20(2), 199-217.

See Also

mmcif_fit, mmcif_start_values and mmcif_sandwich.

Examples

if(require(mets)){
  # prepare the data
  data(prt)

  # censor at max_time: later observations get status 0 and time max_time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  # set up the object; status 3 marks censoring
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    spline_df = 2L, strata = country)
}


[Package mmcif version 0.1.1 Index]