pchreg {pch}

R Documentation

Piecewise Constant Hazard Models

Description

This function estimates piecewise exponential models on right-censored, left-truncated, or interval-censored data. The function is mainly intended for prediction and, unlike the phreg function available in the eha package, it allows the effect of covariates, and not just the baseline hazard, to depend on time.

Usage

pchreg(formula, breaks, data, weights, splinex = NULL)

Arguments

`formula`	an object of class “`formula`”: a symbolic description of the regression model. The response must be a `Surv` object as returned by `Surv` (see ‘Details’).
`breaks`	either a numeric vector of two or more unique cut points, or a single number giving the number of intervals. If missing, the number and position of the `breaks` are determined automatically.
`data`	an optional data frame containing the variables in the model.
`weights`	an optional vector of weights to be used in the fitting process. The weights are normalized to sum to the sample size, so multiplying all weights by a constant (for example, doubling them) does not change the estimates or their standard errors.
`splinex`	either `NULL`, or an object created with `splinex` (see ‘Details’).
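
The normalization of `weights` can be illustrated with a minimal base R sketch. The helper `normalize` and the toy weights are hypothetical and are not part of the package; the sketch only assumes the rescaling rule stated above (weights sum to the sample size).

```r
# Hypothetical raw weights for a sample of size 5
w <- c(1, 2, 3, 4, 5)

# Illustrative helper (not a pch function): rescale weights to sum to n
normalize <- function(w) w * length(w) / sum(w)

w1 <- normalize(w)       # normalized weights
w2 <- normalize(2 * w)   # doubling the raw weights first

sum(w1)                  # equals the sample size
all.equal(w1, w2)        # the two normalized vectors coincide
```

Because the normalized weights are the same in both cases, rescaling the raw weights cannot affect the fit.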

Details

The left side of the formula must be specified as Surv(time, event), for right-censored data; Surv(time0, time, event), for right-censored and left-truncated data (time0 < time, time0 can be -Inf); and Surv(time1, time2, type = "interval2") for interval-censored data (use time1 = time2 for exact observations, time1 = -Inf or NA for left-censored, and time2 = Inf or NA for right-censored). Using Surv(time) is also allowed and indicates that the data are neither censored nor truncated. Note that the response variable (and thus the breaks) can be negative.
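
As a quick illustration of the three response types, the following sketch builds the corresponding `Surv` objects with the survival package. The toy vectors are hypothetical; only the `Surv` calls themselves follow the forms listed above.

```r
library(survival)  # provides Surv()

# Hypothetical toy data
time0 <- c(0.0, 0.5, 1.0)   # entry (truncation) times
time  <- c(2.0, 1.5, 3.0)   # observed times
event <- c(1, 0, 1)         # 1 = event, 0 = right-censored

s_right <- Surv(time, event)          # right-censored data
s_trunc <- Surv(time0, time, event)   # right-censored and left-truncated data

# Interval-censored data, in order:
# exact, left-censored, right-censored, and interval-censored observations
time1 <- c(2, NA, 3, 1)
time2 <- c(2, 4, NA, 2)
s_interval <- Surv(time1, time2, type = "interval2")

# Inspect how each response was interpreted
attr(s_right, "type")
attr(s_trunc, "type")
attr(s_interval, "type")
```

Printing the objects (for example `print(s_interval)`) is a convenient way to check that censoring has been coded as intended before fitting.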

To fit the model, the time interval is first divided into sub-intervals as defined by breaks. When the location of breaks is not specified, the empirical quantiles are used as cut points. A different constant hazard (exponential) model is then fitted in each sub-interval, modeling the log-hazard as a linear function of covariates. The special function splinex can be used to build flexible models.
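
In symbols, the model described above can be sketched as follows. The notation is introduced here for illustration only: b_0 < b_1 < ... < b_k denote the cut points, beta_j the coefficient vector of the j-th interval, and x the covariate vector. Which endpoint of each interval is closed is an assumption.

```latex
% Hazard: constant in time within each interval, log-linear in the covariates
h(t \mid x) = \exp\left(x^\top \beta_j\right),
  \qquad b_{j-1} < t \le b_j, \quad j = 1, \dots, k

% Cumulative hazard: each interval contributes its hazard times the time spent in it
H(t \mid x) = \sum_{j=1}^{k} \exp\left(x^\top \beta_j\right)
  \max\left\{0,\ \min(t, b_j) - b_{j-1}\right\}

% Survival function
S(t \mid x) = \exp\left\{-H(t \mid x)\right\}
```

Since each interval has its own coefficient vector, the effect of a covariate is allowed to change from one interval to the next.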

This type of model can be used to obtain a nonparametric maximum likelihood estimator of a conditional distribution, achieving the flexibility of nonparametric estimators while keeping the model parametric in practice. Readers unfamiliar with this approach may wish to consult Geman and Hwang (1982) for an overview, and Ackerberg, Chen and Hahn (2012) for a description of how it can be applied to simplify inference in two-step semiparametric models.

Value

An object of class “pch”, which is a list with the following items:

`call`	the matched call.
`beta`	a matrix of regression coefficients. Rows correspond to covariates, while columns correspond to different time intervals.
`breaks`	the cut points used, with attribute `'h'` giving the length of each interval and attribute `'k'` giving the number of intervals.
`covar`	the estimated asymptotic covariance matrix.
`logLik`	the value of the maximized log-likelihood, with attribute “`df`” indicating the number of free model parameters.
`lambda`	the fitted hazard values in each interval.
`Lambda`	the fitted cumulative hazard values at the end of each interval.
`mf`	the model frame used.
`x`	the model matrix.
`conv.status`	a code indicating the convergence status. It takes value `0` if the algorithm has converged successfully; `1` if convergence has not been achieved; and `2` if, although convergence has been achieved, more than 1% of observations have an associated survival numerically equal to zero, indicating that the solution may not be well-behaved or the model is misspecified.
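
The relation between interval lengths, hazards, and cumulative hazards can be sketched in base R for a single observation. All numbers below are hypothetical, and the code assumes the standard identity for piecewise constant hazards; it is not taken from the package source.

```r
# Hypothetical cut points and per-interval hazards for one observation
breaks <- c(0, 1, 2, 4)
h      <- diff(breaks)        # interval lengths, the role of attribute 'h'
k      <- length(h)           # number of intervals, the role of attribute 'k'
lambda <- c(0.5, 0.2, 0.1)    # constant hazard within each interval

# Cumulative hazard at the end of each interval:
# sum of hazard times interval length over all intervals up to that point
Lambda <- cumsum(lambda * h)

# Implied survival probability at each upper cut point
S <- exp(-Lambda)
```

Under this identity, a survival value numerically equal to zero corresponds to a very large cumulative hazard, which is the situation flagged by `conv.status = 2`.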

The accessor functions summary, coef, predict, nobs, logLik, AIC, BIC can be used to extract information from the fitted model. This function is mainly intended for prediction and simulation: see predict.pch.

Note

NOTE1. Right-censoring is a special case of interval censoring, in which exact events are identified by time2 = time1, while censored observations have time2 = Inf. Note, however, that pchreg will not use the same routines for right-censored and interval-censored data, implying that pchreg(Surv(time1, time2, type = "interval2") ~ x) may not be identical to pchreg(Surv(time = time1, event = (time2 < Inf)) ~ x). The latter is usually faster and slightly more accurate.
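
The recoding mentioned in NOTE1 can be sketched as follows, using hypothetical toy data in which exact events have time2 = time1 and censored observations have time2 = Inf. Only the response objects are built here; no model is fitted.

```r
library(survival)  # provides Surv()

# Hypothetical data: observations 1 and 3 are exact events,
# observations 2 and 4 are right-censored
time1 <- c(1.2, 0.7, 2.5, 3.1)
time2 <- c(1.2, Inf, 2.5, Inf)

# Interval-censored coding of the response
s_interval <- Surv(time1, time2, type = "interval2")

# Equivalent right-censored coding: an event is observed when time2 is finite
event   <- (time2 < Inf)
s_right <- Surv(time = time1, event = event)

event
```

Using the right-censored response on the left side of the formula selects the routine that NOTE1 describes as usually faster and slightly more accurate.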

NOTE2. Within each interval, the risk of the event may be zero at some covariate values. For each covariate x, the algorithm will try to identify a threshold c such that all events (in any given interval) occur when x < c or, alternatively, when x > c. A zero risk will then be fitted automatically above (respectively, below) the threshold, using an offset of -100 on the log-hazard.
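
The idea in NOTE2 can be illustrated with a toy base R sketch for a single interval and a single covariate. The data and the way the threshold is chosen below are hypothetical and only mimic the description above; the package's actual search procedure may differ.

```r
# Hypothetical data for one time interval: events occur only at small x
x     <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)
event <- c(1,   0,   1,   1,   0,   0,   0,   0)

# Illustrative threshold: the largest covariate value with an observed event
c_hat <- max(x[event == 1])

# No events are observed above the threshold
above <- x > c_hat
sum(event[above])

# Offset on the log-hazard: -100 above the threshold, 0 elsewhere
offset <- ifelse(above, -100, 0)

# The implied hazard multiplier is numerically indistinguishable from zero
exp(-100)
```

Without such an offset, the coefficient of x would tend to diverge, because the likelihood keeps improving as the fitted hazard above the threshold approaches zero.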

Author(s)

Paolo Frumento <paolo.frumento@unipi.it>

References

Ackerberg, D., Chen, X., and Hahn, J. (2012). A Practical Asymptotic Variance Estimator for Two-Step Semiparametric Estimators. The Review of Economics and Statistics, 94(2), 481-498.

Friedman, M. (1982). Piecewise Exponential Models for Survival Data with Covariates. The Annals of Statistics, 10(1), 101-113.

Geman, S., and Hwang, C.R. (1982). Nonparametric Maximum Likelihood Estimation by the Method of Sieves. The Annals of Statistics, 10(2), 401-414.

Examples


  # Simulate right-censored data
  
  n <- 1000
  x <- runif(n) # a covariate
  time <- rexp(n, exp(1 + x)) # time-to-event
  cens <- runif(n,0,2) # censoring event
  y <- pmin(time,cens) # observed variable
  d <- (time <= cens) # indicator of the event
  model <- pchreg(Surv(y,d) ~ x, breaks = 10)




  # Simulate right-censored, left-truncated data
  
  n <- 1000
  x <- runif(n) # a covariate
  time0 <- rexp(n, 10) # time at enrollment
  time <- rexp(n, exp(1 + x)) # time-to-event
  cens <- runif(n,0,2) # censoring event
  
  # y,d,x are only observed if (y > time0)
  y <- pmin(time,cens)
  d <- (time <= cens)
  u <- (y > time0)
  y <- y[u]
  d <- d[u]
  x <- x[u]
  z <- time0[u]
  model <- pchreg(Surv(z,y,d) ~ x, breaks = 10)




  # Simulate interval-censored data
  
  n <- 1000
  x <- runif(n) # a covariate
  time <- 10*rexp(n, exp(1 + x)) # time-to-event
  time1 <- floor(time)
  time2 <- ceiling(time)
  # Individuals are observed at discrete times
  # I observe (time1,time2) such that time1 <= time <= time2
  model <- pchreg(Surv(time1,time2, type = "interval2") ~ x, breaks = 10)
  
  
  
  
  # Try summary(model), predict(model)
  # See the documentation of predict.pch for more examples

[Package pch version 2.1 Index]