threg {threg} | R Documentation |
fit a threshold regression model
Description
This function can be used to fit a threshold regressio model based on the first-hitting-time of a boundary by the sample path of a Wiener diffusion process. It uses maximum likelihood estimation method for calculating regression coefficient estimates, asymptotic standard errors and p-values.
Usage
threg(formula, data)
Arguments
formula |
a formula object, with the response on the left of a ~ operator, and the independent variables on the right. The response must be a survival object as returned by the Surv function. On the right of the ~ operator, a | operator must be used: on the left of the | operator, users specify independent variables that will be used in the linear regression function for |
data |
input dataset. Such dataset must be a survival dataset including at least the survival time variable and censoring variable. For the censoring variable, 1 should be used to indicate the subjects with failure observed, and 0 should be used to indicate the subjects that are right censored. The dataset can also include other independent variables that will be used in the threshold regression model. |
Details
Threshold regression is a recently developed regression methodology to analyze time to event data. For a review of this regression model, see Lee and Whitmore (2006, 2010). A unique feature of threshold regression is that the event time is considered as the time when an underlying stochastic process first hits a boundary threshold. In the context of survival data, for example, the event can be death. The death time of an individual is considered as the time when his/her latent health status decreases to the zero boundary.
In the threg
package, a Wiener process is used to model the latent health status process. An event is observed when
reaches
for the first time. Three parameters of the Wiener process are involved:
,
and
. Parameter
, called the drift of the Wiener process, is the rate per unit time at which the level of the sample path is changing. The sample path approaches the threshold if
. Parameter
is the initial value of the process and is taken as positive. Parameter
represents the variability per unit time of the process (Lee and Whitmore 2006). The first hitting time (FHT) of a Wiener process with
,
and
is an inverse Gaussian distribution with probability density function (p.d.f):
where . The p.d.f. is proper if
. The cumulative distribution function of the FHT is:
where is the cumulative distribution function of the standard normal distribution. Note that if
, the Wiener process may never hit the boundary at zero and hence there is a probability that the FHT is
, that is,
.
Since the health status process is usually latent (i.e., unobserved), an arbitrary unit can be used to measure such a process. Hence the variance parameter of the process is set to 1 in the
threg
package to fix the measurement unit of the process. Then we can regress the other two process parameters, and
on covariate data. We assume that
and
are linear in regression coefficients.
Suppose that the covariate vector is , where
are covariates and the leading 1 in
allows for a constant term in the regression relationship. Then
can be linked to the covariates with the following regression form:
and can be linked to the covariates with the following regression form:
Vectors and
are regression coefficients for
and
, respectively, with
and
. Note that researchers can set some elements in
or
to zero if they feel the corresponding covariates are not important in predicting
or
. For example, if covariate
in the vector
is considered not important to predict
, we can remove the
term by setting
to zero.
Author(s)
Tao Xiao
Maintainer: Tao Xiao <taoxiao1@gmail.com>
References
Lee, M-L. T., Whitmore, G. A. (2010) Proportional hazards and threshold regression: their theoretical and practical connections., Lifetime Data Analysis 16, 2: 196-214.
Lee, M-L. T., Whitmore, G. A. (2006) Threshold regression for survival analysis: modeling event times by a stochastic process, Statistical Science 21: 501-513.
Klein, J. P., Moeschberger, M. L. (2003) Survival Analysis: Techniques for Censored and Truncated Data. 2 edition. Springer-Verlag New York, Inc.
Garrett, J. M. (1997) Odds ratios and confidence intervals for logistic regression models with effect modification. Stata Technical Bulletin, 36, 1522. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 104-114. College Station, TX: Stata Press.
Xiao, T., Whitmore, G. A., He, Xin, Lee, M-L. T. (2015) The R Package threg to Implement Threshold Regression Models. Journal of Statistical Software, 66(8), 1-16. URL http://www.jstatsoft.org/v66/i08/.
Examples
#load the data "lkr"
data("lkr")
#Transform the "treatment2" variable into factor variable "f.treatment2" .
lkr$f.treatment2=factor(lkr$treatment2)
#fit the threshold regression model on the factor variable "f.treatment2",
fit<-threg(Surv(weeks, relapse)~ f.treatment2|f.treatment2,data = lkr)
fit
#load the data "bmt"
data("bmt")
#Transform the "group" and "fab" variables into factor variables
#"f.group" and "f.fab".
bmt$f.group=factor(bmt$group)
bmt$f.fab=factor(bmt$fab)
#fit a threshold regression model on the "bmt" dataset, by using "recipient_age" and
#"f.fab" as the predictors for ln(y0), and "f.group" and "f.fab" as predictors for mu.
fit<-threg(Surv(time, indicator)~ recipient_age+f.fab|f.group+f.fab, data = bmt)
fit