test.fit.iqr {qrcm} | R Documentation
Goodness-of-Fit Test
Description
Goodness-of-fit test for a model fitted with iqr. The Kolmogorov-Smirnov statistic and the Cramér-von Mises statistic are computed. Their distribution under the null hypothesis is evaluated by Monte Carlo simulation.
Usage
## S3 method for class 'iqr'
test.fit(object, R = 100, zcmodel, icmodel, trace = FALSE, ...)
Arguments
object | an object of class “iqr”.
R | number of Monte Carlo replications. If R = 0, the function only returns the test statistics.
zcmodel | a numeric value indicating how to model the joint distribution of censoring (C) and truncation (Z). See ‘Details’.
icmodel | a list of operational parameters to simulate interval-censored data. See ‘Details’.
trace | logical. If TRUE, the progress will be printed.
... | for future arguments.
Details
This function assesses goodness of fit by testing the null hypothesis that the fitted CDF values follow a U(0,1) distribution, which indicates that the model is correctly specified.
Since the fitted CDF values depend on estimated parameters, the distribution of the test statistic is not known. To evaluate it, the model is fitted on R simulated datasets generated under the null hypothesis.
The testing procedures are described in detail by Frumento and Bottai (2016, 2017) and Frumento and Corsini (2024).
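The Monte Carlo idea described above can be illustrated with a minimal base-R sketch. It is not the code used by test.fit: the "model" is simply a normal distribution with estimated mean and standard deviation, and the helper names ks_unif and fitted_cdf are hypothetical.

```r
set.seed(1)

# Kolmogorov-Smirnov distance between a sample of CDF values and U(0,1)
ks_unif <- function(u) {
  n <- length(u)
  u <- sort(u)
  max(pmax((1:n) / n - u, u - (0:(n - 1)) / n))
}

# Toy model (an assumption of this sketch): a normal distribution whose
# parameters are estimated from the data; returns the fitted CDF values
fitted_cdf <- function(y) pnorm(y, mean = mean(y), sd = sd(y))

y <- rnorm(200, mean = 3, sd = 2)        # observed data; the model is correct here
stat_obs <- ks_unif(fitted_cdf(y))       # observed test statistic

# Null distribution: simulate from the fitted model, refit, recompute the statistic
R <- 200
stat_null <- replicate(R, {
  y_sim <- rnorm(length(y), mean = mean(y), sd = sd(y))
  ks_unif(fitted_cdf(y_sim))
})

# Monte Carlo p-value: share of simulated statistics at least as large as the observed one
p_value <- mean(stat_null >= stat_obs)
```

Refitting the model on every simulated dataset is what accounts for the estimated parameters; comparing stat_obs with the standard Kolmogorov-Smirnov tables would ignore that step.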
Right-censored and left-truncated data. If the data are censored and truncated, object$CDF is itself a censored and truncated outcome, and its quantiles must be computed with a suitable version of the Kaplan-Meier product-limit estimator. The fitted survival curve is then compared with that of a U(0,1) distribution.
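As a rough illustration of this comparison, the sketch below applies the Kaplan-Meier estimator from the survival package to right-censored CDF values and measures their distance from the U(0,1) survival function. It is a simplification under stated assumptions: the true CDF is used instead of a fitted one, there is no left truncation, and the distance is evaluated only at the observed points.

```r
library(survival)
set.seed(2)

n <- 500
t_event <- rexp(n, rate = 1)            # event times
t_cens  <- rexp(n, rate = 0.5)          # censoring times
time    <- pmin(t_event, t_cens)        # observed time
event   <- as.numeric(t_event <= t_cens)

# CDF values under the (here, true) model: censored whenever the time is censored
u <- pexp(time, rate = 1)

# Kaplan-Meier estimate of the survival function of the CDF values
km <- survfit(Surv(u, event) ~ 1)

# Under the null hypothesis, the survival function of u is 1 - u
ks_km <- max(abs(km$surv - (1 - km$time)))
```

A small value of ks_km indicates that the Kaplan-Meier curve of the CDF values stays close to the straight line 1 - u.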
To run Monte Carlo simulations when data are censored or truncated, it is necessary to estimate the distribution of the censoring variable and that of the truncation variable. To this end, the function pchreg from the pch package is used, with default settings.
The joint distribution of the censoring variable (C) and the truncation variable (Z) can be specified in two ways:
- If zcmodel = 1, it is assumed that C = Z + U, where U is a positive variable that is independent of Z, given covariates. This is the most common situation, and it holds when censoring occurs at the end of the follow-up. Under this scenario, C and Z are correlated, with P(C > Z) = 1.
- If zcmodel = 2, it is assumed that C and Z are conditionally independent. This situation is more plausible when all censoring is due to drop-out.
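The difference between the two mechanisms can be seen in a small base-R simulation. This is only a sketch: covariates are omitted, and the distributions chosen for Z, U, and C are arbitrary assumptions.

```r
set.seed(3)
n <- 1000

z <- runif(n, min = 0, max = 2)   # truncation (entry) times

# zcmodel = 1: C = Z + U, with U positive and independent of Z
u  <- rexp(n, rate = 0.5)
c1 <- z + u

# zcmodel = 2: C is generated independently of Z
c2 <- rexp(n, rate = 0.3)

all(c1 > z)     # censoring always follows truncation under zcmodel = 1
cor(z, c1)      # C and Z are positively correlated under zcmodel = 1
mean(c2 > z)    # under independence, C > Z is not guaranteed
```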
Interval-censored data. If the data are interval-censored, object$CDF is composed of two columns, left and right. A nonparametric estimator is applied to the interval-censored pair (left, right) using the icenReg R package. The fitted quantiles are then compared with those of a U(0,1) distribution.
To simulate interval-censored data, additional information is required about the censoring mechanism. This testing procedure assumes that interval censoring occurs because each individual is only examined at discrete time points, say t[1], t[2], t[3], and so on. If this is not the mechanism that generated your data, you should not use this function.
In the ideal situation, one can use t[1], t[2], t[3], ... to estimate the distribution of the time between visits, t[j + 1] - t[j]. If, however, one only knows time1 and time2, the two endpoints of the interval, things are more complicated. The empirical distribution of time2 - time1 is NOT a good estimator of the distribution of t[j + 1] - t[j], because events are more likely to fall in longer intervals, which generates selection bias. There are two common situations: either t[j + 1] - t[j] is a constant (e.g., one month), or it is random. If t[j + 1] - t[j] is random and has an Exponential distribution with scale lambda, then time2 - time1 has a Gamma(shape = 2, scale = lambda) distribution. This is due to the memoryless property of the Exponential distribution, and may only be an approximation if there is a floor effect (i.e., if lambda is larger than the low quantiles of the time-to-event).
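The selection bias described above can be checked with a short base-R simulation. It is a sketch under stated assumptions: visits start at time 0, the times between visits are Exponential with mean lambda, and event times are kept well above lambda to avoid the floor effect. The function name interval_width is hypothetical.

```r
set.seed(4)
lambda <- 1      # mean time between visits
n <- 2000

# Width of the between-visit interval that contains the event time
interval_width <- function(t_event, lambda) {
  prev <- 0                                   # first visit at time 0
  nxt  <- rexp(1, rate = 1 / lambda)          # second visit
  while (nxt < t_event) {                     # move forward until the event is covered
    prev <- nxt
    nxt  <- prev + rexp(1, rate = 1 / lambda)
  }
  nxt - prev                                  # time2 - time1
}

# Event times far from zero, so that the floor effect is negligible
t_event <- runif(n, min = 20, max = 40)
width <- vapply(t_event, interval_width, numeric(1), lambda = lambda)

mean(width)      # close to 2 * lambda, the mean of Gamma(shape = 2, scale = lambda)
var(width)       # close to 2 * lambda^2, the variance of the same Gamma
```

The average of time2 - time1 is about twice the average time between visits, which is why the raw interval widths cannot be used directly to estimate lambda.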
The icmodel argument must be a list with four elements, model, lambda (optional), t0, and logscale:
- model. A character string, either 'constant' or 'exponential'.
- lambda. If model = 'constant', lambda will be interpreted as a constant time between visits. If model = 'exponential', instead, it will be interpreted as the mean (not the rate) of the Exponential distribution that is assumed to describe the time between visits.
  If you either know lambda, or you can estimate it by using additional information (e.g., individual data on all visit times t[1], t[2], t[3], ...), you can supply a scalar value, which will be used for all individuals, or a vector, allowing lambda to differ across individuals.
  If, instead, lambda is not supplied or is NULL, the algorithm proceeds as follows. If model = 'constant', the time between visits is assumed to be constant and equal to lambda = mean(time2 - time1). If model = 'exponential', times between visits are generated from an Exponential distribution in which the mean, lambda, is allowed to depend on covariates according to a log-linear model, and is estimated by fitting a Gamma model on time2 - time1 as described earlier.
- t0. If t0 = 0, data will be simulated assuming that the first visit occurs at time = 0 (the “onset”), i.e., when the individual enters the risk set. This mechanism cannot generate left censoring. If t0 = 1, instead, the first visit occurs after time zero. This mechanism generates left censoring whenever the event occurs before the first visit. Finally, if t0 = -1, visits start before time 0. Under this scenario, it is assumed that not only the time at the event, but also the time at onset is interval-censored. If the event occurs in the interval (time1, time2), and the onset is in (t01, t02), then the total duration is in the interval (time1 - t02, time2 - t01).
- logscale. Logical: is the response variable on the log scale? If this is the case, the Monte Carlo procedure will act accordingly. Note that lambda will always be assumed to describe the time between visits on the natural scale.
The mechanism described above can automatically account for the presence of left censoring. In order to simulate right-censored observations (if present in the data), the distribution of the censoring variable is estimated with the function pchreg from the pch package.
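A possible way to build the icmodel list is sketched below, using only the element names listed above. The numeric values are arbitrary, the helper duration_interval is hypothetical and merely reproduces the interval arithmetic given for t0 = -1, and fit_ic is a placeholder for a model fitted with iqr on interval-censored data.

```r
# Visits every 30 time units, first visit at the onset
ic_constant <- list(model = "constant", lambda = 30, t0 = 0, logscale = FALSE)

# Exponential times between visits, first visit after time zero;
# lambda is omitted, so it would be estimated from time2 - time1
ic_exponential <- list(model = "exponential", t0 = 1, logscale = FALSE)

# Interval arithmetic for t0 = -1: event in (time1, time2), onset in (t01, t02)
duration_interval <- function(time1, time2, t01, t02) {
  c(lower = time1 - t02, upper = time2 - t01)
}
duration_interval(time1 = 10, time2 = 12, t01 = 1, t02 = 3)
# lower = 7, upper = 11

# Hypothetical call (not run): fit_ic is a placeholder object
# test.fit(fit_ic, R = 100, icmodel = ic_exponential)
```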
Value
a matrix with columns statistic and p.value, reporting the Kolmogorov-Smirnov and Cramér-von Mises statistics and the associated p-values evaluated by Monte Carlo.
Author(s)
Paolo Frumento paolo.frumento@unipi.it
References
Frumento, P., and Bottai, M. (2016). Parametric modeling of quantile regression coefficient functions. Biometrics, 72 (1), pp 74-84, doi: 10.1111/biom.12410.
Frumento, P., and Bottai, M. (2017). Parametric modeling of quantile regression coefficient functions with censored and truncated data. Biometrics, doi: 10.1111/biom.12675.
Frumento, P., and Corsini, L. (2024). Using parametric quantile regression to investigate determinants of unemployment duration. Unpublished manuscript.
Examples
library(qrcm)
y <- rnorm(1000)
m1 <- iqr(y ~ 1, formula.p = ~ I(qnorm(p))) # correct
m2 <- iqr(y ~ 1, formula.p = ~ p) # misspecified
test.fit(m1) # test of the correctly specified model
test.fit(m2) # test of the misspecified model