test.fit.iqr {qrcm}    R Documentation

Goodness-of-Fit Test

Description

Goodness-of-fit test for a model fitted with iqr. The Kolmogorov-Smirnov and Cramér-von Mises statistics are computed, and their distribution under the null hypothesis is estimated by Monte Carlo simulation.

Usage

## S3 method for class 'iqr'
test.fit(object, R = 100, zcmodel, icmodel, trace = FALSE, ...)

Arguments

object

an object of class “iqr”.

R

number of Monte Carlo replications. If R = 0, the function only returns the test statistics.

zcmodel

a numeric value indicating how to model the joint distribution of censoring (C) and truncation (Z). See ‘Details’.

icmodel

a list of operational parameters to simulate interval-censored data. See ‘Details’.

trace

logical. If TRUE, progress is printed.

...

for future arguments.

Details

This function assesses goodness of fit by testing the null hypothesis that the fitted CDF values follow a U(0,1) distribution, as they should if the model is correctly specified. Because the fitted CDF values depend on estimated parameters, the distribution of the test statistics under the null hypothesis is not known. To evaluate it, the model is refitted on R simulated datasets generated under the null hypothesis.
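The Monte Carlo logic described above can be sketched in base R on a deliberately simple case. This is only an illustration under stated assumptions, not the code used by test.fit: it does not call iqr, it assumes a normal model with estimated mean and standard deviation, and the helper names ks_stat, cvm_stat, fitted_cdf, and mc_gof are hypothetical.

```r
# Kolmogorov-Smirnov distance between the values u and the U(0,1) CDF
ks_stat <- function(u) {
  u <- sort(u); n <- length(u); i <- seq_len(n)
  max(pmax(i / n - u, u - (i - 1) / n))
}

# Cramer-von Mises statistic against U(0,1)
cvm_stat <- function(u) {
  u <- sort(u); n <- length(u); i <- seq_len(n)
  1 / (12 * n) + sum((u - (2 * i - 1) / (2 * n))^2)
}

# Fitted CDF values of a normal model with estimated parameters (assumed model)
fitted_cdf <- function(y) pnorm(y, mean = mean(y), sd = sd(y))

mc_gof <- function(y, R = 100) {
  n <- length(y)
  u <- fitted_cdf(y)
  obs <- c(ks = ks_stat(u), cvm = cvm_stat(u))
  # Simulate R datasets from the fitted model, refit, recompute both statistics
  sim <- replicate(R, {
    ystar <- rnorm(n, mean = mean(y), sd = sd(y))
    ustar <- fitted_cdf(ystar)
    c(ks = ks_stat(ustar), cvm = cvm_stat(ustar))
  })
  # sim is a 2 x R matrix; obs is recycled down its columns
  p <- rowMeans(sim >= obs)
  cbind(statistic = obs, p.value = p)
}

set.seed(1)
mc_gof(rnorm(200))  # data generated from the assumed model
mc_gof(rexp(200))   # data generated from a different model
```

Refitting on each simulated dataset is the key step: comparing the observed statistics with the standard Kolmogorov-Smirnov or Cramér-von Mises tables would ignore the fact that the parameters were estimated from the same data.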

The testing procedures are described in detail by Frumento and Bottai (2016, 2017) and Frumento and Corsini (2024).

Right-censored and left-truncated data. If the data are censored and truncated, object$CDF is itself a censored and truncated outcome, and its quantiles must be computed with a suitable version of the Kaplan-Meier product-limit estimator. The fitted survival curve is then compared with that of a U(0,1) distribution.
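As a rough illustration of the comparison with the U(0,1) survival curve, the base R sketch below applies a hand-written product-limit estimator to simulated right-censored CDF values. It is a simplified sketch, not the estimator used by the package: it ignores truncation and ties, it assumes censoring independent of the outcome, and the names km, obs, d, and gap are hypothetical.

```r
set.seed(1)
n <- 2000
u   <- runif(n)           # CDF values of the outcome under a correct model
c_u <- runif(n, 0, 1.5)   # censoring values on the CDF scale (assumed independent)
obs <- pmin(u, c_u)       # observed, possibly censored CDF value
d   <- as.numeric(u <= c_u)  # 1 = event observed, 0 = censored

# Kaplan-Meier estimate of S(t) = P(U > t), evaluated at the event times
km <- function(time, status) {
  o <- order(time)
  time <- time[o]; status <- status[o]
  at_risk <- length(time) - seq_along(time) + 1  # valid when there are no ties
  surv <- cumprod(1 - status / at_risk)          # censored points contribute a factor 1
  keep <- status == 1
  list(time = time[keep], surv = surv[keep])
}

fit <- km(obs, d)

# Under U(0,1), S(t) = 1 - t; the largest gap acts as a Kolmogorov-Smirnov-type distance
gap <- max(abs(fit$surv - (1 - fit$time)))
gap
```

Applying an ordinary empirical CDF to obs instead would treat censored values as events and would make even a correctly specified model look misspecified.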

To run Monte Carlo simulations when data are censored or truncated, it is necessary to estimate the distribution of the censoring variable and that of the truncation variable. To this end, the function pchreg from the pch package is used, with default settings.

The joint distribution of the censoring variable (C) and the truncation variable (Z) can be specified in two ways:

Interval-censored data. If the data are interval-censored, object$CDF is composed of two columns, left and right. A nonparametric estimator is applied to the interval-censored pair (left, right) using the icenReg R package. The fitted quantiles are then compared with those of a U(0,1) distribution.

To simulate interval-censored data, additional information is required about the censoring mechanism. This testing procedure assumes that interval censoring occurs because each individual is only examined at discrete time points, say t[1], t[2], t[3], ... If this is not the mechanism that generated your data, you should not use this function.

In the ideal situation, one can use t[1], t[2], t[3], ... to estimate the distribution of the time between visits, t[j + 1] - t[j]. If, however, one only knows time1 and time2, the two endpoints of the interval, things are more complicated. The empirical distribution of time2 - time1 is NOT a good estimator of the distribution of t[j + 1] - t[j]: an event is more likely to fall within a longer interval, which generates selection bias. There are two common situations: either t[j + 1] - t[j] is a constant (e.g., one month), or it is random. If t[j + 1] - t[j] is random and has an Exponential distribution with scale lambda, then time2 - time1 has a Gamma(shape = 2, scale = lambda) distribution. This follows from the memoryless property of the Exponential distribution, and may only be an approximation if there is a floor effect (i.e., if lambda is larger than the low quantiles of the time-to-event).
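The selection bias can be checked with a short base R simulation. The sketch below is illustrative only: the value of lambda, the range of the event times, and the variable names are assumptions chosen so that the floor effect is negligible.

```r
set.seed(1)
lambda <- 2    # scale of the Exponential time between visits (assumed)
n <- 5000      # number of simulated individuals

# Event times, kept well above lambda to limit the floor effect
event <- runif(n, min = 50, max = 100)

width <- vapply(event, function(tt) {
  # Visit times t[1], t[2], ... as cumulative sums of Exponential gaps
  visits <- cumsum(rexp(200, rate = 1 / lambda))
  j <- findInterval(tt, visits)   # visits[j] <= tt < visits[j + 1]
  visits[j + 1] - visits[j]       # time2 - time1
}, numeric(1))

mean(width)  # close to 2 * lambda, not to lambda
ks.test(width, "pgamma", shape = 2, scale = lambda)
```

The average width of the intervals that contain an event is about twice the average time between visits, which is why lambda should not be estimated directly from the mean of time2 - time1.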

The icmodel argument must be a list with four elements, model, lambda (optional), t0, and logscale:

The mechanism described above can automatically account for the presence of left censoring. In order to simulate right-censored observations (if present in the data), the distribution of the censoring variable is estimated with the function pchreg from the pch package.

Value

a matrix with columns statistic and p.value, reporting the Kolmogorov-Smirnov and Cramér-von Mises statistics and the associated p-values, evaluated by Monte Carlo.

Author(s)

Paolo Frumento <paolo.frumento@unipi.it>

References

Frumento, P., and Bottai, M. (2016). Parametric modeling of quantile regression coefficient functions. Biometrics, 72 (1), pp 74-84, doi: 10.1111/biom.12410.

Frumento, P., and Bottai, M. (2017). Parametric modeling of quantile regression coefficient functions with censored and truncated data. Biometrics, doi: 10.1111/biom.12675.

Frumento, P., and Corsini, L. (2024). Using parametric quantile regression to investigate determinants of unemployment duration. Unpublished manuscript.

Examples

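library(qrcm)

set.seed(1)  # for reproducibility
y <- rnorm(1000)
m1 <- iqr(y ~ 1, formula.p = ~ I(qnorm(p))) # correct: quantile function linear in qnorm(p)
m2 <- iqr(y ~ 1, formula.p = ~ p)  # misspecified: quantile function linear in p

test.fit(m1)
test.fit(m2)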


[Package qrcm version 3.1 Index]