R: Compute NPMLE of CDF from doubly censored data

d011 {dblcens}

R Documentation

Compute NPMLE of CDF from doubly censored data

Description

d011 computes the NPMLE of CDF from doubly censored data via EM algorithm starting from an initial estimator that have jumps at (1) uncensored points; (2) (mid-point of) consecutive survival times with censoring indicator pattern of (0,2), (see below for definition).

When there are ties, the left (right) censored points are treated as happened slightly before (after), to break tie. Also when the last observation happens to be right censored and/or when the first observation happens to be left censored, they are changed to uncensored. This is to ensure we obtain a proper distribution as the CDF estimator. (though this can be modified easily as they are written in R language).

It also computes the NPMLE of the two censoring distributions. There is an option that you may also try to compute the three influence functions (but could slow and memory hungry).

Usage

d011(z, d, identical = rep(0, length(z)),
     maxiter = 49, error = 0.00001, influence.fun = FALSE)

Arguments

`z`	a vector of length n denoting observed times, (ties permitted)
`d`	a vector of length n that contains censoring indicator: d= 2 or 1 or 0, (according to z being left, not, right censored)
`identical`	optional. A vector of length n that has values either 0 or 1. identical[i]=1 means: even if (z[i],d[i]) is identical to (z[j],d[j]), for some `j \not= i`, they still stay as 2 observations, (not 1 obs. with weight 2, which only happen if identical[i]=0 and identical[j] =0). One reason for this is because they may have different covariates not shown here. This adds more flexibility for regression applications. Default value is identical = 0, (i.e. collapse if identical observations).
`maxiter`	optional integer value. default to 49
`error`	optional. Default to 0.00001
`influence.fun`	optional. Default to FALSE. If TRUE, the code will try to compute the influence functions (3 of them) at the censored times. This computation can be very slow and memory intensive (for data with >500 censored times).

Details

The true NPMLE may have probability mass inside the interval where two consecutive times z[i] < z[j], having censoring pattern of d[i]=0 and d[j]=2. As the first example below show.

Value

a list contain the NPMLE of CDF and other information.

`time`	Times of input z, with time corresponding to status=2 removed.
`status`	Censoring status of the above times. Status = -1 means this is an added time because of the censoring pattern (0,2).
`surv`	Survival probability at the above times.
`jump`	Jumps of the NPMLE at the above times.
`exttime`	Similar to times but those with status =2 not removed.
`extstatus`	status of exttime
`extjump`	jump pf NPMLE at exttime.
`extsurv.Sx`	Estimated lifetime distribution.
`surv0.Sy`	One of the censoring distributions.
`jump0`	Jump of surv0.Sy
`surv2.Sz`	Another censoring distribution.
`jump2`	Jump of surv2.Sz
`conv`	A vector of length 2: the actual number of iterations, and the actual error of successive iteration. If the iteration number equal to the maxiter you set, then the iteration has not converged.
`Nodes`	Points where the influence function is computed.
`IC1tu`	Influence function value at the nodes. See Chang (1990) for details.
`IC1tu2`	Influence function values at other points. See Chang (1990) for details.
`IC2tu`	ditto IC1tu
`IC3tu`	ditto IC1tu
`VarFt`	Estimated variances of `\hat F(t)` at the Nodes.

Author(s)

Mai Zhou, Li Lee.

References

Chang, M. N. and Yang, G. L. (1987). Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Ann. Statist. 15, 1536-1547.

Turnbull (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. JRSS B, 290-295.

Chang, M. N. (1990). Weak convergence in doubly censored data. Ann. Statist. 18, 390-405.

Chen, K. and Zhou, M. (2003). Nonparametric Hypothesis Testing and Confidence Intervals with Doubly Censored Data. Lifetime Data Analysis, 9, 71-91.

Examples

d011(z=c(1,2,3,4,5), d=c(1,0,2,2,1))
#
# you should get something like below (and more)
#
#       $time:
#       [1] 1.0 2.0 2.5 5.0    (notice the times, (3,4), corresponding
#                                   to d=2 are removed, and time 2.5 added
#       $status:               since there is a (0,2) pattern at
#       [1]  1  0 -1  1        times 2, 3. The status indicator of -1
#                                   show that it is an added time )
#       $surv
#       [1] 0.5000351 0.5000351 0.3333177 0.0000000
#
#       $jump
#       [1] 0.4999649 0.0000000 0.1667174 0.3333177
#
#       $exttime
#       [1] 1.0 2.0 2.5 3.0 4.0 5.0
#
#       $extstatus
#       [1]  1  0 -1  2  2  1
#
#       ...... 
#
#       $conv
#       [1] 3.300000e+01  8.788214e-06  ### did 33 iterations
#
# BTW, the true NPMLE of surv, i.e. 1-F(), is (1/2, 1/2, 1/3, 0) at times (1,2,2.5,5).
###### Example 2. 
d011(c(1,2,3,4,5), c(1,2,1,0,1),influence.fun=TRUE)
#     we get
# ......
#$conv:
#[1] 3 0
#
#$Nodes:
#[1] 2 4
#
#$IC1tu:
#     [,1] [,2]
#[1,]   -1    0
#[2,]   -1   -2
#
#$IC2tu:
#           [,1] [,2]
#[1,]  0.0000000    0
#[2,] -0.3333333    0
#
#$IC3tu:
#     [,1]       [,2]
#[1,]   -1 -0.6666667
#[2,]   -1 -1.0000000
#
#$VarFt:
#[1] 0.24 0.24           ## est var of hat F(t) at t=nodes
#######################################################

[Package dblcens version 1.1.9 Index]