DiscretizeData {NPHazardRate}    R Documentation

Discretize the available data set


Defines equispaced, disjoint intervals based on the range of the sample and calculates an empirical hazard rate estimate at the center of each interval.


DiscretizeData(xin, xout)



xin    A vector of input values.


xout   Grid points where the function will be evaluated.


The function defines the subinterval length \Delta = (0.8\max(X_i) - \min(X_i))/N where N is the sample size. Then at each bin (subinterval) center, the empirical hazard rate estimate is calculated by

c_i = \frac{f_i}{\Delta(N-F_i +1) }

where f_i is the frequency of observations in the ith bin and F_i = \sum_{j\leq i} f_j is the cumulative frequency up to and including the ith bin, i.e. the unnormalized empirical cumulative distribution estimate.
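
The calculation above can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: the helper name discretize_sketch is hypothetical, and the sketch assumes N bins of width \Delta starting at \min(X_i), so that observations beyond the last bin edge are not counted. The text does not state how the bins are placed.

```r
# Hypothetical helper illustrating the formulas above; NOT the NPHazardRate code.
discretize_sketch <- function(x) {
  N <- length(x)
  Delta <- (0.8 * max(x) - min(x)) / N        # subinterval length from the text
  stopifnot(Delta > 0)                        # the formula needs 0.8 * max(x) > min(x)

  # Assumption: N bins of width Delta, the first one starting at min(x)
  breaks  <- min(x) + Delta * (0:N)
  centers <- breaks[-1] - Delta / 2           # bin (subinterval) centers

  # f_i: number of observations in bin i (values past the last edge are dropped)
  bin <- findInterval(x, breaks, rightmost.closed = TRUE)
  f   <- tabulate(bin[bin >= 1 & bin <= N], nbins = N)

  F_cum <- cumsum(f)                          # F_i = sum_{j <= i} f_j
  ci    <- f / (Delta * (N - F_cum + 1))      # c_i = f_i / (Delta * (N - F_i + 1))

  list(BinCenters = centers, ci = ci, Delta = Delta, f = f)
}

res <- discretize_sketch(c(1, 2, 3, 4, 10))
res$Delta
res$ci
```

Because F_i can never exceed N, the denominator N - F_i + 1 is at least 1, so the estimate is always finite; empty bins simply give c_i = 0.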


A list. The example below reads its components BinCenters (the bin centers), ci (the empirical hazard rate estimates at the bin centers) and Delta (the subinterval length).


x <- seq(0, 5, length.out = 100)   # design points where the estimate will be calculated
SampleSize <- 100                  # amount of data to be generated
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
# percentage of observations that are censored
cat("\n AMOUNT OF CENSORING: ", length(which(ti > ui)) / length(ti) * 100, "\n")
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[which(ti > ui)] <- 0           # 0's correspond to censored observations

a.use <- DiscretizeData(ti, x)     # discretize the data
BinCenters <- a.use$BinCenters     # get the bin centers
ci <- a.use$ci                     # get empirical hazard rate estimates
Delta <- a.use$Delta               # bin width (subinterval length)

