R: Discretize the available data set

DiscretizeData {NPHazardRate}

R Documentation

Discretize the available data set

Description

Defines equispaced disjoint intervals (bins) based on the range of the sample and calculates the empirical hazard rate estimate at each interval center.

Usage

DiscretizeData(xin, xout)

Arguments

`xin`	A vector of input values
`xout`	Grid points where the function will be evaluated

Details

The function defines the subinterval length \Delta = (0.8\max(X_i) - \min(X_i))/N where N is the sample size. Then at each bin (subinterval) center, the empirical hazard rate estimate is calculated by

c_i = \frac{f_i}{\Delta(N-F_i +1) }

where f_i is the frequency of observations in the ith bin and F_i = \sum_{j\leq i} f_j is the empirical cumulative distribution estimate.
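
The two formulas above can be traced with a minimal base R sketch. It is an illustration only, not the package source: the helper name `empirical_hazard_sketch`, the choice of N bins starting at the sample minimum, and the dropping of observations beyond the last bin edge are all assumptions. The `xout` argument is left out because the Details do not say how it is used.

```r
# Hypothetical helper illustrating the Details formulas; this is NOT the
# NPHazardRate implementation, and the binning choices below are assumptions.
empirical_hazard_sketch <- function(xin) {
  N <- length(xin)                              # sample size
  Delta <- (0.8 * max(xin) - min(xin)) / N      # subinterval length from Details
  stopifnot(Delta > 0)

  # Assumption: N equispaced bins of width Delta starting at min(xin)
  breaks <- min(xin) + Delta * (0:N)
  centers <- breaks[-1] - Delta / 2             # bin centers

  # Assumption: observations above the last bin edge are left out of the counts
  kept <- xin[xin <= breaks[N + 1]]
  bin <- findInterval(kept, breaks, rightmost.closed = TRUE)

  f <- tabulate(bin, nbins = N)                 # f_i: frequency in the ith bin
  Fcum <- cumsum(f)                             # F_i: sum of f_j for j <= i
  ci <- f / (Delta * (N - Fcum + 1))            # c_i = f_i / (Delta * (N - F_i + 1))

  list(BinCenters = centers, ci = ci, Delta = Delta)
}

# Small deterministic input so the arithmetic can be followed by hand
res <- empirical_hazard_sketch(c(1, 2, 3, 4, 10))
print(res)
```

Because F_i never exceeds N, the denominator N - F_i + 1 is always at least 1, so c_i is well defined in every bin, including empty ones where it equals 0.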

Value

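A list-like object whose components are accessed by name in the Examples: `BinCenters` (the bin centers), `ci` (the empirical hazard rate estimates at the bin centers) and `Delta` (the subinterval length).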

Examples

library(NPHazardRate)
x <- seq(0, 5, length.out = 100)   # design points where the estimate will be calculated
SampleSize <- 100                  # amount of data to be generated
ti <- rweibull(SampleSize, .6, 1)  # draw a random sample
ui <- rexp(SampleSize, .2)         # censoring sample
cat("\n AMOUNT OF CENSORING: ", mean(ti > ui) * 100, "\n")  # percentage censored
x1 <- pmin(ti, ui)                 # observed data
cen <- rep.int(1, SampleSize)      # initialize censoring indicators
cen[ti > ui] <- 0                  # 0's correspond to censored observations

a.use <- DiscretizeData(ti, x)     # discretize the data
BinCenters <- a.use$BinCenters     # get the bin centers
ci <- a.use$ci                     # get empirical hazard rate estimates
Delta <- a.use$Delta               # bin width (subinterval length)

[Package NPHazardRate version 0.1 Index]