epoisCensored {EnvStats}  R Documentation 
Estimate the mean of a Poisson distribution given a sample of data that has been subjected to Type I censoring, and optionally construct a confidence interval for the mean.
epoisCensored(x, censored, method = "mle", censoring.side = "left",
ci = FALSE, ci.method = "profile.likelihood", ci.type = "twosided",
conf.level = 0.95, n.bootstraps = 1000, pivot.statistic = "z",
ci.sample.size = sum(!censored))
x 
numeric vector of observations. Missing ( 
censored 
numeric or logical vector indicating which values of 
method 
character string specifying the method of estimation. The possible values are:

censoring.side 
character string indicating on which side the censoring occurs. The possible
values are 
ci 
logical scalar indicating whether to compute a confidence interval for the
mean or variance. The default value is 
ci.method 
character string indicating what method to use to construct the confidence interval
for the mean. The possible values are 
ci.type 
character string indicating what kind of confidence interval to compute. The
possible values are 
conf.level 
a scalar between 0 and 1 indicating the confidence level of the confidence interval.
The default value is 
n.bootstraps 
numeric scalar indicating how many bootstraps to use to construct the
confidence interval for the mean when 
pivot.statistic 
character string indicating which pivot statistic to use in the construction
of the confidence interval for the mean when 
ci.sample.size 
numeric scalar indicating what sample size to assume to construct the
confidence interval for the mean if 
If x
or censored
contain any missing (NA
), undefined (NaN
) or
infinite (Inf
, Inf
) values, they will be removed prior to
performing the estimation.
Let \underline{x}
denote a vector of N
observations from a
Poisson distribution with mean
lambda=
\lambda
.
Assume n
(0 < n < N
) of these
observations are known and c
(c=Nn
) of these observations are
all censored below (leftcensored) or all censored above (rightcensored) at
k
fixed censoring levels
T_1, T_2, \ldots, T_k; \; k \ge 1 \;\;\;\;\;\; (1)
For the case when k \ge 2
, the data are said to be Type I
multiply censored. For the case when k=1
,
set T = T_1
. If the data are leftcensored
and all n
known observations are greater
than or equal to T
, or if the data are rightcensored and all n
known observations are less than or equal to T
, then the data are
said to be Type I singly censored (Nelson, 1982, p.7), otherwise
they are considered to be Type I multiply censored.
Let c_j
denote the number of observations censored below or above censoring
level T_j
for j = 1, 2, \ldots, k
, so that
\sum_{i=1}^k c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, \ldots, x_{(N)}
denote the “ordered” observations,
where now “observation” means either the actual observation (for uncensored
observations) or the censoring level (for censored observations). For
rightcensored data, if a censored observation has the same value as an
uncensored one, the uncensored observation should be placed first.
For leftcensored data, if a censored observation has the same value as an
uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)}
does not necessarily represent
the i
'th “largest” observation from the (unknown) complete sample.
Finally, let \Omega
(omega) denote the set of n
subscripts in the
“ordered” sample that correspond to uncensored observations.
Estimation
Maximum Likelihood Estimation (method="mle"
)
For Type I left censored data, the likelihood function is given by:
L(\lambda  \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (3)
where f
and F
denote the probability density function (pdf) and
cumulative distribution function (cdf) of the population
(Cohen, 1963; Cohen, 1991, pp.6, 50). That is,
f(t) = \frac{e^{\lambda} \lambda^t}{t!}, \; x = 0, 1, 2, \ldots \;\;\;\;\;\; (4)
F(t) = \sum_{i=0}^t f(i) = \sum_{i=0}^t \frac{e^{\lambda} \lambda^i}{i!} \;\;\;\;\;\; (5)
(Johnson et al., 1992, p.151). For left singly censored data, equation (3) simplifies to:
L(\lambda  \underline{x}) = {N \choose c} [F(T)]^{c} \prod_{i = c+1}^n f[x_{(i)}] \;\;\;\;\;\; (6)
Similarly, for Type I right censored data, the likelihood function is given by:
L(\lambda  \underline{x}) = {N \choose c_1 c_2 \ldots c_k n} \prod_{j=1}^k [1  F(T_j)]^{c_j} \prod_{i \in \Omega} f[x_{(i)}] \;\;\;\;\;\; (7)
and for right singly censored data this simplifies to:
L(\lambda  \underline{x}) = {N \choose c} [1  F(T)]^{c} \prod_{i = 1}^n f[x_{(i)}] \;\;\;\;\;\; (8)
The maximum likelihood estimators are computed by maximizing the likelihood function.
For rightcensored data, taking the derivative of the loglikelihood function
with respect to \lambda
and setting this to 0 produces the following equation:
\bar{x} = \lambda \{1  \sum_{j=1}^K \frac{c_j}{n} [\frac{f(T_j)}{1F(T_j)}] \} \;\;\;\;\;\; (9)
where
\bar{x} = \frac{1}{n} \sum{i \in \Omega} x_i \;\;\;\;\;\; (10)
Note that the quantity defined in equation (10) is simply the mean of the uncensored observations.
For leftcensored data, taking the derivative of the loglikelihood function with
respect to \lambda
and setting this to 0 produces the following equation:
\bar{x} = \lambda \{1 + \sum_{j=1}^K \frac{c_j}{n} [\frac{f(T_j  1)}{F(T_j  1)}] \} \;\;\;\;\;\; (11)
The function epoisCensored
computes the maximum likelihood estimator
of \lambda
by solving Equation (9) (rightcensored data) or
Equation (11) (leftcensored data); it uses the sample mean of
the uncensored observations as the initial value.
Setting Censored Observations to Half the Censoring Level (method="half.cen.level"
)
This method is applicable only to left censored data.
This method involves simply replacing all the censored observations with half their
detection limit, and then computing the mean and standard deviation with the usual
formulas (see epois
).
This method is included only to allow comparison of this method to other methods.
Setting leftcensored observations to half the censoring level is not
recommended.
Confidence Intervals
This section explains how confidence intervals for the mean \lambda
are
computed.
Likelihood Profile (ci.method="profile.likelihood"
)
This method was proposed by Cox (1970, p.88), and Venzon and Moolgavkar (1988)
introduced an efficient method of computation. This method is also discussed by
Stryhn and Christensen (2003) and Royston (2007).
The idea behind this method is to invert the likelihoodratio test to obtain a
confidence interval for the mean \lambda
. Equation (3) above
shows the form of the likelihood function L(\lambda  \underline{x})
for
multiply leftcensored data, and Equation (7) shows the function for
multiply rightcensored data.
Following Stryhn and Christensen (2003), denote the maximum likelihood estimate
of the mean by \lambda^*
. The likelihood
ratio test statistic (G^2
) of the hypothesis H_0: \lambda = \lambda_0
(where \lambda_0
is a fixed value) equals the drop in 2 log(L)
between the
“full” model and the reduced model with \lambda
fixed at \mu_0
, i.e.,
G^2 = 2 \{log[L(\lambda^*)]  log[L(\lambda_0)]\} \;\;\;\;\;\; (11)
.
Under the null hypothesis, the test statistic G^2
follows a
chisquared distribution with 1 degree of freedom.
A twosided (1\alpha)100\%
confidence interval for the mean \lambda
consists of all values of \lambda_0
for which the test is not significant at
level alpha
:
\lambda_0: G^2 \le \chi^2_{1, {1\alpha}} \;\;\;\;\;\; (12)
where \chi^2_{\nu, p}
denotes the p
'th quantile of the
chisquared distribution with \nu
degrees of freedom.
Onesided lower and onesided upper confidence intervals are computed in a similar
fashion, except that the quantity 1\alpha
in Equation (12) is replaced with
12\alpha
.
Normal Approximation (ci.method="normal.approx"
)
This method constructs approximate (1\alpha)100\%
confidence intervals for
\lambda
based on the assumption that the estimator of \lambda
is
approximately normally distributed. That is, a twosided (1\alpha)100\%
confidence interval for \lambda
is constructed as:
[\hat{\lambda}  t_{1\alpha/2, m1}\hat{\sigma}_{\hat{\lambda}}, \; \hat{\lambda} + t_{1\alpha/2, m1}\hat{\sigma}_{\hat{\lambda}}] \;\;\;\; (13)
where \hat{\lambda}
denotes the estimate of \lambda
,
\hat{\sigma}_{\hat{\lambda}}
denotes the estimated asymptotic standard
deviation of the estimator of \lambda
, m
denotes the assumed sample
size for the confidence interval, and t_{p,\nu}
denotes the p
'th
quantile of Student's tdistribuiton with \nu
degrees of freedom. Onesided confidence intervals are computed in a
similar fashion.
The argument ci.sample.size
determines the value of m
and by
default is equal to the number of uncensored observations.
This is simply an adhoc method of constructing
confidence intervals and is not based on any published theoretical results.
When pivot.statistic="z"
, the p
'th quantile from the
standard normal distribution is used in place of the
p
'th quantile from Student's tdistribution.
When \lambda
is estimated with the maximum likelihood estimator
(method="mle"
), the variance of \hat{\lambda}
is
estimated based on the inverse of the Fisher Information matrix. When
\lambda
is estimated using the halfcensoringlevel method
(method="half.cen.level"
), the variance of \hat{\lambda}
is
estimated as:
\hat{\sigma}^2_{\hat{\lambda}} = \frac{\hat{\lambda}}{m} \;\;\;\;\;\; (14)
where m
denotes the assumed sample size (see above).
Bootstrap and BiasCorrected Bootstrap Approximation (ci.method="bootstrap"
)
The bootstrap is a nonparametric method of estimating the distribution
(and associated distribution parameters and quantiles) of a sample statistic,
regardless of the distribution of the population from which the sample was drawn.
The bootstrap was introduced by Efron (1979) and a general reference is
Efron and Tibshirani (1993).
In the context of deriving an approximate (1\alpha)100\%
confidence interval
for the population mean \lambda
, the bootstrap can be broken down into the
following steps:
Create a bootstrap sample by taking a random sample of size N
from
the observations in \underline{x}
, where sampling is done with
replacement. Note that because sampling is done with replacement, the same
element of \underline{x}
can appear more than once in the bootstrap
sample. Thus, the bootstrap sample will usually not look exactly like the
original sample (e.g., the number of censored observations in the bootstrap
sample will often differ from the number of censored observations in the
original sample).
Estimate \lambda
based on the bootstrap sample created in Step 1, using
the same method that was used to estimate \lambda
using the original
observations in \underline{x}
. Because the bootstrap sample usually
does not match the original sample, the estimate of \lambda
based on the
bootstrap sample will usually differ from the original estimate based on
\underline{x}
.
Repeat Steps 1 and 2 B
times, where B
is some large number.
For the function epoisCensored
, the number of bootstraps B
is
determined by the argument n.bootstraps
(see the section ARGUMENTS above).
The default value of n.bootstraps
is 1000
.
Use the B
estimated values of \lambda
to compute the empirical
cumulative distribution function of this estimator of \lambda
(see
ecdfPlot
), and then create a confidence interval for \lambda
based on this estimated cdf.
The twosided percentile interval (Efron and Tibshirani, 1993, p.170) is computed as:
[\hat{G}^{1}(\frac{\alpha}{2}), \; \hat{G}^{1}(1\frac{\alpha}{2})] \;\;\;\;\;\; (15)
where \hat{G}(t)
denotes the empirical cdf evaluated at t
and thus
\hat{G}^{1}(p)
denotes the p
'th empirical quantile, that is,
the p
'th quantile associated with the empirical cdf. Similarly, a onesided lower
confidence interval is computed as:
[\hat{G}^{1}(\alpha), \; \infty] \;\;\;\;\;\; (16)
and a onesided upper confidence interval is computed as:
[0, \; \hat{G}^{1}(1\alpha)] \;\;\;\;\;\; (17)
The function epoisCensored
calls the R function quantile
to compute the empirical quantiles used in Equations (15)(17).
The percentile method bootstrap confidence interval is only firstorder
accurate (Efron and Tibshirani, 1993, pp.187188), meaning that the probability
that the confidence interval will contain the true value of \lambda
can be
off by k/\sqrt{N}
, where k
is some constant. Efron and Tibshirani
(1993, pp.184188) proposed a biascorrected and accelerated interval that is
secondorder accurate, meaning that the probability that the confidence interval
will contain the true value of \lambda
may be off by k/N
instead of
k/\sqrt{N}
. The twosided biascorrected and accelerated confidence interval is
computed as:
[\hat{G}^{1}(\alpha_1), \; \hat{G}^{1}(\alpha_2)] \;\;\;\;\;\; (18)
where
\alpha_1 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1  \hat{a}(z_0 + z_{\alpha/2})}] \;\;\;\;\;\; (19)
\alpha_2 = \Phi[\hat{z}_0 + \frac{\hat{z}_0 + z_{1\alpha/2}}{1  \hat{a}(z_0 + z_{1\alpha/2})}] \;\;\;\;\;\; (20)
\hat{z}_0 = \Phi^{1}[\hat{G}(\hat{\lambda})] \;\;\;\;\;\; (21)
\hat{a} = \frac{\sum_{i=1}^N (\hat{\lambda}_{(\cdot)}  \hat{\lambda}_{(i)})^3}{6[\sum_{i=1}^N (\hat{\lambda}_{(\cdot)}  \hat{\lambda}_{(i)})^2]^{3/2}} \;\;\;\;\;\; (22)
where the quantity \hat{\lambda}_{(i)}
denotes the estimate of \lambda
using
all the values in \underline{x}
except the i
'th one, and
\hat{\lambda}{(\cdot)} = \frac{1}{N} \sum_{i=1}^N \hat{\lambda_{(i)}} \;\;\;\;\;\; (23)
A onesided lower confidence interval is given by:
[\hat{G}^{1}(\alpha_1), \; \infty] \;\;\;\;\;\; (24)
and a onesided upper confidence interval is given by:
[0, \; \hat{G}^{1}(\alpha_2)] \;\;\;\;\;\; (25)
where \alpha_1
and \alpha_2
are computed as for a twosided confidence
interval, except \alpha/2
is replaced with \alpha
in Equations (19) and (20).
The constant \hat{z}_0
incorporates the bias correction, and the constant
\hat{a}
is the acceleration constant. The term “acceleration” refers
to the rate of change of the standard error of the estimate of \lambda
with
respect to the true value of \lambda
(Efron and Tibshirani, 1993, p.186). For a
normal (Gaussian) distribution, the standard error of the estimate of \lambda
does not depend on the value of \lambda
, hence the acceleration constant is not
really necessary.
When ci.method="bootstrap"
, the function epoisCensored
computes both
the percentile method and biascorrected and accelerated method bootstrap confidence
intervals.
a list of class "estimateCensored"
containing the estimated parameters
and other information. See estimateCensored.object
for details.
A sample of data contains censored observations if some of the observations are reported only as being below or above some censoring level. In environmental data analysis, Type I leftcensored data sets are common, with values being reported as “less than the detection limit” (e.g., Helsel, 2012). Data sets with only one censoring level are called singly censored; data sets with multiple censoring levels are called multiply or progressively censored.
Statistical methods for dealing with censored data sets have a long history in the field of survival analysis and life testing. More recently, researchers in the environmental field have proposed alternative methods of computing estimates and confidence intervals in addition to the classical ones such as maximum likelihood estimation. Helsel (2012, Chapter 6) gives an excellent review of past studies of the properties of various estimators for parameters of a normal or lognormal distribution based on censored environmental data.
In practice, it is better to use a confidence interval for the mean or a joint confidence region for the mean and standard deviation (or coefficient of variation), rather than rely on a single pointestimate of the mean. Few studies have been done to evaluate the performance of methods for constructing confidence intervals for the mean or joint confidence regions for the mean and coefficient of variation of a Poisson distribution when data are subjected to single or multiple censoring.
Steven P. Millard (EnvStats@ProbStatInfo.com)
Cohen, A.C. (1991). Truncated and Censored Samples. Marcel Dekker, New York, New York, 312pp.
Cox, D.R. (1970). Analysis of Binary Data. Chapman & Hall, London. 142pp.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7, 1–26.
Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, 436pp.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions, Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions, Second Edition. John Wiley and Sons, New York, Chapter 4.
Millard, S.P., P. Dixon, and N.K. Neerchal. (2014; in preparation). Environmental Statistics with R. CRC Press, Boca Raton, Florida.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York, 634pp.
Royston, P. (2007). Profile Likelihood for Estimation and Confdence Intervals. The Stata Journal 7(3), pp. 376–387.
Stryhn, H., and J. Christensen. (2003). Confidence Intervals by the Profile Likelihood Method, with Applications in Veterinary Epidemiology. Contributed paper at ISVEE X (November 2003, Chile). https://gilvanguedes.com/wpcontent/uploads/2019/05/ProfileLikelihoodCI.pdf.
Venzon, D.J., and S.H. Moolgavkar. (1988). A Method for Computing ProfileLikelihoodBased Confidence Intervals. Journal of the Royal Statistical Society, Series C (Applied Statistics) 37(1), pp. 87–94.
Poisson, epois
, estimateCensored.object
.
# Generate 20 observations from a Poisson distribution with
# parameter lambda=10, and censor the values less than 10.
# Then generate 20 more observations from the same distribution
# and censor the values less than 20. Then estimate the mean
# using the maximum likelihood method.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(300)
dat.1 < rpois(20, lambda=10)
censored.1 < dat.1 < 10
dat.1[censored.1] < 10
dat.2 < rpois(20, lambda=10)
censored.2 < dat.2 < 20
dat.2[censored.2] < 20
dat < c(dat.1, dat.2)
censored < c(censored.1, censored.2)
epoisCensored(dat, censored, ci = TRUE)
#Results of Distribution Parameter Estimation
#Based on Type I Censored Data
#
#
#Assumed Distribution: Poisson
#
#Censoring Side: left
#
#Censoring Level(s): 10 20
#
#Estimated Parameter(s): lambda = 11.05402
#
#Estimation Method: MLE
#
#Data: dat
#
#Censoring Variable: censored
#
#Sample Size: 40
#
#Percent Censored: 65%
#
#Confidence Interval for: lambda
#
#Confidence Interval Method: Profile Likelihood
#
#Confidence Interval Type: twosided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 9.842894
# UCL = 12.846484
#
# Clean up
#
rm(dat.1, censored.1, dat.2, censored.2, dat, censored)