R: Tolerance Interval for a Poisson Distribution

tolIntPois {EnvStats}

R Documentation

Description

Construct a \beta-content or \beta-expectation tolerance interval for a Poisson distribution.

Usage

  tolIntPois(x, coverage = 0.95, cov.type = "content", ti.type = "two-sided", 
    conf.level = 0.95)

Arguments

`x`	numeric vector of observations, or an object resulting from a call to an estimating function that assumes a Poisson distribution (i.e., `epois` or `epoisCensored`). If `cov.type="content"` then `x` must be a numeric vector. If `x` is a numeric vector, missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are allowed but will be removed.
`coverage`	a scalar between 0 and 1 indicating the desired coverage of the tolerance interval. The default value is `coverage=0.95`. If `cov.type="expectation"`, this argument is ignored.
`cov.type`	character string specifying the coverage type for the tolerance interval. The possible values are `"content"` (`\beta`-content; the default), and `"expectation"` (`\beta`-expectation). See the DETAILS section for more information.
`ti.type`	character string indicating what kind of tolerance interval to compute. The possible values are `"two-sided"` (the default), `"lower"`, and `"upper"`.
`conf.level`	a scalar between 0 and 1 indicating the confidence level associated with the tolerance interval. The default value is `conf.level=0.95`.

Details

If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.

A tolerance interval for some population is an interval on the real line constructed so as to contain 100 \beta \% of the population (i.e., 100 \beta \% of all future observations), where 0 < \beta < 1. The quantity 100 \beta \% is called the coverage.

There are two kinds of tolerance intervals (Guttman, 1970):

A \beta-content tolerance interval with confidence level 100(1-\alpha)\% is constructed so that it contains at least 100 \beta \% of the population (i.e., the coverage is at least 100 \beta \%) with probability 100(1-\alpha)\%, where 0 < \alpha < 1. The quantity 100(1-\alpha)\% is called the confidence level or confidence coefficient associated with the tolerance interval.
A \beta-expectation tolerance interval is constructed so that the average coverage of the interval is 100 \beta \%.

Note: A \beta-expectation tolerance interval with coverage 100 \beta \% is equivalent to a prediction interval for one future observation with associated confidence level 100 \beta \%. Note that there is no explicit confidence level associated with a \beta-expectation tolerance interval. If a \beta-expectation tolerance interval is treated as a \beta-content tolerance interval, the confidence level associated with this tolerance interval is usually around 50% (e.g., Guttman, 1970, Table 4.2, p.76).

Because of the discrete nature of the Poisson distribution, even true tolerance intervals (tolerance intervals based on the true value of \lambda) will usually not contain exactly 100 \beta \% of the population. For example, for the Poisson distribution with parameter lambda=2, the interval [0, 4] contains 94.7% of this distribution and the interval [0, 5] contains 98.3% of this distribution. Thus, no interval can contain exactly 95% of this distribution.
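The coverage figures quoted for lambda=2 can be checked with the base R function ppois. This is only a quick numerical check; the variable names are chosen for illustration.

```r
# Coverage of the intervals [0, 4] and [0, 5] under a Poisson(2) distribution.
lambda <- 2
cover_0_4 <- ppois(4, lambda)  # Pr(0 <= X <= 4)
cover_0_5 <- ppois(5, lambda)  # Pr(0 <= X <= 5)

# Rounded to three decimals for comparison with the percentages in the text
round(c(cover_0_4, cover_0_5), 3)
```

Since the coverage jumps from below 95% to above 95% when the upper end moves from 4 to 5, no interval starting at 0 attains 95% exactly.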

\beta-Content Tolerance Intervals for a Poisson Distribution
Zacks (1970) showed that for monotone likelihood ratio (MLR) families of discrete distributions, a uniformly most accurate upper \beta100\% \beta-content tolerance interval with associated confidence level (1-\alpha)100\% is constructed by finding the upper (1-\alpha)100\% confidence limit for the parameter associated with the distribution, and then computing the \beta'th quantile of the distribution assuming the true value of the parameter is equal to the upper confidence limit. This idea can be extended to one-sided lower and two-sided tolerance limits.

It can be shown that all one-parameter exponential families have the MLR property, and the Poisson distribution is a one-parameter exponential family, so the method of Zacks (1970) can be applied to a Poisson distribution.
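As an informal illustration of the MLR property (not a proof), the ratio of two Poisson probability mass functions with lambda2 > lambda1 can be evaluated on a grid and checked to be increasing in x. The values lambda1 = 1 and lambda2 = 2 and the grid 0:10 are arbitrary choices made for this sketch.

```r
# Likelihood ratio f(x; lambda2) / f(x; lambda1) for lambda2 > lambda1.
lambda1 <- 1
lambda2 <- 2
x <- 0:10
lik_ratio <- dpois(x, lambda2) / dpois(x, lambda1)

# Algebraically the ratio is exp(-(lambda2 - lambda1)) * (lambda2 / lambda1)^x,
# which is strictly increasing in x whenever lambda2 > lambda1.
all(diff(lik_ratio) > 0)
```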

Let X denote a Poisson random variable with parameter lambda=\lambda. Let x_{p|\lambda} denote the p'th quantile of this distribution. That is,

Pr(X < x_{p|\lambda}) \le p \le Pr(X \le x_{p|\lambda}) \;\;\;\;\;\; (1)

Note that due to the discrete nature of the Poisson distribution, a range of values of p is associated with each value of X. For example, for \lambda=2, the value 1 is the p'th quantile for any value of p between 0.135 and 0.406.
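The range of p for which 1 is the p'th quantile when \lambda=2 follows directly from Equation (1) and can be computed with base R. The sketch below is illustrative only; the grid of p values is an arbitrary choice lying strictly between the two bounds.

```r
lambda <- 2
p_low  <- ppois(0, lambda)  # Pr(X < 1), i.e. Pr(X = 0)
p_high <- ppois(1, lambda)  # Pr(X <= 1)

# qpois returns the smallest x with Pr(X <= x) >= p, so every p strictly
# between p_low and p_high maps to the same quantile, 1.
p_grid <- c(0.14, 0.2, 0.3, 0.4)
q <- qpois(p_grid, lambda)
q
```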

Let \underline{x} denote a vector of n observations from a Poisson distribution with parameter lambda=\lambda. When ti.type="upper", the first step is to compute the one-sided upper (1-\alpha)100\% confidence limit for \lambda based on the observations \underline{x} (see the help file for epois). Denote this upper confidence limit by UCL. The one-sided upper \beta100\% tolerance interval is then given by:

[0, x_{\beta | \lambda = UCL}] \;\;\;\;\;\; (2)

Similarly, when ti.type="lower", the first step is to compute the one-sided lower (1-\alpha)100\% confidence limit for \lambda based on the observations \underline{x}. Denote this lower confidence limit by LCL. The one-sided lower \beta100\% tolerance interval is then given by:

[x_{1-\beta | \lambda = LCL}, \infty) \;\;\;\;\;\; (3)

Finally, when ti.type="two-sided", the first step is to compute the two-sided (1-\alpha)100\% confidence limits for \lambda based on the observations \underline{x}. Denote these confidence limits by LCL and UCL. The two-sided \beta100\% tolerance interval is then given by:

[x_{\frac{1-\beta}{2} | \lambda = LCL}, x_{\frac{1+\beta}{2} | \lambda = UCL}] \;\;\;\;\;\; (4)

Note that the function tolIntPois uses the exact confidence limits for \lambda when computing \beta-content tolerance limits (see epois).
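Equations (2) through (4) can be sketched in base R as follows. This is not the EnvStats implementation: the function zacks_tol_pois is a hypothetical helper, and it assumes that the exact confidence limits for \lambda are the usual chi-square based limits (see the help file for epois for the method actually used). The input is the total count and the sample size rather than the raw data.

```r
# Hypothetical helper (not part of EnvStats): beta-content tolerance limits
# for a Poisson sample summarised by its total count and sample size.
# Assumption: the "exact" confidence limits for lambda are
#   LCL = qchisq(a, 2 * total) / (2 * n)
#   UCL = qchisq(1 - a, 2 * total + 2) / (2 * n)
zacks_tol_pois <- function(total, n, coverage = 0.95, conf.level = 0.95,
                           ti.type = c("two-sided", "lower", "upper")) {
  ti.type <- match.arg(ti.type)
  alpha <- 1 - conf.level
  # Tail probability used for each confidence limit
  a <- if (ti.type == "two-sided") alpha / 2 else alpha
  lcl <- if (total > 0) qchisq(a, 2 * total) / (2 * n) else 0
  ucl <- qchisq(1 - a, 2 * total + 2) / (2 * n)
  switch(ti.type,
    "upper"     = c(LTL = 0, UTL = qpois(coverage, ucl)),            # Eq. (2)
    "lower"     = c(LTL = qpois(1 - coverage, lcl), UTL = Inf),      # Eq. (3)
    "two-sided" = c(LTL = qpois((1 - coverage) / 2, lcl),            # Eq. (4)
                    UTL = qpois((1 + coverage) / 2, ucl)))
}

# 20 observations with total count 36 (sample mean 1.8), 90% confidence
zacks_tol_pois(total = 36, n = 20, conf.level = 0.9)
zacks_tol_pois(total = 36, n = 20, conf.level = 0.9, ti.type = "upper")
```

Under these assumptions the two-sided call gives LTL = 0 and UTL = 6, which agrees with the tolIntPois output shown in the Examples section for a sample of size 20 with estimated lambda = 1.8.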

\beta-Expectation Tolerance Intervals for a Poisson Distribution
As stated above, a \beta-expectation tolerance interval with coverage \beta100\% is equivalent to a prediction interval for one future observation with associated confidence level \beta100\%. This is because the probability that any single future observation will fall into this interval is \beta100\%, so the number of observations, out of N future observations, that fall into this interval follows a binomial distribution with parameters size=N and prob=\beta. Hence the expected proportion of future observations that fall into this interval is \beta100\%, whatever the value of N. See the help file for predIntPois for information on how these intervals are constructed.
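The binomial argument can be illustrated numerically. The sketch below takes the per-observation probability to be exactly \beta, as the text does, and computes the expected proportion of N future observations inside the interval for a few arbitrary values of N; the variable names are illustrative.

```r
coverage <- 0.95            # beta, the probability for a single observation
N <- c(1, 5, 20, 100)       # arbitrary numbers of future observations

# Expected proportion E[K] / N, where K ~ Binomial(N, coverage),
# computed directly from the binomial probability mass function.
expected_prop <- sapply(N, function(n) {
  k <- 0:n
  sum(k * dbinom(k, size = n, prob = coverage)) / n
})
expected_prop
```

Each entry equals the coverage, which shows that the expected proportion does not depend on N.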

Value

If x is a numeric vector, tolIntPois returns a list of class "estimate" containing the estimated parameters, a component called interval containing the tolerance interval information, and other information. See estimate.object for details.

If x is the result of calling an estimation function, tolIntPois returns a list whose class is the same as x. The list contains the same components as x. If x already has a component called interval, this component is replaced with the tolerance interval information.

Note

Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).

Gibbons (1987b) used the Poisson distribution to model the number of detected compounds per scan of the 32 volatile organic priority pollutants (VOC), and also to model the distribution of chemical concentration (in ppb). He explained the derivation of a one-sided upper \beta-content tolerance limit for a Poisson distribution based on the work of Zacks (1970) using the Pearson-Hartley approximation to the confidence limits for the mean parameter \lambda (see the help file for epois). Note that there are several typographical errors in the derivation and examples on page 575 of Gibbons (1987b): the value of \beta (the coverage) and the value of 1-\alpha (the confidence level) are confused with one another. Gibbons et al. (2009, pp.103-104) give correct formulas.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Gibbons, R.D. (1987b). Statistical Models for the Analysis of Volatile Organic Compounds in Waste Disposal Sites. Ground Water 25, 572–580.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.

Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.

Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 4.

Krishnamoorthy K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.

Zacks, S. (1970). Uniformly Most Accurate Upper Tolerance Limits for Monotone Likelihood Ratio Families of Discrete Distributions. Journal of the American Statistical Association 65, 307–316.

Examples

  # Generate 20 observations from a Poisson distribution with parameter 
  # lambda=2. The interval [0, 4] contains 94.7% of this distribution and 
  # the interval [0, 5] contains 98.3% of this distribution.  Thus, because 
  # of the discrete nature of the Poisson distribution, no interval contains 
  # exactly 95% of this distribution.  Use tolIntPois to estimate the mean 
  # parameter of the true distribution, and construct a two-sided 95% 
  # beta-content tolerance interval with associated confidence level 90% 
  # (ti.type is left at its default, "two-sided"). 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  dat <- rpois(20, 2) 
  tolIntPois(dat, conf.level = 0.9)

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Poisson
  #
  #Estimated Parameter(s):          lambda = 1.8
  #
  #Estimation Method:               mle/mme/mvue
  #
  #Data:                            dat
  #
  #Sample Size:                     20
  #
  #Tolerance Interval Coverage:     95%
  #
  #Coverage Type:                   content
  #
  #Tolerance Interval Method:       Zacks
  #
  #Tolerance Interval Type:         two-sided
  #
  #Confidence Level:                90%
  #
  #Tolerance Interval:              LTL = 0
  #                                 UTL = 6

  #------

  # Clean up
  rm(dat)

[Package EnvStats version 2.8.1 Index]