predIntPois {EnvStats} | R Documentation |

## Prediction Interval for a Poisson Distribution

### Description

Estimate the mean of a `Poisson distribution`

, and
construct a prediction interval for the next `k`

observations or
next set of `k`

sums.

### Usage

```
predIntPois(x, k = 1, n.sum = 1, method = "conditional",
pi.type = "two-sided", conf.level = 0.95, round.limits = TRUE)
```

### Arguments

`x` |
numeric vector of observations, or an object resulting from a call to an
estimating function that assumes a Poisson distribution
(i.e., |

`k` |
positive integer specifying the number of future observations or sums the
prediction interval should contain with confidence level |

`n.sum` |
positive integer specifying the sample size associated with the |

`method` |
character string specifying the method to use. The possible values are: See the DETAILS section for more information on these methods. The |

`pi.type` |
character string indicating what kind of prediction interval to compute.
The possible values are |

`conf.level` |
a scalar between 0 and 1 indicating the confidence level of the prediction interval.
The default value is |

`round.limits` |
logical scalar indicating whether to round the computed prediction limits to the
nearest integer. The default value is |

### Details

A prediction interval for some population is an interval on the real line constructed so
that it will contain `k`

future observations or averages from that population with
some specified probability `(1-\alpha)100\%`

, where 0 < `\alpha`

< 1 and `k`

is some pre-specified positive integer. The quantity `(1-\alpha)100\%`

is call
the confidence coefficient or confidence level associated with the prediction interval.

In the case of a Poisson distribution, we have modified the
usual meaning of a prediction interval and instead construct an interval that will
contain `k`

future observations or `k`

future *sums* with a certain
confidence level.

A prediction interval is a *random* interval; that is, the lower and/or
upper bounds are random variables computed based on sample statistics in the
baseline sample. Prior to taking one specific baseline sample, the probability
that the prediction interval will contain the next `k`

averages is
`(1-\alpha)100\%`

. Once a specific baseline sample is taken and the
prediction interval based on that sample is computed, the probability that that
prediction interval will contain the next `k`

averages is not necessarily
`(1-a)100\%`

, but it should be close.

If an experiment is repeated `N`

times, and for each experiment:

A sample is taken and a

`(1-a)100\%`

prediction interval for`k=1`

future observation is computed, andOne future observation is generated and compared to the prediction interval,

then the number of prediction intervals that actually contain the future observation
generated in step 2 above is a binomial random variable with parameters
`size=`

`N`

and `prob=`

`(1-\alpha)100\%`

(see Binomial).

If, on the other hand, only one baseline sample is taken and only one prediction
interval for `k=1`

future observation is computed, then the number of
future observations out of a total of `N`

future observations that will be
contained in that one prediction interval is a binomial random variable with
parameters `size=`

`N`

and `prob=`

`(1-\alpha^*)100\%`

, where
`\alpha^*`

depends on the true population parameters and the computed
bounds of the prediction interval.

Because of the discrete nature of the Poisson distribution,
even if the true mean of the distribution `\lambda`

were known exactly, the
actual confidence level associated with a prediction limit will usually not be exactly equal to
`(1-\alpha)100\%`

. For example, for the Poisson distribution with parameter
`lambda=2`

, the interval [0, 4] contains 94.7% of this distribution and
the interval [0,5] contains 98.3% of this distribution. Thus, no interval can
contain exactly 95% of this distribution, so it is impossible to construct an
exact 95% prediction interval for the next `k=1`

observation for a
Poisson distribution with parameter `lambda=2`

.

**The Form of a Poisson Prediction Interval**

Let `\underline{x} = x_1, x_2, \ldots, x_n`

denote a vector of `n`

observations from a Poisson distribution with parameter
`lambda=`

`\lambda`

. Also, let `X`

denote the sum of these
`n`

random variables, i.e.,

`X = \sum_{i=1}^n x_i \;\;\;\;\;\; (1)`

Finally, let `m`

denote the sample size associated with the `k`

future
sums (i.e., `n.sum=`

`m`

). When `m=1`

, each sum is really just a
single observation, so in the rest of this help file the term “sums”
replaces the phrase “observations or sums”.

Let `\underline{y} = y_1, y_2, \ldots, y_m`

denote a vector of `m`

future observations from a Poisson distribution with parameter
`lambda=`

`\lambda^{*}`

, and set `Y`

equal to the sum of these
`m`

random variables, i.e.,

`Y = \sum_{i=1}^m y_i \;\;\;\;\;\; (2)`

Then `Y`

has a Poisson distribution with parameter
`lambda=`

`m\lambda^{*}`

(Johnson et al., 1992, p.160). We are interested
in constructing a prediction limit for the next value of `Y`

, or else the next
`k`

sums of `m`

Poisson random variables, based on the observed value of
`X`

and assuming `\lambda^{*} = \lambda`

.

For a Poisson distribution, the form of a two-sided prediction interval is:

`[m\bar{x} - K, m\bar{x} + K] = [cX - K, cX + K] \;\;\;\;\;\; (3)`

where

`\bar{x} = \frac{X}{n} = \sum_{i=1}^n x_i \;\;\;\;\;\; (4)`

`c = \frac{m}{n} \;\;\;\;\;\; (5)`

and `K`

is a constant that depends on the sample size `n`

, the
confidence level `(1-\alpha)100\%`

, the number of future sums `k`

,
and the sample size associated with the future sums `m`

. Do not confuse
the constant `K`

(uppercase `K`

) with the number of future sums
`k`

(lowercase `k`

). The symbol `K`

is used here to be consistent
with the notation used for prediction intervals for the normal distribution
(see `predIntNorm`

).

Similarly, the form of a one-sided lower prediction interval is:

`[m\bar{x} - K, \infty] = [cX - K, \infty] \;\;\;\;\;\; (6)`

and the form of a one-sided upper prediction interval is:

`[0, m\bar{x} + K] = [0, cX + K] \;\;\;\;\;\; (7)`

The derivation of the constant `K`

is explained below.

**Conditional Distribution** (`method="conditional"`

)

Nelson (1970) derives a prediction interval for the case `k=1`

based on the
conditional distribution of `Y`

given `X+Y`

. He notes that the conditional
distribution of `Y`

given the quantity `X+Y=w`

is
binomial with parameters
`size=`

`w`

and `prob=`

`[m\lambda^{*} / (m\lambda^{*} + n\lambda)]`

(Johnson et al., 1992, p.161). When `k=1`

, the prediction limits are computed
as those most extreme values of `Y`

that still yield a non-significant test of
the hypothesis `H_0: \lambda^{*} = \lambda`

, which for the conditional
distribution of `Y`

is equivalent to the hypothesis
`H_0`

: `prob=[m /(m + n)]`

.

Using the relationship between the binomial and
F-distribution (see the explanation of exact confidence
intervals in the help file for `ebinom`

), Nelson (1982, p. 203) states
that exact two-sided `(1-\alpha)100\%`

prediction limits [LPL, UPL] are the
closest integer solutions to the following equations:

`\frac{m}{LPL + 1} = \frac{n}{X} F(2 LPL + 2, 2X, 1 - \alpha/2) \;\;\;\;\;\; (8)`

`\frac{UPL}{n} = \frac{X+1}{n} F(2X + 2, 2 UPL, 1 - \alpha/2) \;\;\;\;\;\; (9)`

where `F(\nu_1, \nu_2, p)`

denotes the `p`

'th quantile of the
F-distribution with `\nu_1`

and `\nu_2`

degrees of
freedom.

If `ci.type="lower"`

, `\alpha/2`

is replaced with `\alpha`

in
Equation (8) above for `LPL`

, and `UPL`

is set to `\infty`

.

If `ci.type="upper"`

, `\alpha/2`

is replaced with `\alpha`

in
Equation (9) above for `UPL`

, and `LPL`

is set to 0.

**NOTE:** This method is not extended to the case `k > 1`

.

**Conditional Distribution Approximation Based on Normal Distribution**

(`method="conditional.approx.normal"`

)

Cox and Hinkley (1974, p.245) derive an approximate prediction interval for the case
`k=1`

. Like Nelson (1970), they note that the conditional distribution of
`Y`

given the quantity `X+Y=w`

is binomial with
parameters `size=`

`w`

and
`prob=`

`[m\lambda^{*} / (m\lambda^{*} + n\lambda)]`

, and that the
hypothesis `H_0: \lambda^{*} = \lambda`

is equivalent to the hypothesis
`H_0`

: `prob=[m /(m + n)]`

.

Cox and Hinkley (1974, p.245) suggest using the normal approximation to the
binomial distribution (in this case, without the continuity correction;
see Zar, 2010, pp.534-536 for information on the continuity correction associated
with the normal approximation to the binomial distribution). Under the null
hypothesis `H_0: \lambda^{*} = \lambda`

, the quantity

`z = [Y - \frac{c(X+Y)}{1+c}] / \{ [\frac{c(X+Y)}{(1+c)^2}]^{1/2} \} \;\;\;\;\;\; (10)`

is approximately distributed as a standard normal random variable.

*The Case When k = 1*

When `k = 1`

and `pi.type="two-sided"`

, the prediction limits are computed
by solving the equation

`z^2 \le z_{1 - \alpha/2}^2 \;\;\;\;\;\; (11)`

where `z_p`

denotes the `p`

'th quantile of the standard normal distribution.
In this case, Gibbons (1987b) notes that the quantity `K`

in Equation (3) above
is given by:

`K = \frac{t^2c}{2} tc[X (1 + \frac{1}{c}) + \frac{t^2}{4}]^{1/2} \;\;\;\;\;\; (12)`

where `t = z_{1-\alpha/2}`

.

When `pi.type="lower"`

or `pi.type="upper"`

, `K`

is computed exactly
as above, except `t`

is set to `t = z_{1-\alpha}`

.

*The Case When k > 1*

When `k > 1`

, Gibbons (1987b) suggests using the Bonferroni inequality.
That is, the value of `K`

is computed exactly as for the case `k=1`

described above, except that the Bonferroni value of `t`

is used in place of the
usual value of `t`

:

When `pi.type="two-side"`

, `t = z_{1 - (\alpha/k)/2}`

.

When `pi.type="lower"`

or `pi.type="upper"`

, `t = z_{1 - \alpha/k}`

.

**Conditional Distribution Approximation Based on Student's t-Distribution**

(`method="conditional.approx.t"`

)

When `method="conditional.approx.t"`

, the exact same procedure is used as when

`method="conditional.approx.normal"`

, except that the quantity in
Equation (10) is assumed to follow a Student's t-distribution with `n-1`

degrees of freedom. Thus, all occurrences of `z_p`

are replaced with
`t_{n-1, p}`

, where `t_{\nu, p}`

denotes the `p`

'th quantile of
Student's t-distribution with `\nu`

degrees of freedom.

**Normal Approximation** (`method="normal.approx"`

)

The normal approximation for Poisson prediction limits was given by
Nelson (1970; 1982, p.203) and is based on the fact that the mean and variance of a
Poisson distribution are the same (Johnson et al, 1992, p.157), and for
“large” values of `n`

and `m`

, both `X`

and `Y`

are
approximately normally distributed.

*The Case When k = 1*

The quantity `Y - cX`

is approximately normally distributed with expectation and
variance given by:

`E(Y - cX) = E(Y) - cE(X) = m\lambda - cn\lambda = 0 \;\;\;\;\;\; (13)`

`Var(Y - cX) = Var(Y) + c^2 Var(X) = m\lambda + c^2 n\lambda = m\lambda (1 + \frac{m}{n}) \;\;\;\;\;\; (14)`

so the quantity

`z = \frac{Y-cX}{\sqrt{m\hat{\lambda}(1 + \frac{m}{n})}} = \frac{Y-cX}{\sqrt{m\bar{x}(1 + \frac{m}{n})}} \;\;\;\;\;\; (15)`

is approximately distributed as a standard normal random variable. The function
`predIntPois`

, however, assumes this quantity is distributed as approximately
a Student's t-distribution with `n-1`

degrees of freedom.

Thus, following the idea of prediction intervals for a normal distribution
(see `predIntNorm`

), when `pi.type="two-sided"`

, the constant
`K`

for a `(1-\alpha)100\%`

prediction interval for the next `k=1`

sum
of `m`

observations is computed as:

`K = t_{n-1, 1-\alpha/2} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (16)`

where `t_{\nu, p}`

denotes the `p`

'th quantile of a
Student's t-distribution with `\nu`

degrees of freedom.

Similarly, when `pi.type="lower"`

or `pi.type="upper"`

, the constant
`K`

is computed as:

`K = t_{n-1, 1-\alpha} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (17)`

*The Case When k > 1*

When `k > 1`

, the value of `K`

is computed exactly as for the case
`k = 1`

described above, except that the Bonferroni value of `t`

is used
in place of the usual value of `t`

:

When `pi.type="two-sided"`

,

`K = t_{n-1, 1-(\alpha/k)/2} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (18)`

When `pi.type="lower"`

or `pi.type="upper"`

,

`K = t_{n-1, 1-(\alpha/k)} \sqrt{m\bar{x} (1 + \frac{m}{n})} \;\;\;\;\;\; (19)`

Hahn and Nelson (1973, p.182) discuss another method of computing `K`

when
`k > 1`

, but this method is not implemented here.

### Value

If `x`

is a numeric vector, `predIntPois`

returns a list of class
`"estimate"`

containing the estimated parameter, the prediction interval,
and other information. See the help file for

`estimate.object`

for details.

If `x`

is the result of calling an estimation function,
`predIntPois`

returns a list whose class is the same as `x`

.
The list contains the same components as `x`

, as well as a component called
`interval`

containing the prediction interval information.
If `x`

already has a component called `interval`

, this component is
replaced with the prediction interval information.

### Note

Prediction and tolerance intervals have long been applied to quality control and life testing problems. Nelson (1970) notes that his development of confidence and prediction limits for the Poisson distribution is based on well-known results dating back to the 1950's. Hahn and Nelson (1973) review predicion intervals for several distributions, including Poisson prediction intervals. The mongraph by Hahn and Meeker (1991) includes a discussion of Poisson prediction intervals.

Gibbons (1987b) uses the Poisson distribution to model the number of detected
compounds per scan of the 32 volatile organic priority pollutants (VOC), and also
to model the distribution of chemical concentration (in ppb), and presents formulas
for prediction and tolerance intervals. The formulas for prediction intervals are
based on Cox and Hinkley (1974, p.245). Gibbons (1987b) only deals with
the case where `n.sum=1`

.

Gibbons et al. (2009, pp. 72–76) discuss methods for Poisson prediction limits.

### Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

### References

Cox, D.R., and D.V. Hinkley. (1974). *Theoretical Statistics*.
Chapman and Hall, New York, pp.242–245.

Gibbons, R.D. (1987b). Statistical Models for the Analysis of Volatile Organic
Compounds in Waste Disposal Sites. *Ground Water* **25**, 572–580.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009).
*Statistical Methods for Groundwater Monitoring*, Second Edition.
John Wiley & Sons, Hoboken, pp. 72–76.

Hahn, G.J., and W.Q. Meeker. (1991). *Statistical Intervals: A Guide for Practitioners*.
John Wiley and Sons, New York.

Hahn, G., and W. Nelson. (1973). A Survey of Prediction Intervals and Their
Applications. *Journal of Quality Technology* **5**, 178–188.

Johnson, N. L., S. Kotz, and A. Kemp. (1992). *Univariate Discrete
Distributions*. Second Edition. John Wiley and Sons, New York, Chapter 4.

Millard, S.P., and N.K. Neerchal. (2001). *Environmental Statistics with S-PLUS*.
CRC Press, Boca Raton.

Miller, R.G. (1981a). *Simultaneous Statistical Inference*.
McGraw-Hill, New York, pp.8, 76–81.

Nelson, W.R. (1970). Confidence Intervals for the Ratio of Two Poisson Means and
Poisson Predictor Intervals. *IEEE Transactions of Reliability* **R-19**,
42–49.

Nelson, W.R. (1982). *Applied Life Data Analysis*. John Wiley and Sons,
New York, pp.200–204.

Zar, J.H. (2010). *Biostatistical Analysis*.
Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, pp. 585–586.

### See Also

`Poisson`

, `epois`

,
`estimate.object`

, Prediction Intervals,
`tolIntPois`

, Estimating Distribution Parameters.

### Examples

```
# Generate 20 observations from a Poisson distribution with parameter
# lambda=2. The interval [0, 4] contains 94.7% of this distribution and
# the interval [0,5] contains 98.3% of this distribution. Thus, because
# of the discrete nature of the Poisson distribution, no interval contains
# exactly 95% of this distribution. Use predIntPois to estimate the mean
# parameter of the true distribution, and construct a one-sided upper
# 95% prediction interval for the next single observation from this distribution.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
dat <- rpois(20, lambda = 2)
predIntPois(dat, pi.type = "upper")
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Estimated Parameter(s): lambda = 1.8
#
#Estimation Method: mle/mme/mvue
#
#Data: dat
#
#Sample Size: 20
#
#Prediction Interval Method: conditional
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Number of Future Observations: 1
#
#Prediction Interval: LPL = 0
# UPL = 5
#----------
# Compare results above with the other approximation methods:
predIntPois(dat, method = "conditional.approx.normal",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 4
predIntPois(dat, method = "conditional.approx.t",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 4
predIntPois(dat, method = "normal.approx",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 4
#Warning message:
#In predIntPois(dat, method = "normal.approx", pi.type = "upper") :
# Estimated value of 'lambda' and/or number of future observations
# is/are probably too small for the normal approximation to work well.
#==========
# Using the same data as in the previous example, compute a one-sided
# upper 95% prediction limit for k=10 future observations.
# Using conditional approximation method based on the normal distribution.
predIntPois(dat, k = 10, method = "conditional.approx.normal",
pi.type = "upper")
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Estimated Parameter(s): lambda = 1.8
#
#Estimation Method: mle/mme/mvue
#
#Data: dat
#
#Sample Size: 20
#
#Prediction Interval Method: conditional.approx.normal
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Number of Future Observations: 10
#
#Prediction Interval: LPL = 0
# UPL = 6
# Using method based on approximating conditional distribution with
# Student's t-distribution
predIntPois(dat, k = 10, method = "conditional.approx.t",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 6
#==========
# Repeat the above example, but set k=5 and n.sum=3. Thus, we want a
# 95% upper prediction limit for the next 5 sets of sums of 3 observations.
predIntPois(dat, k = 5, n.sum = 3, method = "conditional.approx.t",
pi.type = "upper")$interval$limits
#LPL UPL
# 0 12
#==========
# Reproduce Example 3.6 in Gibbons et al. (2009, p. 75)
# A 32-constituent VOC scan was performed for n=16 upgradient
# samples and there were 5 detections out of these 16. We
# want to construct a one-sided upper 95% prediction limit
# for 20 monitoring wells (so k=20 future observations) based
# on these data.
# First we need to create a data set that will yield a mean
# of 5/16 based on a sample size of 16. Any number of data
# sets will do. Here are two possible ones:
dat <- c(rep(1, 5), rep(0, 11))
dat <- c(2, rep(1, 3), rep(0, 12))
# Now call predIntPois. Don't round the limits so we can
# compare to the example in Gibbons et al. (2009).
predIntPois(dat, k = 20, method = "conditional.approx.t",
pi.type = "upper", round.limits = FALSE)
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Poisson
#
#Estimated Parameter(s): lambda = 0.3125
#
#Estimation Method: mle/mme/mvue
#
#Data: dat
#
#Sample Size: 16
#
#Prediction Interval Method: conditional.approx.t
#
#Prediction Interval Type: upper
#
#Confidence Level: 95%
#
#Number of Future Observations: 20
#
#Prediction Interval: LPL = 0.000000
# UPL = 2.573258
#==========
# Cleanup
#--------
rm(dat)
```

*EnvStats*version 2.8.1 Index]