R: Compute the Value of K for a Tolerance Interval for a Normal...

tolIntNormK {EnvStats}

R Documentation

Compute the Value of `K` for a Tolerance Interval for a Normal Distribution

Description

Compute the value of K (the multiplier of estimated standard deviation) used to construct a tolerance interval based on data from a normal distribution.

Usage

  tolIntNormK(n, df = n - 1, coverage = 0.95, cov.type = "content", 
    ti.type = "two-sided", conf.level = 0.95, method = "exact", 
    rel.tol = 1e-07, abs.tol = rel.tol)

Arguments

`n`	a positive integer greater than 2 indicating the sample size upon which the tolerance interval is based.
`df`	the degrees of freedom associated with the tolerance interval. The default is `df=n-1`.
`coverage`	a scalar between 0 and 1 indicating the desired coverage of the tolerance interval. The default value is `coverage=0.95`.
`cov.type`	character string specifying the coverage type for the tolerance interval. The possible values are `"content"` (`\beta`-content; the default), and `"expectation"` (`\beta`-expectation). See the help file for `tolIntNorm` for more information on the difference between `\beta`-content and `\beta`-expectation tolerance intervals.
`ti.type`	character string indicating what kind of tolerance interval to compute. The possible values are `"two-sided"` (the default), `"lower"`, and `"upper"`.
`conf.level`	a scalar between 0 and 1 indicating the confidence level associated with the tolerance interval. The default value is `conf.level=0.95`.
`method`	for the case of a two-sided tolerance interval, a character string specifying the method for constructing the tolerance interval. This argument is ignored if `ti.type="lower"` or `ti.type="upper"`. The possible values are `"exact"` (the default) and `"wald.wolfowitz"` (the Wald-Wolfowitz approximation). See the DETAILS section for more information.
`rel.tol`	in the case when `ti.type="two-sided"` and `method="exact"`, the argument `rel.tol` is passed to the function `integrate`. The default value is `rel.tol=1e-07`.
`abs.tol`	in the case when `ti.type="two-sided"` and `method="exact"`, the argument `abs.tol` is passed to the function `integrate`. The default value is the value of `rel.tol`.

Details

A tolerance interval for some population is an interval on the real line constructed so as to contain 100 \beta \% of the population (i.e., 100 \beta \% of all future observations), where 0 < \beta < 1. The quantity 100 \beta \% is called the coverage.

There are two kinds of tolerance intervals (Guttman, 1970):

A \beta-content tolerance interval with confidence level 100(1-\alpha)\% is constructed so that it contains at least 100 \beta \% of the population (i.e., the coverage is at least 100 \beta \%) with probability 100(1-\alpha)\%, where 0 < \alpha < 1. The quantity 100(1-\alpha)\% is called the confidence level or confidence coefficient associated with the tolerance interval.
A \beta-expectation tolerance interval is constructed so that the average coverage of the interval is 100 \beta \%.

Note: A \beta-expectation tolerance interval with coverage 100 \beta \% is equivalent to a prediction interval for one future observation with associated confidence level 100 \beta \%. There is no explicit confidence level associated with a \beta-expectation tolerance interval. If a \beta-expectation tolerance interval is treated as a \beta-content tolerance interval, the associated confidence level is usually around 50% (e.g., Guttman, 1970, Table 4.2, p.76).

For a normal distribution, the form of a two-sided 100(1-\alpha)\% tolerance interval is:

[\bar{x} - Ks, \, \bar{x} + Ks]

where \bar{x} denotes the sample mean, s denotes the sample standard deviation, and K denotes a constant that depends on the sample size n, the coverage, and, for a \beta-content tolerance interval (but not a \beta-expectation tolerance interval), the confidence level.

Similarly, the form of a one-sided lower tolerance interval is:

[\bar{x} - Ks, \, \infty]

and the form of a one-sided upper tolerance interval is:

[-\infty, \, \bar{x} + Ks]

but K differs for one-sided versus two-sided tolerance intervals.
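To make the interval forms concrete, the sketch below builds a two-sided interval from a sample and then checks the \beta-content definition by simulation. It is an illustration only: the data are simulated (hypothetical), and the multiplier K = 2.760346 is copied from the Examples section of this page (two-sided, 95% coverage, 95% confidence, n = 20) rather than computed.

```r
set.seed(123)

n <- 20
K <- 2.760346                        # value reported in the Examples section

# Hypothetical sample from a normal distribution
x <- rnorm(n, mean = 10, sd = 2)
xbar <- mean(x)
s <- sd(x)

# Two-sided tolerance interval [xbar - K*s, xbar + K*s]
ti <- c(lower = xbar - K * s, upper = xbar + K * s)
ti

# Simulation check of the beta-content property: for many standard normal
# samples, compute the true proportion of the population inside the interval
content <- replicate(10000, {
  y <- rnorm(n)
  pnorm(mean(y) + K * sd(y)) - pnorm(mean(y) - K * sd(y))
})

# Share of intervals that contain at least 95% of the population;
# this should be close to the confidence level 0.95
conf_hat <- mean(content >= 0.95)
conf_hat
```

A one-sided limit is built the same way, using only `xbar - K * s` or `xbar + K * s`, but with the one-sided value of K.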

The Derivation of K for a \beta-Content Tolerance Interval

One-Sided Case

When ti.type="upper" or ti.type="lower", the constant K for a 100 \beta \% \beta-content tolerance interval with associated confidence level 100(1 - \alpha)\% is given by:

K = t(n-1, 1 - \alpha, z_\beta \sqrt{n}) / \sqrt{n}

where t(\nu, p, \delta) denotes the p'th quantile of a non-central t-distribution with \nu degrees of freedom and noncentrality parameter \delta (see the help file for TDist), and z_p denotes the p'th quantile of a standard normal distribution.
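The one-sided formula can be evaluated directly with base R, using `qt` with its `ncp` argument and `qnorm`. This is a minimal sketch of the formula as written above, not the package's internal code; the helper name `k_one_sided` is hypothetical.

```r
# Hypothetical helper: one-sided beta-content K from a non-central t quantile
k_one_sided <- function(n, coverage = 0.95, conf.level = 0.95) {
  ncp <- qnorm(coverage) * sqrt(n)   # noncentrality parameter z_beta * sqrt(n)
  qt(conf.level, df = n - 1, ncp = ncp) / sqrt(n)
}

k_one_sided(n = 20, coverage = 0.99, conf.level = 0.9)
k_one_sided(n = 8)
```

These two calls use the same settings as the one-sided calls in the Examples section, which report 3.051543 and 3.187294, so the results should agree closely with those values.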

Two-Sided Case

When ti.type="two-sided" and method="exact", the exact formula for the constant K for a 100 \beta \% \beta-content tolerance interval with associated confidence level 100(1-\alpha)\% requires numerical integration and has been derived by several different authors, including Odeh (1978), Eberhardt et al. (1989), Jilek (1988), Fujino (1989), and Janiga and Miklos (2001). Specifically, for given values of the sample size n, degrees of freedom \nu, confidence level (1-\alpha), and coverage \beta, the constant K is the solution to the equation:

\sqrt{\frac{n}{2 \pi}} \, \int^\infty_{-\infty} {F(x, K, \nu, R) \, e^{(-nx^2)/2}} \, dx = 1 - \alpha

where F(x, K, \nu, R) denotes the upper-tail area from (\nu \, R^2) / K^2 to \infty of the chi-squared distribution with \nu degrees of freedom, and R is the solution to the equation:

\Phi (x + R) - \Phi (x - R) = \beta

where \Phi() denotes the standard normal cumulative distribution function.
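The two equations above can be solved numerically with base R. The sketch below is one possible approach and is not taken from the package source: the helper name `k_two_sided_exact` is hypothetical, the inner equation for R is solved with `uniroot` at each x, and the integral is evaluated with `integrate` over a truncated range, using the symmetry of the integrand in x.

```r
# Hypothetical helper: exact two-sided beta-content K by numerical integration
k_two_sided_exact <- function(n, df = n - 1, coverage = 0.95,
                              conf.level = 0.95) {
  # R(x): solves pnorm(x + R) - pnorm(x - R) = coverage for a single x
  R_of_x <- function(x) {
    uniroot(function(R) pnorm(x + R) - pnorm(x - R) - coverage,
            lower = 0, upper = abs(x) + 10, tol = 1e-12)$root
  }

  # F(x, K, nu, R) * exp(-n x^2 / 2), vectorised over x for integrate()
  integrand <- function(x, K) {
    R <- vapply(x, R_of_x, numeric(1))
    pchisq(df * R^2 / K^2, df = df, lower.tail = FALSE) * exp(-n * x^2 / 2)
  }

  # Left-hand side of the equation for K. The integrand is symmetric in x,
  # so integrate over [0, 10 / sqrt(n)] and double the result.
  # Assumption: the contribution beyond 10 / sqrt(n) is negligible.
  conf_of_K <- function(K) {
    2 * sqrt(n / (2 * pi)) *
      integrate(integrand, lower = 0, upper = 10 / sqrt(n), K = K,
                rel.tol = 1e-7)$value
  }

  # K is the root of conf_of_K(K) - conf.level
  uniroot(function(K) conf_of_K(K) - conf.level,
          lower = 0.1, upper = 50, tol = 1e-8)$root
}

k_two_sided_exact(n = 20)
```

With the default settings and n = 20, the result should be close to the value 2.760346 reported for the exact method in the Examples section.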

When ti.type="two-sided" and method="wald.wolfowitz", the approximate formula due to Wald and Wolfowitz (1946) for the constant K for a 100 \beta \% \beta-content tolerance interval with associated confidence level 100(1-\alpha)\% is given by:

K \approx r \, u

where r is the solution to the equation:

\Phi (\frac{1}{\sqrt{n}} + r) - \Phi (\frac{1}{\sqrt{n}} - r) = \beta

\Phi () denotes the standard normal cumulative distribution function, and u is given by:

u = \sqrt{\frac{n-1}{\chi^{2} (n-1, \alpha)}}

where \chi^{2} (\nu, p) denotes the p'th quantile of the chi-squared distribution with \nu degrees of freedom.
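The Wald-Wolfowitz approximation needs only one root-finding step and one chi-squared quantile. The sketch below follows the formulas above using base R; the helper name `k_wald_wolfowitz` is hypothetical and this is not the package's internal code.

```r
# Hypothetical helper: Wald-Wolfowitz approximation to the two-sided K
k_wald_wolfowitz <- function(n, coverage = 0.95, conf.level = 0.95) {
  alpha <- 1 - conf.level
  shift <- 1 / sqrt(n)

  # r solves pnorm(1/sqrt(n) + r) - pnorm(1/sqrt(n) - r) = coverage
  r <- uniroot(function(r) pnorm(shift + r) - pnorm(shift - r) - coverage,
               lower = 0, upper = 10, tol = 1e-10)$root

  # u = sqrt((n - 1) / chi-squared quantile at probability alpha)
  u <- sqrt((n - 1) / qchisq(alpha, df = n - 1))

  r * u
}

k_wald_wolfowitz(n = 20)
```

For n = 20 with the default coverage and confidence level, the result should be close to the value 2.751789 reported for the approximate method in the Examples section.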

The Derivation of K for a \beta-Expectation Tolerance Interval

As stated above, a \beta-expectation tolerance interval with coverage 100 \beta \% is equivalent to a prediction interval for one future observation with associated confidence level 100 \beta \%. This is because the probability that any single future observation falls into this interval is \beta, so the number of observations, out of N future observations, that fall into this interval follows a binomial distribution with parameters size = N and prob = \beta (see the help file for Binomial). Hence the expected proportion of future observations that fall into this interval is 100 \beta \%, regardless of the value of N. See the help file for predIntNormK for information on how to derive K for these intervals.
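The expected-coverage property can be checked by simulation. The sketch below assumes the standard two-sided normal prediction interval for one future observation, K = t(n-1, (1+\beta)/2) \sqrt{1 + 1/n}, where t(\nu, p) is a central t quantile. That formula is not stated on this page (it belongs to the prediction-interval setting covered by predIntNormK), so treat it as an assumption of the example.

```r
set.seed(42)

n <- 20
coverage <- 0.95

# Assumed K for a two-sided prediction interval for one future observation
K <- qt((1 + coverage) / 2, df = n - 1) * sqrt(1 + 1 / n)
K

# True proportion of the population inside each simulated interval
content <- replicate(10000, {
  x <- rnorm(n)
  pnorm(mean(x) + K * sd(x)) - pnorm(mean(x) - K * sd(x))
})

# Average coverage across samples; expected to be close to 0.95
mean(content)

# Share of intervals that reach the nominal coverage, i.e. the confidence
# level if the interval were treated as a beta-content interval
mean(content >= coverage)
```

The last quantity can be compared with the remark in the Details section that such an interval, treated as a \beta-content interval, usually has a confidence level of around 50%.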

Value

The value of K, a numeric scalar used to construct tolerance intervals for a normal (Gaussian) distribution.

Note

Tabled values of K are given in Gibbons et al. (2009), Gilbert (1987), Guttman (1970), Krishnamoorthy and Mathew (2009), Owen (1962), Odeh and Owen (1980), and USEPA (2009).

Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.

Draper, N., and H. Smith. (1998). Applied Regression Analysis. Third Edition. John Wiley and Sons, New York.

Eberhardt, K.R., R.W. Mee, and C.P. Reeve. (1989). Computing Factors for Exact Two-Sided Tolerance Limits for a Normal Distribution. Communications in Statistics, Part B-Simulation and Computation 18, 397-413.

Ellison, B.E. (1964). On Two-Sided Tolerance Intervals for a Normal Distribution. Annals of Mathematical Statistics 35, 762-772.

Fujino, T. (1989). Exact Two-Sided Tolerance Limits for a Normal Distribution. Japanese Journal of Applied Statistics 18, 29-36.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York.

Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT.

Hahn, G.J. (1970b). Statistical Intervals for a Normal Population, Part I: Tables, Examples and Applications. Journal of Quality Technology 2(3), 115-125.

Hahn, G.J. (1970c). Statistical Intervals for a Normal Population, Part II: Formulas, Assumptions, Some Derivations. Journal of Quality Technology 2(4), 195-206.

Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York.

Jilek, M. (1988). Statisticke Tolerancni Meze. SNTL, Praha.

Krishnamoorthy, K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.

Janiga, I., and R. Miklos. (2001). Statistical Tolerance Intervals for a Normal Distribution. Measurement Science Review 11, 29-32.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.

Odeh, R.E. (1978). Tables of Two-Sided Tolerance Factors for a Normal Distribution. Communications in Statistics, Part B-Simulation and Computation 7, 183-201.

Odeh, R.E., and D.B. Owen. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening. Marcel Dekker, New York.

Owen, D.B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, MA.

Singh, A., R. Maichle, and N. Armbya. (2010a). ProUCL Version 4.1.00 User Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Wald, A., and J. Wolfowitz. (1946). Tolerance Limits for a Normal Distribution. Annals of Mathematical Statistics 17, 208-215.

Examples

  # Compute the value of K for a two-sided 95% beta-content 
  # tolerance interval with associated confidence level 95% 
  # given a sample size of n=20.

  #----------
  # Exact method

  tolIntNormK(n = 20)
  #[1] 2.760346

  #----------
  # Approximate method due to Wald and Wolfowitz (1946)

  tolIntNormK(n = 20, method = "wald.wolfowitz")
  #[1] 2.751789


  #--------------------------------------------------------------------

  # Compute the value of K for a one-sided upper tolerance limit 
  # with 99% coverage and associated confidence level 90% 
  # given a sample size of n=20.

  tolIntNormK(n = 20, ti.type = "upper", coverage = 0.99, 
    conf.level = 0.9)
  #[1] 3.051543

  #--------------------------------------------------------------------

  # Example 17-3 of USEPA (2009, p. 17-17) shows how to construct a 
  # beta-content upper tolerance limit with 95% coverage and 95% 
  # confidence using chrysene data and assuming a lognormal 
  # distribution.  The sample size is n = 8 observations from 
  # the two compliance wells.  Here we will compute the 
  # multiplier for the log-transformed data.

  tolIntNormK(n = 8, ti.type = "upper")
  #[1] 3.187294

[Package EnvStats version 2.8.1 Index]
