predIntNormTestPower {EnvStats}

R Documentation

Probability That at Least One Future Observation Falls Outside a Prediction Interval for a Normal Distribution

Description

Compute the probability that at least one out of k future observations (or means) falls outside a prediction interval for k future observations (or means) for a normal distribution.

Usage

  predIntNormTestPower(n, df = n - 1, n.mean = 1, k = 1, delta.over.sigma = 0, 
    pi.type = "upper", conf.level = 0.95)

Arguments

`n`	vector of positive integers greater than 2 indicating the sample size upon which the prediction interval is based.
`df`	vector of positive integers indicating the degrees of freedom associated with the sample size. The default value is `df=n-1`.
`n.mean`	positive integer specifying the sample size associated with the future averages. The default value is `n.mean=1` (i.e., individual observations). Note that all future averages must be based on the same sample size.
`k`	vector of positive integers specifying the number of future observations that the prediction interval should contain with confidence level `conf.level`. The default value is `k=1`.
`delta.over.sigma`	vector of numbers indicating the ratio `\Delta/\sigma`. The quantity `\Delta` (delta) denotes the difference between the mean of the population that was sampled to construct the prediction interval, and the mean of the population that will be sampled to produce the future observations. The quantity `\sigma` (sigma) denotes the population standard deviation for both populations. See the DETAILS section below for more information. The default value is `delta.over.sigma=0`.
`pi.type`	character string indicating what kind of prediction interval to compute. The possible values are `pi.type="upper"` (the default), and `pi.type="lower"`.
`conf.level`	numeric vector of values between 0 and 1 indicating the confidence level of the prediction interval. The default value is `conf.level=0.95`.

Details

What is a Prediction Interval?
A prediction interval for some population is an interval on the real line constructed so that it will contain k future observations or averages from that population with some specified probability (1-\alpha)100\%, where 0 < \alpha < 1 and k is some pre-specified positive integer. The quantity (1-\alpha)100\% is called the confidence coefficient or confidence level associated with the prediction interval. The function predIntNorm computes a standard prediction interval based on a sample from a normal distribution. The function predIntNormTestPower computes the probability that at least one out of k future observations or averages will not be contained in the prediction interval, where the population mean for the future observations is allowed to differ from the population mean for the observations used to construct the prediction interval.

The Form of a Prediction Interval
Let \underline{x} = x_1, x_2, \ldots, x_n denote a vector of n observations from a normal distribution with parameters mean=\mu and sd=\sigma. Also, let m denote the sample size associated with the k future averages (i.e., n.mean=m). When m=1, each average is really just a single observation, so in the rest of this help file the term “averages” will replace the phrase “observations or averages”.

For a normal distribution, the form of a two-sided (1-\alpha)100\% prediction interval is:

[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)

where \bar{x} denotes the sample mean:

\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)

s denotes the sample standard deviation:

s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)

and K denotes a constant that depends on the sample size n, the confidence level, the number of future averages k, and the sample size associated with the future averages, m. Do not confuse the constant K (uppercase K) with the number of future averages k (lowercase k). The symbol K is used here to be consistent with the notation used for tolerance intervals (see tolIntNorm).

Similarly, the form of a one-sided lower prediction interval is:

[\bar{x} - Ks, \infty) \;\;\;\;\;\; (4)

and the form of a one-sided upper prediction interval is:

(-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)

but K differs for one-sided versus two-sided prediction intervals. The derivation of the constant K is explained in the help file for predIntNormK.
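
As a minimal sketch of Equation (5), the following base R code builds a one-sided upper prediction interval for the simplest case of k=1 future average. It assumes the closed form K = t_{n-1, 1-\alpha} \sqrt{1/m + 1/n}, which is the standard result for k=1 and is not stated in this help file (the general case is handled by predIntNormK). The data and variable names are illustrative only.

```r
# Illustrative data: n = 10 observations from a normal distribution
set.seed(42)
x <- rnorm(10, mean = 5, sd = 2)

n    <- length(x)
m    <- 1        # sample size behind the future average (n.mean)
conf <- 0.95     # confidence level, 1 - alpha

# Assumed closed form for K when k = 1 (one-sided interval)
K <- qt(conf, df = n - 1) * sqrt(1 / m + 1 / n)

xbar <- mean(x)  # Equation (2)
s    <- sd(x)    # square root of Equation (3)

upper_limit <- xbar + K * s   # upper bound in Equation (5)
```

For k > 1 the constant K is larger than this closed form, so the value above should not be reused in that case.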

Computing Power
The "power" of the prediction interval is defined as the probability that at least one out of the k future observations or averages will not be contained in the prediction interval, where the population mean for the future observations is allowed to differ from the population mean for the observations used to construct the prediction interval. The probability p that all k future observations will be contained in a one-sided upper prediction interval (pi.type="upper") is given in Equation (6) of the help file for predIntNormSimultaneousK, where, in that file's notation, k=m and r=1:

p = \int_0^1 T(\sqrt{n}K; n-1, \sqrt{n}[\Phi^{-1}(v) + \frac{\Delta}{\sigma}]) [\frac{v^{k-1}}{B(k, 1)}] dv \;\;\;\;\;\; (6)

where T(x; \nu, \delta) denotes the cdf of the non-central Student's t-distribution with parameters df=\nu and ncp=\delta evaluated at x; \Phi(x) denotes the cdf of the standard normal distribution evaluated at x; and B(\nu, \omega) denotes the value of the beta function with parameters a=\nu and b=\omega.

The quantity \Delta (upper case delta) denotes the difference between the mean of the population that was sampled to construct the prediction interval, and the mean of the population that will be sampled to produce the future observations. The quantity \sigma (sigma) denotes the population standard deviation of both of these populations. Usually you assume \Delta=0 unless you are interested in computing the power of the rule to detect a change in means between the populations, as we are here.
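
The definition of power can be checked by simulation. The sketch below, in base R, repeatedly draws a background sample and k future observations whose mean is shifted by \Delta, and records how often at least one future observation exceeds the upper limit \bar{x} + Ks. The function name sim_power and its arguments are hypothetical, \sigma is set to 1 without loss of generality, and K is supplied by the caller. The example uses the k=1 closed form K = t_{n-1, 1-\alpha} \sqrt{1 + 1/n}, which is an assumption not stated in this help file.

```r
# Monte Carlo estimate of the power of a one-sided upper prediction interval.
# K: multiplier used in the limit xbar + K * s (supplied by the caller)
sim_power <- function(K, n, k = 1, delta_over_sigma = 0, nsim = 20000) {
  outside <- replicate(nsim, {
    x <- rnorm(n)                            # background sample, mean 0, sd 1
    y <- rnorm(k, mean = delta_over_sigma)   # k future observations, shifted mean
    any(y > mean(x) + K * sd(x))             # at least one falls above the limit
  })
  mean(outside)
}

set.seed(1)
n <- 10
K <- qt(0.95, df = n - 1) * sqrt(1 + 1 / n)  # assumed K for k = 1, conf.level = 0.95

power_null  <- sim_power(K, n)                        # no shift in the mean
power_shift <- sim_power(K, n, delta_over_sigma = 2)  # mean shifted up by 2 sigma
```

With no shift the estimate should sit near \alpha = 0.05, up to simulation error, and it should grow as delta_over_sigma increases.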

If we are interested in using averages instead of single observations, with each average based on w \ge 1 observations (i.e., n.mean=w; this is the quantity denoted m above), the first term in the integral in Equation (6) that involves the cdf of the non-central Student's t-distribution becomes:

T(\sqrt{n}K; n-1, \frac{\sqrt{n}}{\sqrt{w}}[\Phi^{-1}(v) + \frac{\sqrt{w}\Delta}{\sigma}]) \;\;\;\;\;\; (7)

For a given confidence level (1-\alpha)100\%, the power of the rule to detect a change in means is simply given by:

Power = 1 - p \;\;\;\;\;\; (8)

where p is defined in Equation (6) above using the value of K that corresponds to \Delta/\sigma = 0. Thus, when the argument delta.over.sigma=0, the value of p is 1-\alpha and the power is simply \alpha 100\%. As delta.over.sigma increases above 0, the power increases.
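
Equations (6) through (8) can also be evaluated directly by numerical integration. The following base R sketch does this with integrate() and the non-central t cdf pt(). It is not the code used by predIntNormTestPower, and the function name pi_power is hypothetical. The caller supplies K; the example assumes the k=1 closed form K = t_{n-1, 1-\alpha} \sqrt{1/w + 1/n}, which is not stated in this help file. Because pt() supports only moderate non-centrality parameters, the sketch is intended for small to moderate n.

```r
# Power of a one-sided upper prediction interval via Equations (6)-(8).
# K: multiplier computed under delta_over_sigma = 0 (supplied by the caller)
# w: sample size behind each future average (n.mean); w = 1 gives Equation (6)
pi_power <- function(K, n, k = 1, w = 1, delta_over_sigma = 0) {
  integrand <- function(v) {
    # Non-centrality parameter from Equation (7)
    ncp <- sqrt(n / w) * (qnorm(v) + sqrt(w) * delta_over_sigma)
    pt(sqrt(n) * K, df = n - 1, ncp = ncp) * v^(k - 1) / beta(k, 1)
  }
  p <- integrate(integrand, lower = 0, upper = 1)$value  # Equation (6)
  1 - p                                                  # Equation (8)
}

n <- 10
K <- qt(0.95, df = n - 1) * sqrt(1 + 1 / n)   # assumed K for k = 1, w = 1

power_null  <- pi_power(K, n)                        # no shift in the mean
power_shift <- pi_power(K, n, delta_over_sigma = 2)  # mean shifted up by 2 sigma
```

Here power_null should be close to \alpha = 0.05, in line with the statement that p = 1-\alpha when delta.over.sigma=0, and power_shift should be larger.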

When pi.type="lower", the same value of K is used as when pi.type="upper", but Equation (4) is used to construct the prediction interval. Thus, the power increases as delta.over.sigma decreases below 0.

Value

vector of values between 0 and 1 equal to the probability that at least one of k future observations or averages will fall outside the prediction interval.

Note

See the help files for predIntNorm and predIntNormSimultaneous.

In the course of designing a sampling program, an environmental scientist may wish to determine the relationship between sample size, significance level, power, and scaled difference if one of the objectives of the sampling program is to determine whether two distributions differ from each other. The functions predIntNormTestPower and plotPredIntNormTestPowerCurve can be used to investigate these relationships for the case of normally-distributed observations. In the case of a simple shift between the two means, the test based on a prediction interval is not as powerful as the two-sample t-test. However, the test based on a prediction interval is more efficient at detecting a shift in the tail.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References