R: Expected Value of Order Statistics for Random Sample from...

evNormOrdStats {EnvStats}

R Documentation

Expected Value of Order Statistics for Random Sample from Standard Normal Distribution

Description

Compute the expected values of order statistics for a random sample from a standard normal distribution.

Usage

  evNormOrdStats(n = 1, 
    method = "royston", lower = -9, inc = 0.025, warn = TRUE, 
    alpha = 3/8, nmc = 2000, seed = 47, approximate = NULL)

  evNormOrdStatsScalar(r = 1, n = 1, 
    method = "royston", lower = -9, inc = 0.025, warn = TRUE, 
    alpha = 3/8, nmc = 2000, conf.level = 0.95, seed = 47, approximate = NULL)

Arguments

`n`	positive integer indicating the sample size.
`r`	positive integer between `1` and `n` specifying the order statistic for which to compute the expected value.
`method`	character string indicating what method to use. The possible values are: `"royston"`. Method based on approximating the exact integral as given in Royston (1982). `"blom"`. Method based on the approximation formula proposed by Blom (1958). `"mc"`. Method based on Monte Carlo simulation. See the DETAILS section below.
`lower`	numeric scalar `\le -9` defining the lower bound used for approximating the integral when `method="royston"`. The upper bound is automatically set to `-lower`. The default value is `lower=-9`.
`inc`	numeric scalar between `.Machine$double.eps` and `0.025` that determines the width of each subdivision used to approximate the integral when `method="royston"`. The default value is `inc=0.025`.
`warn`	logical scalar indicating whether to issue a warning when `method="royston"` and the sample size is greater than 2000. The default value is `warn=TRUE`.
`alpha`	numeric scalar between 0 and 0.5 that determines the constant used when `method="blom"`. The default value is `alpha=3/8`.
`nmc`	integer `\ge 100` denoting the number of Monte Carlo simulations to use when `method="mc"`. The default value is `nmc=2000`.
`conf.level`	numeric scalar between 0 and 1 denoting the confidence level of the confidence interval for the expected value of the normal order statistic when `method="mc"`. The default value is `conf.level=0.95`.
`seed`	integer between `-(2^31 - 1)` and `2^31 - 1` specifying the argument to `set.seed` (the random number seed) when `method="mc"`. The default value is `seed=47`.
`approximate`	logical scalar included for backwards compatibility with versions of EnvStats prior to version 2.3.0. When `method` is not supplied and `approximate=FALSE`, `method` is set to `method="royston"`. When `method` is not supplied and `approximate=TRUE`, `method` is set to `method="blom"`. This argument is ignored if `method` is supplied and/or `approxmiate=NULL` (the default).

Details

Let \underline{z} = z_1, z_2, \ldots, z_n denote a vector of n observations from a normal distribution with parameters mean=0 and sd=1. That is, \underline{z} denotes a vector of n observations from a standard normal distribution. Let z_{(r)} denote the r'th order statistic of \underline{z}, for r = 1, 2, \ldots, n. The probability density function of z_{(r)} is given by:

f_{r,n}(t) = \frac{n!}{(r-1)!(n-r)!} [\Phi(t)]^{r-1} [1 - \Phi(t)]^{n-r} \phi(t) \;\;\;\;\;\; (1)

where \Phi and \phi denote the cumulative distribution function and probability density function of the standard normal distribution, respectively (Johnson et al., 1994, p.93). Thus, the expected value of z_{(r)} is given by:

E(r, n) = E[z_{(r)}] = \int_{-\infty}^{\infty} t f_{r,n}(t) dt \;\;\;\;\;\; (2)

It can be shown that if n is odd, then

E[(n+1)/2, n] = 0 \;\;\;\;\;\; (3)

Also, for all values of n,

E(r, n) = -E(n-r+1, n) \;\;\;\;\;\; (4)

The function evNormOrdStatsScalar computes the value of E(r,n) for user-specified values of r and n.

The function evNormOrdStats computes the values of E(r,n) for all values of r (i.e., for r = 1, 2, \ldots, n) for a user-specified value of n.

Exact Method Based on Royston's Approximation to the Integral (method="royston")

When method="royston", the integral in Equation (2) above is approximated by computing the value of the integrand between the values of lower and -lower using increments of inc, then summing these values and multiplying by inc. In particular, the integrand is restructured as:

t \; f_{r,n}(t) = t \; exp\{log(n!) - log[(r-1)!] - log[(n-r)!] + (r-1)log[\Phi(t)] + (n-r)log[1 - \Phi(t)] + log[\phi(t)]\} \;\;\; (5)

By default, as per Royston (1982), the integrand is evaluated between -9 and 9 in increments of 0.025. The approximation is computed this way for values of r between 1 and [n/2], where [x] denotes the floor of x. If r > [n/2], then the approximation is computed for E(n-r+1, n) and Equation (4) is used.

Note that Equation (1) in Royston (1982) differs from Equations (1) and (2) above because Royston's paper is based on the r^{th} largest value, not the r^{th} order statistic.

Royston (1982) states that this algorithm “is accurate to at least seven decimal places on a 36-bit machine,” that it has been validated up to a sample size of n=2000, and that the accuracy for n > 2000 may be improved by reducing the value of the argument inc. Note that making inc smaller will increase the computation time.

Approxmation Based on Blom's Method (method="blom")

When method="blom", the following approximation to E(r,n), proposed by Blom (1958, pp. 68-75), is used:

E(r, n) \approx \Phi^{-1}(\frac{r - \alpha}{n - 2\alpha + 1}) \;\;\;\;\;\; (5)

By default, \alpha = 3/8 = 0.375. This approximation is quite accurate. For example, for n \ge 2, the approximation is accurate to the first decimal place, and for n \ge 9 it is accurate to the second decimal place.

Harter (1961) discusses appropriate values of \alpha for various sample sizes n and values of r.

Approximation Based on Monte Carlo Simulation (method="mc")

When method="mc", Monte Carlo simulation is used to estmate the expected value of the r^{th} order statistic. That is, N = nmc trials are run in which, for each trial, a random sample of n standard normal observations is generated and the r^{th} order statistic is computed. Then, the average value of this order statistic over all N trials is computed, along with a confidence interval for the expected value, assuming an approximately normal distribution for the mean of the order statistic (the confidence interval is computed by supplying the simulated values of the r^{th} order statistic to the function enorm).

NOTE: This method has not been optimized for large sample sizes n (i.e., large values of the argument n) and/or a large number of Monte Carlo trials N (i.e., large values of the argument nmc) and may take a long time to execute in these cases.

Value

For evNormOrdStats: a numeric vector of length n containing the expected values of all the order statistics for a random sample of n standard normal deviates.

For evNormOrdStatsScalar: a numeric scalar containing the expected value of the r'th order statistic from a random sample of n standard normal deviates. When method="mc", the returned object also has a cont.int attribute that contains the 95 and a nmc attribute indicating the number of Monte Carlo trials run.

Note

The expected values of normal order statistics are used to construct normal quantile-quantile (Q-Q) plots (see qqPlot) and to compute goodness-of-fit statistics (see gofTest). Usually, however, approximations are used instead of exact values. The functions evNormOrdStats and
evNormOrdStatsScalar have been included mainly because evNormOrdStatsScalar is called by elnorm3 and predIntNparSimultaneousTestPower.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. John Wiley and Sons, New York.

Harter, H. L. (1961). Expected Values of Normal Order Statistics 48, 151–165.

Johnson, N. L., S. Kotz, and N. Balakrishnan. (1994). Continuous Univariate Distributions, Volume 1. Second Edition. John Wiley and Sons, New York, pp. 93–99.

Royston, J.P. (1982). Algorithm AS 177. Expected Normal Order Statistics (Exact and Approximate). Applied Statistics 31, 161–165.

Examples

  # Compute the expected value of the minimum for a random sample of size 10 
  # from a standard normal distribution:

  # Based on method="royston"
  #--------------------------
  evNormOrdStatsScalar(r = 1, n = 10) 
  #[1] -1.538753


  # Based on method="blom"
  #-----------------------
  evNormOrdStatsScalar(r = 1, n = 10, method = "blom") 
  #[1] -1.546635


  # Based on method="mc" with 10,000 Monte Carlo trials
  #----------------------------------------------------
  evNormOrdStatsScalar(r = 1, n = 10, method = "mc", nmc = 10000) 
  #[1] -1.544318
  #attr(,"confint")
  #   95%LCL    95%UCL 
  #-1.555838 -1.532797 
  #attr(,"nmc")
  #[1] 10000

  #====================

  # Compute the expected values of all of the order statistics 
  # for a random sample of size 10 from a standard normal distribution
  # based on Royston's (1982) method:
  #--------------------------------------------------------------------

  evNormOrdStats(10) 
  #[1] -1.5387527 -1.0013570 -0.6560591 -0.3757647 -0.1226678
  #[6]  0.1226678  0.3757647  0.6560591  1.0013570  1.5387527


  # Compare the above with Blom (1958) scores:
  #-------------------------------------------

  evNormOrdStats(10, method = "blom") 
  #[1] -1.5466353 -1.0004905 -0.6554235 -0.3754618 -0.1225808
  #[6]  0.1225808  0.3754618  0.6554235  1.0004905  1.5466353

[Package EnvStats version 2.8.1 Index]