R: P-value for the Rth Order Statistic

RthOrderPValueOrthoT {MGBT}

R Documentation

P-value for the Rth Order Statistic

Description

Compute the p-value for the rth order statistic

\eta(r,n) = \frac{x_{[r:n]} - \mathrm{mean}\{x_{[(r+1)\rightarrow n:n]}\}} {\sqrt{\mathrm{var}\{x_{[(r+1)\rightarrow n:n]}\}}}\mbox{.}

This function is the cumulative distribution function of the Grubbs–Beck statistic (eta = GB_r(p)). In distribution notation, this is equivalent to saying F(GB_r) for nonexceedance probability
F \in (0,1). The inverse or quantile function GB_r(F) is CritK.

Usage

RthOrderPValueOrthoT(n, r, eta, n.sim=10000, silent=TRUE)

Arguments

`n`	The number of observations;
`r`	The number of truncated observations; and
`eta`	The pseudo-studentized magnitude of `r`th smallest observation;
`n.sim`	The sample size to attempt a Monte Carlo integration in case the numerical integration via `integrate()` encounters a divergent integral; and
`silent`	A logical controlling the silence of `try`.

Value

The value a two-column R matrix.

Note

The extension to Monte Carlo integration in event of failure of the numerical integration an extension is by WHA. The Note for MGBT provides extensive details in the context of a practical application.

Note that in conjunction with RthOrderPValueOrthoT, TAC provided an enhanced numerical integration interface (integrateV()) to integrate() built-in to R. In fact, all that TAC did was wrap a vectorization scheme using sapply() on top of peta. The issue is that peta was not designed to be vectorized. WHA has simply inserted the sapply R idiom inside peta and hence vectorizing it and removed the need in the MGBT package for the integrateV() function in the TAC sources.

TAC named this function with the Kth order. In code, however, TAC uses the variable r. WHA has migrated all references to Kth to Rth for systematic consistency. Hence, this function has been renamed to RthOrderPValueOrthoT.

TAC also provides a KthOrderPValueOrthoTb function and notes that it employs simple Gaussian quadrature to compute the integral much more quickly. However, it is slightly less accurate for tail probabilities. The Gaussian quadrature is from a function gauss.quad.prob(), which seems to not be found in the TAC sources given to WHA.

Author(s)

W.H. Asquith consulting T.A. Cohn sources

Source

LowOutliers_jfe(R).txt, LowOutliers_wha(R).txt, P3_089(R).txt—
Named KthOrderPValueOrthoT + KthOrderPValueOrthoTb

References

Cohn, T.A., 2013–2016, Personal communication of original R source code: U.S. Geological Survey, Reston, Va.

Examples

# Running next line without the $value will show:
#0.001000002 with absolute error < 1.7e-05 # This is output from the integrate()
# function, which means that the numerical integration worked.
RthOrderPValueOrthoT(58, 2, -3.561143)$value


# Long CPU time
CritK(58, 2, RthOrderPValueOrthoT(58, 2, -3.561143)$value)
#[1] -3.561143  # Therefore CritK() is the inverse of this function.


# Long CPU time
# Monte Carlo distribution of rth pseudo-studentized order statistic (TAC note)
testRthOrderPValueOrthoT <- function(nrep=1E4, r=2, n=100,
               test_quants = c(0.05,0.1,0.5,0.9,0.95),  ndigits=3, seed=1) {
   set.seed(seed)
   z <- replicate(nrep, { x <- sort(rnorm(n)); xr <- x[r]; x2 <- x[(r+1):n]
                         (xr - mean(x2))/sqrt(var(x2)) })
     res <- sapply(quantile(z, test_quants), function(q) {
                 c(q, RthOrderPValueOrthoT(n,r,q)$value) })
   round(res,ndigits)
}

nsim <- 1E4
for(n in 50) {   # original TAC sources had c(10,15,25,50,100,500)
   for(r in 5) { # original TAC sources had 1:min(10,floor(n/2))
      message("n=",n, " and r=",r)
      print(testRthOrderPValueOrthoT(nrep=nsim, n=n, r=r))
   }
}
# Output like this will be seen
# n=50 and r=5
#         5%    10%    50%    90%    95%
#[1,] -2.244 -2.127 -1.788 -1.523 -1.460
#[2,]  0.046  0.096  0.499  0.897  0.946
# that shows simulated percentages near the theoretical

# To get the MSE of the results (TAC note). See WHA note on a change below and
# it is suspected that TAC's "tests" might have been fluid in the sense that
# he would modify as needed and did not fully design as Examples for end users.
rr <- rep(0,10)
for(n in 50) {   # original TAC sources had c(10,15,25,50,100,500)
   for(r in 5) { # original TAC sources had 1:min(10,floor(n/2))
      message("n=",n, " and r=",r)
      for(i in 1:10) { # The [1,1] is WHA addition to get function to run.
         # extract the score for the 5% level
         rr[i] <- testRthOrderPValueOrthoT(nrep=nsim, n=n, r=r, seed=i)[1,1]
      }
      message("var (MSE):", sqrt(var(rr/100)))
   }
}
# Output like this will be seen
# n=50 and r=5
# var (MSE):6.915361322608e-05 


#  Long CPU time
#  Monte Carlo computation of critical values for special cases (TAC note)
CritValuesMC <-
function(nrep=50, kvs=c(1,3,0.25,0.5), n=100, ndigits=3, seed=1,
         test_quants=c(0.01,0.10,0.50)) {
   set.seed(seed)
   k_values <- ifelse(kvs >= 1, kvs, ceiling(n*kvs))
   z  <- replicate(nrep, {
      x <- sort(rnorm(n))
      sapply(k_values, function(r) {
          xr <- x[r]; x2 <- x[(r+1):n]
         (xr-mean(x2)) / sqrt(var(x2)) })  })
   res <- round(apply(z, MARGIN=1, quantile, test_quants), ndigits)
   colnames(res) <- k_values; return(res)
}

# TAC example. Note that z acquires its square dimension from test_quants
# but Vr is used in the sapply(). WHA has reset Vr to
n=100; nrep=10000; test_quants=c(.05,.5,1); Vr=1:10 # This Vr by TAC
z <- CritValuesMC(n=n, nrep=nrep, test_quants=test_quants)
Vr <- 1:length(z[,1]) # WHA reset of Vr to use TAC code below. TAC Vr bug?
HH <- sapply(Vr, function(r) RthOrderPValueOrthoT(n, r, z[1,r])$value)
TT <- sapply(Vr, function(r) RthOrderPValueOrthoT(n, r, z[2,r])$value) #

[Package MGBT version 1.0.7 Index]