R: Simultaneous Tolerance Bands (STB).

stb.default {STB}

R Documentation

Simultaneous Tolerance Bands (STB).

Description

Compute and/or plot simultaneous tolerance bands (STB) for numeric vectors.

Usage

## Default S3 method:
stb(
  obj,
  N = 10000L,
  alpha = 0.05,
  rand.func = rnorm,
  tol = 1e-04,
  max.iter = 100L,
  algo = c("rank", "C", "R"),
  Ncpu = 1,
  q.type = 2L,
  stb.col = "#0000FF40",
  col.points = "black",
  col.out = "red",
  col.pwb = "#0000FF40",
  main = NULL,
  add.pwb = FALSE,
  quiet = FALSE,
  add = FALSE,
  plot = TRUE,
  legend = FALSE,
  timer = FALSE,
  pch = 16,
  pch.out = 16,
  seed = NULL,
  ...
)

Arguments

`obj`	(numeric) vector, which is supposed to be N(mu, sigma^2)-distributed
`N`	(integer) value specifying the number of random samples to be used for constructing the STB
`alpha`	(numeric) value specifying the simultaneous tolerance level, i.e. 100(1-alpha)% of all 'N' random samples have to be completely enclosed by the bounds of the STB
`rand.func`	(function) a function which generates random samples, e.g. `rand.func=rnorm`, which corresponds to random sampling from the standard normal distribution. Another option is to define `func=function(n) rchisq(n=n, df=3, ncp=2)` and use `rand.func=func`. See the Examples section for further use cases.
`tol`	(numeric) value specifying the max. acceptable deviation from 'alpha' used in the bisection algorithm
`max.iter`	(integer) value specifying the max. number of iterations for finding the bounds of the STB
`algo`	(character) string specifying the method to be used for constructing a 100(1-alpha)% STB: choose "rank" for the rank-based, "C" for a C implementation of the quantile-based, and "R" for an R implementation of the quantile-based algorithm (see details). "C" uses the SAS PCTLDEF5 definition of quantiles, whereas "R" can use any of the built-in R quantile types (see `quantile`).
`Ncpu`	(integer) value specifying the number of cores/CPUs to be used; for Ncpu > 1 multi-processing is applied
`q.type`	(integer) the quantile type used if `algo="R"`, see `?quantile` for details.
`stb.col`	(character) string, a valid specification of a color to be used for the STB
`col.points`	(character) color for the points in the QQ-plot
`col.out`	(character) color for points outside of the 100(1-alpha)% STB
`col.pwb`	(character) color for the point-wise tolerance band (not adjusted for multiplicity), defaults to "#0000FF40", which is "blue" with about 75% transparency (alpha channel 0x40 = 64 of 255)
`main`	(character) string for a main title appearing over the plot
`add.pwb`	(logical) should the point-wise tolerance band be plotted for comparison?
`quiet`	(logical) TRUE = no additional output is printed (progress bar etc.)
`add`	(logical) TRUE = the 100(1-alpha)% STB is added to an existing plot
`plot`	(logical) TRUE = either a QQ-plot with STB (add=FALSE) or an STB added to an existing plot (add=TRUE) is plotted. FALSE = only computations are carried out without plotting; an object of class 'STB' is returned which can be stored and plotted later on, e.g. to avoid computing an STB every time a Sweave/mWeave report is updated
`legend`	(logical) TRUE = a legend is plotted at position "topleft"
`timer`	(logical) TRUE = the time spent for computing the STB will be printed
`pch`	(integer) plotting symbols for the QQ-plot
`pch.out`	(integer) plotting symbols for outlying points
`seed`	(numeric) value interpreted as integer, setting the random number generator (RNG) to a defined state
`...`	further graphical parameters passed on
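
As an illustration of what argument `rand.func` expects, the following sketch defines a few generators and checks that each returns a numeric vector of length 'n'. The names `rand_chisq`, `rand_t5`, `rand_exp` and `check_rand_func` are hypothetical helpers introduced here; they are not part of the STB package.

```r
# Hypothetical generators: any function with a single argument 'n'
# that returns 'n' random draws from the desired null-distribution
rand_chisq <- function(n) rchisq(n = n, df = 3, ncp = 2)
rand_t5    <- function(n) rt(n = n, df = 5)
rand_exp   <- function(n) rexp(n = n, rate = 1)

# Hypothetical sanity check: the generator must return a numeric
# vector with exactly 'n' elements
check_rand_func <- function(f, n = 10L) {
  x <- f(n)
  is.numeric(x) && length(x) == n
}

sapply(list(chisq = rand_chisq, t5 = rand_t5, exp = rand_exp),
       check_rand_func)
```

A generator that passes this check would then be supplied as, e.g., `rand.func=rand_t5`.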

Details

The function takes a numeric vector `obj` and computes the 100(1-alpha)% simultaneous tolerance band (STB) for the default null hypothesis H0: obj ~ N(mu, sigma^2). This is equivalent to checking whether the residuals of the simplest linear model y = mu + e (y~1) are normally distributed, i.e. 'e ~ N(0, sigma^2)'. Other null distributions can be specified via argument `rand.func`: supply a function with a single argument 'n' that returns a random sample of 'n' observations drawn from the desired null distribution (see the description of argument `rand.func` above). Note that all random samples, as well as the vector `obj` itself, are centered to mean=0 and scaled to sd=1.
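
The centering and scaling step can be pictured with the following base-R sketch. It assumes plain standardisation by the sample mean and the sample standard deviation; the package may implement this step differently, and the names `obj`, `z` and `sims` are only illustrative.

```r
set.seed(1)
obj <- exp(rnorm(30))                 # illustrative input vector

# centre to mean 0 and scale to sd 1 (assumed form of the standardisation)
z <- (obj - mean(obj)) / sd(obj)

# the same transformation applied to each simulated sample (one per column)
sims <- replicate(5, {
  x <- rnorm(30)
  (x - mean(x)) / sd(x)
})

c(mean = mean(z), sd = sd(z))         # approximately 0 and exactly scaled to 1
```

Under this assumption the location and scale of the data do not influence the check, which is why 'mu' and 'sigma' need not be specified.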

One can choose between three methods for constructing the 100(1-alpha)% STB: two implementations of the quantile-based algorithm ("C" and "R", see 1st reference) and one implementation of the rank-based algorithm ("rank", see 2nd reference). Methods "C" and "R" can be run in parallel. The rank-based algorithm does not benefit from parallel processing, at least not in the current implementation. It is nevertheless the default and is recommended for small to medium-sized vectors and 10000 <= N <= 20000 simulations, which should be sufficient to reflect the null distribution accurately. The "C" implementation benefits most from using multiple cores, i.e. the larger 'Ncpu' the better, and should be used for large problems, i.e. a large number of elements and a large number of simulations.
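
To give a rough idea of how a quantile-based band can be found by bisection, here is a simplified base-R sketch. It is not the package's implementation: `sketch_stb` is a hypothetical helper, the search over a point-wise level `gamma` is an assumption about the general approach, and the sketch ignores parallelisation and the rank-based variant entirely. Argument names mirror those of `stb` only for readability.

```r
# Simplified, illustrative sketch of a quantile-based STB found by bisection.
# sketch_stb() is a hypothetical helper, not a function of the STB package.
sketch_stb <- function(n, N = 5000L, alpha = 0.05, rand.func = rnorm,
                       tol = 1e-3, max.iter = 50L, q.type = 2L) {
  standardise <- function(x) (x - mean(x)) / sd(x)
  # n x N matrix: each column is one sorted, standardised simulated sample
  sims <- replicate(N, sort(standardise(rand.func(n))))

  # band for a given point-wise level 'gamma' and its simultaneous coverage
  band <- function(gamma) {
    lower <- apply(sims, 1, quantile, probs = gamma / 2,
                   type = q.type, names = FALSE)
    upper <- apply(sims, 1, quantile, probs = 1 - gamma / 2,
                   type = q.type, names = FALSE)
    # a sample is covered if none of its order statistics leaves the band
    covered <- colSums(sims < lower | sims > upper) == 0
    list(gamma = gamma, coverage = mean(covered),
         lower = lower, upper = upper)
  }

  lo <- 0       # gamma = 0: band spans min/max of the simulations
  hi <- alpha   # gamma = alpha: point-wise band, too narrow simultaneously
  for (i in seq_len(max.iter)) {
    res <- band((lo + hi) / 2)
    if (abs(res$coverage - (1 - alpha)) <= tol) break
    # coverage too high -> band too wide -> increase gamma, and vice versa
    if (res$coverage > 1 - alpha) lo <- res$gamma else hi <- res$gamma
  }
  res
}

set.seed(42)
res <- sketch_stb(n = 20)
res[c("gamma", "coverage")]
```

Because the empirical coverage changes in discrete steps, the loop may stop at `max.iter` without reaching `tol`; the sketch then simply returns the last band evaluated.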

The table below gives an impression of how these algorithms perform. Runtimes were measured under Windows 7 on an Intel Xeon E5-2687W 3.1 GHz workstation with 16 logical cores and 16 GB RAM. "t_rank" denotes the runtime of the rank-based algorithm and "t_qC12" the runtime of the C implementation of the quantile-based algorithm run in parallel on 12 cores. Each setting was repeated 10 times and the overall runtime was then divided by 10 to obtain sufficiently robust results. Note that for smaller problem sizes a large proportion of the overall runtime is due to simulating, i.e. drawing from the null distribution.

N_obs	N_sim	t_rank	t_qC12
25	5000	0.4s	0.5s
25	10000	0.8s	1.3s
50	10000	1.0s	3.2s
100	10000	1.7s	2.9s
100	20000	3.0s	4.8s
225	20000	5.1s	8.3s
300	30000	9.6s	17.2s
300	50000	16.1s	24.9s
1000	50000	47.8s	123.5s

Value

Invisibly returns a list-type object of class 'STB', which comprises all arguments accepted by this function.

Author(s)

Andre Schuetzenmeister andre.schuetzenmeister@roche.com

References

Schuetzenmeister, A., Jensen, U., Piepho, H.P. (2011). Checking assumptions of normality and homoscedasticity in the general linear model. Communications in Statistics - Simulation and Computation, pp. 141-154

Schuetzenmeister, A. and Piepho, H.P. (2012). Residual analysis of linear mixed models using a simulation approach. Computational Statistics and Data Analysis, 56, 1405-1416

Examples

### log-normal vector to be checked for normality
## Not run: 
set.seed(111)
stb(exp(rnorm(30)), col.out="red", legend=TRUE)

### uniformly distributed sample checked against the Chi-Squared distribution with DF=1 (one degree of freedom)
set.seed(707)
stb(runif(25, -5, 5), rand.func=function(n){rchisq(n=n, df=1)}, 
    col.out="red", legend=TRUE, main="Chi-Squared with DF=1")

### check whether a Chi-Squared (DF=1) random sample fits better
stb(rchisq(25, df=1), rand.func=function(n){rchisq(n=n, df=1)}, 
    col.out="red", legend=TRUE, main="Chi-Squared with DF=1")

### add STB to an existing plot
plot(sort(rnorm(30)), sort(rnorm(30)))
stb(rnorm(30), add=TRUE)

### compute STB for later use and prevent plotting
STB <- stb(rnorm(30), plot=FALSE)

## End(Not run)

[Package STB version 0.6.5 Index]