stb.default {STB} | R Documentation |
Simultaneous Tolerance Bands (STB).
Description
Compute And/Or Plot Simultaneous Tolerance Bands for numeric vectors.
Usage
## Default S3 method:
stb(
obj,
N = 10000L,
alpha = 0.05,
rand.func = rnorm,
tol = 1e-04,
max.iter = 100L,
algo = c("rank", "C", "R"),
Ncpu = 1,
q.type = 2L,
stb.col = "#0000FF40",
col.points = "black",
col.out = "red",
col.pwb = "#0000FF40",
main = NULL,
add.pwb = FALSE,
quiet = FALSE,
add = FALSE,
plot = TRUE,
legend = FALSE,
timer = FALSE,
pch = 16,
pch.out = 16,
seed = NULL,
...
)
Arguments
obj |
(numeric) vector, which is supposed to be N(my, sigma^2)-distributed |
N |
(integer) value specifying the number of random samples to be used for constructing the STB |
alpha |
(numeric) value specifying the simultaneous tolerance level, i.e. 100(1-alpha)% of all 'N' random samples have to be completely enclosed by the bounds of the STB |
rand.func |
(function) a function which generates random samples, e.g. |
tol |
(numeric) value specifying the max. acceptable deviation from 'alpha' used in the bisection algorithm |
max.iter |
(integer) value specifying the max. number of iteration for finding the bounds of the STB |
algo |
(character) (string) specifying the method to be used for constructing a 100(1-alpha)% STB,
choose "rank" for the rank-based, "C" for a C-implementation of the quantile-based, and
"R" for an R-implentation of the quantile-based algorithm (see details). "C" uses SAS PCTLDEF5 definition of quantiles,
whereas "R" can use any of the built-in R types of quantiles (see |
Ncpu |
(integer) specifying the number cores/CPUs to be used, for N>1 multi-processing is applied |
q.type |
(integer) the quantile-type used if |
stb.col |
(character) string, a valid specification of a color to be used for the STB |
col.points |
(character) color for the points in the QQ-plot |
col.out |
(character) color for points outsied of the 100(1-alpha)% STB |
col.pwb |
(character) color for the point-wise STB (not adjusted for multiplicity), defaults to "#0000FF40" which is "blue" with 80% transparency |
main |
(characer) string for a main title appearing over the plot |
add.pwb |
(logical) should the point-wise tolerance band be plotted for comparison? |
quiet |
(logical) TRUE = no additional output ist printed (progress bar etc.) |
add |
(logical) TRUE = the 100(1-alpha)% STB is added to an existing plot |
plot |
(logical) TRUE = either a QQ-plot with STB (add=FALSE) or a STB added to an existing plot (add=TRUE) is plotted. FALSE = only computations are carried out without plotting, an object of class 'STB' is returned which can be stored an plotted later on, e.g. to avoid computing an STB every time a Sweave/mWeave report is updated |
legend |
(logical) TRUE a legend is plotted "topleft" |
timer |
(logical) TRUE = the time spent for computing the STB will be printed |
pch |
(integer) plotting symbols for the QQ-plot |
pch.out |
(integer) plotting symbols for outlying points |
seed |
(numeric) value interpreted as integer, setting the random number generator (RNG) to a defined state |
... |
further graphical parameters passed on |
Details
Function takes a numeric vector 'vec' and computes the 100(1-alpha)%-simultaneous tolerance band (STB) for
the (DEFAULT )Null-hypothesis H0: vec~N(my, sigma^2) distributed, which is equal to checking whether the residuals of
the simplest linear model y = mu + e (y~1) are normally distributed, i.e. 'e ~ N(0, sigma^2)'. By specification of argument rand.func
other null-distributions can be specified. One has to specify a function with a single argument 'n', which returns a random sample with 'n'
observations, randomly drawn from the desired null-distribution (see description argument rand.func
below).
Note that all random samples as well as vector vec
will be centered to mean=0 and scaled to sd=1.
One can choose between three methods for constructing the 100(1-alpha)% STB. There are two implementations of the quantile-based algorithm ("C", "R" see 1st reference) and one of the rank-based algorithm (see 2nd reference). Methods "C" and "R" can be run in parallel. The rank-based algorithm does not benefit form parallel processing, at least not in the current implementation. It is still the default and recommended for small to medium sized vectors and 10000 <= N <= 20000 simulations, which should be sufficiently accurate reflect the null-distribution. The "C" and "R" options refer to implementations of the quantile-based algorithm. The "C" implementation benefits most from using multiple cores, i.e. the larger 'Ncpu' the better, and should be used for large problems, i.e. rather large number of elements and large number of simulations.
The table below gives an impression how these algorithms perform. Runtimes were measured under Windows 7 on a Intel Xeon E5-2687W 3.1 GHz workstation with 16 logical cores and 16 GB RAM. The runtime of the C-implementation of the quantile-based algorithm is denoted as "t_qC12" applied parallely with 12 cores. Each setting was repeated 10 times and the overall run time was then divided by 10 providing sufficiently robust simulation results. Note, that for smaller problem sizes a large proportion of the overall runtime is due to simulating, i.e. drawing from the null-distribution.
_____N_obs | _____N_sim | ____t_rank | ____t_qC12 |
25 | 5000 | 0.4s | 0.5s |
25 | 10000 | 0.8s | 1.3s |
50 | 10000 | 1.0s | 3.2s |
100 | 10000 | 1.7s | 2.9s |
100 | 20000 | 3.0s | 4.8s |
225 | 20000 | 5.1s | 8.3s |
300 | 30000 | 9.6s | 17.2s |
300 | 50000 | 16.1s | 24.9s |
1000 | 50000 | 47.8s | 123.5s |
Value
invisibly returns a list-type object of class STB
, which comprises all arguments accepted by this function.
Author(s)
Andre Schuetzenmeister andre.schuetzenmeister@roche.com
References
Schuetzenmeister, A., Jensen, U., Piepho, H.P. (2011), Checking assumptions of normality and homoscedasticity in the general linear model. Communications in Statistics - Simulation and Computation; S. 141-154
Schuetzenmeister, A. and Piepho, H.P. (2012). Residual analysis of linear mixed models using a simulation approach. Computational Statistics and Data Analysis, 56, 1405-1416
See Also
Examples
### log-normal vector to be checked for normality
## Not run:
set.seed(111)
stb(exp(rnorm(30)), col.out="red", legend=TRUE)
### uniformly distributed sample checked for Chi-Squared Distribution with DF=1, degrees of freedom
set.seed(707)
stb(runif(25, -5, 5), rand.func=function(n){rchisq(n=n, df=1)},
col.out="red", legend=TRUE, main="Chi-Squared with DF=1")
### check whether an Chi-Squared (DF=1) random sample better fits
stb(rchisq(25, df=1), rand.func=function(n){rchisq(n=n, df=1)},
col.out="red", legend=TRUE, main="Chi-Squared with DF=1")
### add STB to an existing plot
plot(sort(rnorm(30)), sort(rnorm(30)))
stb(rnorm(30), add=TRUE)
### compute STB for later use and prevent plotting
STB <- stb(rnorm(30), plot=FALSE)
## End(Not run)