samp.dist {asbio}  R Documentation 
Animated and/or snapshot representations of a statistic's sampling distribution
Description
This help page describes a series of asbio functions for depicting sampling distributions. The function samp.dist samples from a parent distribution without replacement with sample size = s.size, R times. At each iteration a statistic requested in stat is calculated. Thus a distribution of R statistic estimates is created.
The function samp.dist shows this distribution as an animated (anim = TRUE) or non-animated (anim = FALSE) density histogram.
Sampling distributions for up to four different statistics utilizing two different parent distributions are possible using samp.dist.
Sampling distributions can be combined in various ways by specifying a function in func (see below).
The function samp.dist.n was designed to show (with animation) how sampling distributions vary with sample size, and is still under development.
The function samp.dist.snap creates snapshots, i.e. simultaneous views of a sampling distribution at particular sample sizes.
The function dirty.dist can be used to create contaminated parent distributions.
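As a rough illustration of the resampling scheme described above, the following base R sketch builds a sampling distribution of the mean without calling asbio. It is not the package's implementation: the names n, n_reps and sample_means are illustrative only, and the exponential parent is an arbitrary choice.

```r
# Base R sketch (not asbio code): sampling distribution of the mean.
set.seed(1)        # reproducible draws
n <- 10            # sample size (plays the role of s.size)
n_reps <- 1000     # number of samples (plays the role of R)

# Draw n_reps samples of size n from an exponential parent and
# compute the statistic (here the mean) for each sample.
sample_means <- replicate(n_reps, mean(rexp(n)))

# Density histogram of the n_reps estimates.
hist(sample_means, breaks = 50, freq = FALSE, xlab = expression(bar(x)),
     main = "Sampling distribution of the mean")

# The standard deviation of the estimates approximates the standard error,
# which for an Exp(1) parent is 1/sqrt(n).
sd(sample_means)
```

Increasing n_reps makes the histogram smoother; increasing n narrows it and makes it more nearly normal.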
Usage
samp.dist(parent = NULL, parent2 = NULL, biv.parent = NULL, s.size = 1,
s.size2 = NULL, R = 1000, nbreaks = 50, stat = mean, stat2 = NULL,
stat3 = NULL, stat4 = NULL, xlab = expression(bar(x)), func = NULL,
show.n = TRUE, show.SE = FALSE, anim = TRUE, interval = 0.01,
col.anim = "rainbow", digits = 3, ...)
samp.dist.snap(parent = NULL, parent2 = NULL, biv.parent = NULL, stat = mean,
stat2 = NULL, stat3 = NULL, stat4 = NULL, s.size = c(1, 3, 6, 10, 20, 50),
s.size2 = NULL, R = 1000, func = NULL, xlab = expression(bar(x)),
show.SE = TRUE, fits = NULL, show.fits = TRUE, xlim = NULL, ylim = NULL, ...)
samp.dist.method.tck()
samp.dist.tck(statc = "mean")
samp.dist.snap.tck1(statc = "mean")
samp.dist.snap.tck2(statc = "mean")
dirty.dist(s.size, parent = expression(rnorm(1)),
cont = expression(rnorm(1, mean = 10)), prop.cont = 0.1)
samp.dist.n(parent, R = 500, n.seq = seq(1, 30), stat = mean, xlab = expression(bar(x)),
nbreaks = 50, func = NULL, show.n = TRUE,
show.SE = FALSE, est.density = TRUE, col.density = 4, lwd.density = 2,
est.ylim = TRUE, ylim = NULL, anim = TRUE, interval = 0.5,
col.anim = NULL, digits = 3, ...)
Arguments
parent 
A vector or vector generating function, describing the parental distribution.
Any collection of values can be used. When using random value generators for
parental distributions, for CPU efficiency (and accuracy) one should use

parent2 
An optional second parental distribution (see 
biv.parent 
A bivariate (two column) distribution. 
s.size 
An integer defining sample size (or a vector of integers in the case of samp.dist.snap).
s.size2 
An optional integer defining a second sample size if a second statistic is to be calculated. Again, this will be a vector of integers in the case of samp.dist.snap.
R 
The number of samples to be taken from parent distribution(s). 
nbreaks 
Number of breaks in the histogram. 
stat 
The statistic whose sampling distribution is to be represented. Will work for any summary statistic that only requires a call to data; e.g. 
stat2 
An optional second statistic. Useful for conceptualizing sampling distributions of test statistics. Calculated from sampling 
stat3 
An optional third statistic. The sampling distribution is created from the same sample data used for 
stat4 
An optional fourth statistic. The sampling distribution is created from the same sample data used for 
xlab 
X-axis label. 
func 
An optional function used to manipulate a sampling distribution or to combine the sampling distributions of two or more statistics.
The function must contain the following arguments (although they needn't all be used in the function):

show.n 
A logical command, 
show.SE 
A logical command, 
anim 
A logical command indicating whether or not animation should be used. 
interval 
Animation speed. Decreasing 
col.anim 
Color to be used in animation. Three changing color palettes: 
digits 
The number of digits to be displayed in the bootstrap standard error. 
fits 
Fitted distributions for 
show.fits 
Logical indicating whether or not fits should be shown (fits
will not be shown if no fitting function is specified regardless of whether this is 
xlim 
A two-element numeric vector defining the lower and upper limits of the X-axis. 
ylim 
A two-element numeric vector defining the lower and upper limits of the Y-axis. 
statc 
Presets for certain statistics. Currently one of 
cont 
A distribution representing a source of contamination in the parent population. Used by function dirty.dist.
prop.cont 
The proportion of the parent distribution that is contaminated by cont.
n.seq 
A range of sample sizes for samp.dist.n.
est.density 
A logical command for 
col.density 
The color of the density line for samp.dist.n.
lwd.density 
The width of the density line for samp.dist.n.
est.ylim 
Logical. If 
... 
Additional arguments from 
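The roles of cont and prop.cont can be pictured with a simple mixture. The sketch below is written in base R and is not the dirty.dist source; the helper contaminated_sample and its argument prop_cont are hypothetical names, and it assumes each observation is contaminated independently with probability prop_cont, which may differ from how dirty.dist allocates contaminated values.

```r
# Base R sketch of a contaminated (mixture) parent; not the dirty.dist source.
set.seed(1)

contaminated_sample <- function(n, prop_cont = 0.1) {
  # TRUE marks observations drawn from the contaminating distribution.
  is_cont <- runif(n) < prop_cont
  # Contaminated values come from N(10, 1), the rest from N(0, 1),
  # mirroring the defaults shown for cont and parent in Usage.
  ifelse(is_cont, rnorm(n, mean = 10), rnorm(n))
}

x <- contaminated_sample(10000)
mean(x > 5)   # fraction of clearly contaminated values, close to prop_cont
```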
Details
Sampling distributions of individual statistics can be created with samp.dist, or the function can be used in more sophisticated ways, e.g. to create sampling distributions of ratios of statistics such as t* and F* (see examples below). Animation of figures is provided for pedagogical clarity.
To calculate bivariate statistics, specify the parent distribution with biv.parent and the statistic with func (see below).
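The bivariate case can be sketched in base R as follows. This is an illustration only, not how samp.dist works internally: draw_pair and r_values are hypothetical names, and the parent is assumed to be two independent standard normal columns.

```r
# Base R sketch (not asbio code): sampling distribution of Pearson's r.
set.seed(1)
n <- 20           # sample size
n_reps <- 1000    # number of samples

# Each sample is an n x 2 matrix with independent standard normal columns.
draw_pair <- function(n) cbind(rnorm(n), rnorm(n))

# Compute the bivariate statistic (the correlation) for each sample.
r_values <- replicate(n_reps, {
  xy <- draw_pair(n)
  cor(xy[, 1], xy[, 2])
})

hist(r_values, breaks = 50, freq = FALSE, xlab = "r", main = "")
```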
Two general uses of the function samp.dist are possible.
1) One can demonstrate the accumulation of statistics for a single sample size using animation. This is useful because as more and more statistics are acquired the frequentist paradigm associated with sampling distributions becomes better represented (i.e. the number of estimates is closer to infinity). This is elucidated by allowing the default fix.n = TRUE. Animation will be provided with the default anim = TRUE. Up to two parent distributions, up to two sample sizes, and up to four distinct statistics (i.e. four distinct sampling distributions, representing four distinct estimators) can be used. The statistics stat and stat3 will be calculated from samples drawn from parent, while stat2 and stat4 will be calculated from samples drawn from parent2. These distributions can be manipulated and combined in many ways with an auxiliary function called in the argument func (see examples below). This allows depiction of sampling distributions made up of multiple estimators, e.g. test statistics.
2) One can provide simultaneous snapshots of a sampling distribution at particular sample sizes with the function samp.dist.snap.
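To make the idea of combining several estimators concrete, the base R sketch below builds the sampling distribution of a pooled-variance t statistic from two means and two variances. It does not use samp.dist or its func interface; pooled_t and t_values are hypothetical names, and both parents are assumed to be standard normal so that the null hypothesis and the equal-variance assumption hold.

```r
# Base R sketch (not asbio code): sampling distribution of a pooled t statistic.
set.seed(1)
n1 <- 6; n2 <- 6   # the two sample sizes
n_reps <- 1000     # number of pairs of samples

pooled_t <- function(x1, x2) {
  n1 <- length(x1); n2 <- length(x2)
  # Pooled variance estimate from the two sample variances.
  mse <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
  # Difference in means divided by its estimated standard error.
  (mean(x1) - mean(x2)) / (sqrt(mse) * sqrt(1 / n1 + 1 / n2))
}

t_values <- replicate(n_reps, pooled_t(rnorm(n1), rnorm(n2)))

hist(t_values, breaks = 50, freq = FALSE, xlab = "t*", main = "")
# Overlay the reference t distribution with n1 + n2 - 2 degrees of freedom.
curve(dt(x, n1 + n2 - 2), add = TRUE, lwd = 2)
```

Replacing one of the rnorm calls with a skewed or heteroscedastic generator shows how the histogram departs from the reference curve when the assumptions fail.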
Loading the package tcltk allows use of the functions samp.dist.tck, samp.dist.method.tck, samp.dist.snap.tck1 and samp.dist.snap.tck2, which provide interactive GUIs that run samp.dist.
Value
Returns a representation of a statistic's sampling distribution in the form of a histogram.
Author(s)
Ken Aho
Examples
## Not run:
##Central limit theorem
#Snapshots of four sample sizes.
samp.dist.snap(parent=expression(rexp(s.size)), s.size = c(1,5,10,50), R = 1000)
##sample mean animation
samp.dist(parent=expression(rexp(s.size)), col.anim="heat.colors", interval=.3)
##Distribution of t-statistics from a pooled variance t-test under valid and invalid assumptions
#valid
t.star <- function(s.dist1, s.dist2, s.dist3, s.dist4, s.size = 6, s.size2 = 6){
#pooled variance from the two sample variances (s.dist3 and s.dist4)
MSE <- (((s.size - 1) * s.dist3) + ((s.size2 - 1) * s.dist4))/(s.size + s.size2 - 2)
#difference in sample means (s.dist1 and s.dist2) over its standard error
func.res <- (s.dist1 - s.dist2)/(sqrt(MSE) * sqrt((1/s.size) + (1/s.size2)))
func.res}
samp.dist(parent = expression(rnorm(s.size)), parent2 =
expression(rnorm(s.size2)), s.size=6, s.size2 = 6, R=1000, stat = mean,
stat2 = mean, stat3 = var, stat4 = var, xlab = "t*", func = t.star)
curve(dt(x, 10), from = -6, to = 6, add = TRUE, lwd = 2)
legend("topleft", lwd = 2, col = 1, legend = "t(10)")
#invalid; same population means (null true) but different variances and other distributional
#characteristics.
samp.dist(parent = expression(runif(s.size, min = 0, max = 2)), parent2 =
expression(rexp(s.size2)), s.size=6, s.size2 = 6, R = 1000, stat = mean,
stat2 = mean, stat3 = var, stat4 = var, xlab = "t*", func = t.star)
curve(dt(x, 10), from = -6, to = 6, add = TRUE, lwd = 2)
legend("topleft", lwd = 2, col = 1, legend = "t(10)")
## Pearson's r
require(mvtnorm)
BVN <- function(s.size) rmvnorm(s.size, c(0, 0), sigma = matrix(ncol = 2,
nrow = 2, data = c(1, 0, 0, 1)))
samp.dist(biv.parent = expression(BVN(s.size)), s.size = 20, func = cor, xlab = "r")
#Interactive GUI, requires package 'tcltk'
samp.dist.tck("S^2")
samp.dist.snap.tck1("Huber estimator")
samp.dist.snap.tck2("F*")
## End(Not run)