samp.dist {asbio} | R Documentation |
Animated and/or snapshot representations of a statistic's sampling distribution
Description
This help page describes a series of asbio functions for depicting sampling distributions. The function samp.dist samples from a parent distribution without replacement, with sample size = s.size, R times. At each iteration, the statistic requested in stat is calculated. Thus, a distribution of R statistic estimates is created.
The function samp.dist shows this distribution as an animated (anim = TRUE) or non-animated (anim = FALSE) density histogram.
Sampling distributions for up to four different statistics, utilizing two different parent distributions, are possible using samp.dist. Sampling distributions can be combined in various ways by specifying a function in func (see below).
The function samp.dist.n was designed to show (with animation) how sampling distributions vary with sample size, and is still under development.
The function samp.dist.snap creates snapshots, i.e., simultaneous views of a sampling distribution at particular sample sizes.
The function dirty.dist can be used to create contaminated parent distributions.
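The idea behind a contaminated parent distribution can be sketched in base R. The helper contaminate() below is hypothetical and is not the asbio implementation of dirty.dist; it simply assumes that each draw comes from the contaminating distribution with probability prop_cont and from the parent otherwise.

```r
# Hypothetical helper (not part of asbio): draw n values from a parent
# distribution, replacing a random fraction (about prop_cont) with draws
# from a contaminating distribution.
contaminate <- function(n, prop_cont = 0.1,
                        rparent = function(k) rnorm(k),
                        rcont = function(k) rnorm(k, mean = 10)) {
  is_cont <- runif(n) < prop_cont      # TRUE marks a contaminated draw
  x <- rparent(n)                      # start with clean parent draws
  x[is_cont] <- rcont(sum(is_cont))    # overwrite the contaminated positions
  x
}

set.seed(1)
x <- contaminate(10000, prop_cont = 0.1)
mean(x)   # expected value of this mixture is 0.9 * 0 + 0.1 * 10 = 1
```

A vector built this way could then be supplied as a parent, since any collection of values can be used there.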
Usage
samp.dist(parent = NULL, parent2 = NULL, biv.parent = NULL, s.size = 1,
s.size2 = NULL, R = 1000, nbreaks = 50, stat = mean, stat2 = NULL,
stat3 = NULL, stat4 = NULL, xlab = expression(bar(x)), func = NULL,
show.n = TRUE, show.SE = FALSE, anim = TRUE, interval = 0.01,
col.anim = "rainbow", digits = 3, ...)
samp.dist.snap(parent = NULL, parent2 = NULL, biv.parent = NULL, stat = mean,
stat2 = NULL, stat3 = NULL, stat4 = NULL, s.size = c(1, 3, 6, 10, 20, 50),
s.size2 = NULL, R = 1000, func = NULL, xlab = expression(bar(x)),
show.SE = TRUE, fits = NULL, show.fits = TRUE, xlim = NULL, ylim = NULL, ...)
samp.dist.method.tck()
samp.dist.tck(statc = "mean")
samp.dist.snap.tck1(statc = "mean")
samp.dist.snap.tck2(statc = "mean")
dirty.dist(s.size, parent = expression(rnorm(1)),
cont = expression(rnorm(1, mean = 10)), prop.cont = 0.1)
samp.dist.n(parent, R = 500, n.seq = seq(1, 30), stat = mean, xlab = expression(bar(x)),
nbreaks = 50, func = NULL, show.n = TRUE,
show.SE = FALSE, est.density = TRUE, col.density = 4, lwd.density = 2,
est.ylim = TRUE, ylim = NULL, anim = TRUE, interval = 0.5,
col.anim = NULL, digits = 3, ...)
Arguments
parent |
A vector or vector generating function, describing the parental distribution.
Any collection of values can be used. When using random value generators for
parental distributions, for CPU efficiency (and accuracy) one should use
|
parent2 |
An optional second parental distribution (see |
biv.parent |
A bivariate (two column) distribution. |
s.size |
An integer defining sample size (or a vector of integers in the case of |
s.size2 |
An optional integer defining a second sample size if a second statistic is to be calculated. Again, this will be a vector of integers in the case of |
R |
The number of samples to be taken from parent distribution(s). |
nbreaks |
Number of breaks in the histogram. |
stat |
The statistic whose sampling distribution is to be represented. Will work for any summary statistic that only requires a call to data; e.g. |
stat2 |
An optional second statistic. Useful for conceptualizing sampling distributions of test statistics. Calculated from sampling |
stat3 |
An optional third statistic. The sampling distribution is created from the same sample data used for |
stat4 |
An optional fourth statistic. The sampling distribution is created from the same sample data used for |
xlab |
X-axis label. |
func |
An optional function used to manipulate a sampling distribution or to combine the sampling distributions of two or more statistics.
The function must contain the following arguments (although they needn't all be used in the function):
|
show.n |
A logical command, |
show.SE |
A logical command, |
anim |
A logical command indicating whether or not animation should be used. |
interval |
Animation speed. Decreasing |
col.anim |
Color to be used in animation. Three changing color palettes: |
digits |
The number of digits to be displayed in the bootstrap standard error. |
fits |
Fitted distributions for |
show.fits |
Logical indicating whether or not fits should be shown (fits
will not be shown if no fitting function is specified regardless of whether this is |
xlim |
A two element numeric vector defining the lower and upper limits of the X-axis. |
ylim |
A two element numeric vector defining the lower and upper limits of the Y-axis. |
statc |
Presets for certain statistics. Currently one of |
cont |
A distribution representing a source of contamination in the parent population. Used by function |
prop.cont |
The proportion of the parent distribution that is contaminated by |
n.seq |
A range of sample sizes for |
est.density |
A logical command for |
col.density |
The color of the density line for |
lwd.density |
The width of the density line for |
est.ylim |
Logical. If |
... |
Additional arguments from |
Details
Sampling distributions of individual statistics can be created with samp.dist, or the function can be used in more sophisticated ways, e.g., to create sampling distributions of ratios of statistics such as t* and F* (see examples below). Animation of the figures is provided for pedagogical clarity.
To calculate bivariate statistics, specify the parent distribution with biv.parent and the statistic with func (see below).
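As a rough illustration of what a simulated sampling distribution is, the following base R sketch builds one without asbio. It assumes an exponential parent and uses replicate() in place of whatever samp.dist does internally; the object names xbar, h, and se_hat are illustrative only.

```r
set.seed(42)
R <- 1000       # number of samples to draw
s.size <- 10    # observations per sample

# One estimate per sample: here the mean of s.size exponential draws.
xbar <- replicate(R, mean(rexp(s.size)))

# Bin the R estimates into a histogram object without opening a device;
# plot(h, freq = FALSE) would draw it as a density histogram.
h <- hist(xbar, breaks = 50, plot = FALSE)

# The standard deviation of the estimates approximates the standard error,
# which for rexp(rate = 1) is 1 / sqrt(s.size).
se_hat <- sd(xbar)
c(simulated = se_hat, theory = 1 / sqrt(s.size))
```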
Two general uses of the function samp.dist are possible.
1) One can demonstrate the accumulation of statistics for a single sample size using animation.
This is useful because, as more and more statistics are acquired, the frequentist paradigm associated with sampling distributions becomes better represented (i.e., the number of estimates is closer to infinity). This is elucidated by allowing the default fix.n = TRUE. Animation will be provided with the default anim = TRUE. Up to two parent distributions, up to two sample sizes, and up to four distinct statistics (i.e., four distinct sampling distributions, representing four distinct estimators) can be used. The statistics stat and stat3 will be calculated from samples drawn from parent, while stat2 and stat4 will be calculated from samples drawn from parent2. These distributions can be manipulated and combined in many ways with an auxiliary function called in the argument func (see examples below). This allows depiction of sampling distributions made up of multiple estimators, e.g., test statistics.
2) One can provide simultaneous snapshots of a sampling distribution at particular sample sizes with the function samp.dist.snap.
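The second use can be approximated numerically in base R. This sketch is not samp.dist.snap itself; it assumes an exponential parent and summarizes each snapshot by its simulated standard error instead of drawing a panel. The names means_by_n and se_table are illustrative.

```r
set.seed(7)
R <- 1000
sizes <- c(1, 5, 10, 50)

# For each sample size, simulate R sample means from an exponential parent.
means_by_n <- lapply(sizes, function(n) replicate(R, mean(rexp(n))))
names(means_by_n) <- paste0("n=", sizes)

# Simulated standard errors next to the theoretical value 1 / sqrt(n).
se_table <- data.frame(n = sizes,
                       se_sim = sapply(means_by_n, sd),
                       se_theory = 1 / sqrt(sizes))
print(se_table)
```

The simulated standard errors should shrink as the sample size grows, which is the pattern the snapshot panels are meant to make visible.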
Loading the package tcltk allows use of the functions samp.dist.tck, samp.dist.method.tck, samp.dist.snap.tck1, and samp.dist.snap.tck2, which provide interactive GUIs that run samp.dist.
Value
Returns a representation of a statistic's sampling distribution in the form of a histogram.
Author(s)
Ken Aho
Examples
## Not run:
##Central limit theorem
#Snapshots of four sample sizes.
samp.dist.snap(parent=expression(rexp(s.size)), s.size = c(1,5,10,50), R = 1000)
##sample mean animation
samp.dist(parent=expression(rexp(s.size)), col.anim="heat.colors", interval=.3)
##Distribution of t-statistics from a pooled variance t-test under valid and invalid assumptions
#valid
# Pooled-variance t statistic built from two means (s.dist1, s.dist2)
# and two variances (s.dist3, s.dist4)
t.star <- function(s.dist1, s.dist2, s.dist3, s.dist4, s.size = 6, s.size2 = 6){
MSE <- (((s.size - 1) * s.dist3) + ((s.size2 - 1) * s.dist4))/(s.size + s.size2 - 2)
func.res <- (s.dist1 - s.dist2)/(sqrt(MSE) * sqrt((1/s.size) + (1/s.size2)))
func.res}
samp.dist(parent = expression(rnorm(s.size)), parent2 =
expression(rnorm(s.size2)), s.size=6, s.size2 = 6, R=1000, stat = mean,
stat2 = mean, stat3 = var, stat4 = var, xlab = "t*", func = t.star)
curve(dt(x, 10), from = -6, to = 6, add = TRUE, lwd = 2)
legend("topleft", lwd = 2, col = 1, legend = "t(10)")
#invalid; same population means (null true) but different variances and other distributional
#characteristics.
samp.dist(parent = expression(runif(s.size, min = 0, max = 2)), parent2 =
expression(rexp(s.size2)), s.size=6, s.size2 = 6, R = 1000, stat = mean,
stat2 = mean, stat3 = var, stat4 = var, xlab = "t*", func = t.star)
curve(dt(x, 10),from = -6, to = 6,add = TRUE, lwd = 2)
legend("topleft", lwd = 2, col = 1, legend = "t(10)")
## Pearson's r
require(mvtnorm)
BVN <- function(s.size) rmvnorm(s.size, c(0, 0), sigma = matrix(ncol = 2,
nrow = 2, data = c(1, 0, 0, 1)))
samp.dist(biv.parent = expression(BVN(s.size)), s.size = 20, func = cor, xlab = "r")
#Interactive GUI; requires package 'tcltk'
samp.dist.tck("S^2")
samp.dist.snap.tck1("Huber estimator")
samp.dist.snap.tck2("F*")
## End(Not run)
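The pooled-variance t* example can be cross-checked in base R without asbio. The sketch below assumes two standard normal parents and n1 = n2 = 6; the matrix layout and all object names are illustrative and do not reflect how samp.dist stores its samples.

```r
set.seed(123)
R <- 1000; n1 <- 6; n2 <- 6

# Each row of a matrix holds one sample; both parents are standard normal,
# so the equal-variance assumption holds.
x1 <- matrix(rnorm(R * n1), nrow = R)
x2 <- matrix(rnorm(R * n2), nrow = R)

# Four vectors of R estimates, playing the roles of stat through stat4.
m1 <- rowMeans(x1); m2 <- rowMeans(x2)
v1 <- apply(x1, 1, var); v2 <- apply(x2, 1, var)

# Pooled-variance t statistic, combined elementwise across the R samples.
mse <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_star <- (m1 - m2) / (sqrt(mse) * sqrt(1 / n1 + 1 / n2))

# Compare simulated quantiles with those of a t distribution on 10 df.
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
rbind(simulated = quantile(t_star, probs),
      theory = qt(probs, df = n1 + n2 - 2))
```

Each element of t_star should agree with the statistic reported by t.test(..., var.equal = TRUE) for the corresponding pair of rows.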