ehyper {EnvStats} | R Documentation |
Estimate Parameter of a Hypergeometric Distribution
Description
Estimate m, the number of white balls in the urn, or m+n, the total number of balls in the urn, for a hypergeometric distribution.
Usage
ehyper(x, m = NULL, total = NULL, k, method = "mle")
Arguments
x |
non-negative integer indicating the number of white balls out of a sample of size k.
m |
non-negative integer indicating the number of white balls in the urn. You must supply m or total.
total |
positive integer indicating the total number of balls in the urn (i.e., m+n).
k |
positive integer indicating the number of balls drawn without replacement from the urn. Missing values (NA) are not allowed.
method |
character string specifying the method of estimation. Possible values are "mle" (maximum likelihood; the default) and "mvue" (minimum variance unbiased).
Details
Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.
Let x be an observation from a hypergeometric distribution with parameters m=M, n=N, and k=K. In R nomenclature, x represents the number of white balls drawn out of a sample of K balls drawn without replacement from an urn containing M white balls and N black balls. The total number of balls in the urn is thus M+N. Denote the total number of balls by T = M+N.
Estimation
Estimating M, given T and K are known
When T and K are known, the maximum likelihood estimator (mle) of M is given by (Forbes et al., 2011):
\hat{M}_{mle} = floor[(T + 1) x / K] \;\;\;\; (1)
where floor() represents the floor function. That is, floor(y) is the largest integer less than or equal to y.
If the quantity (T + 1) x / K is itself an integer, then the mle of M is also given by (Johnson et al., 1992, p.263):
\hat{M}_{mle} = [(T + 1) x / K] - 1 \;\;\;\; (2)
which is what the function ehyper uses for this case.
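Equations (1) and (2) can be checked numerically with base R alone. The sketch below is illustrative and is not the source code of ehyper: the helper name mle_m is hypothetical, and the brute-force check simply evaluates the dhyper likelihood at every admissible value of M.

```r
# Hypothetical helper (not part of EnvStats): mle of M from equations (1) and (2).
mle_m <- function(x, total, k) {
  num <- (total + 1) * x
  if (num %% k == 0) {
    # (T + 1) x / K is an integer, so equation (2) applies.
    num / k - 1
  } else {
    # Otherwise take the floor, as in equation (1).
    floor(num / k)
  }
}

# Values from the Examples section: x = 1, T = 40, K = 5.
m_hat <- mle_m(x = 1, total = 40, k = 5)

# Brute-force check: evaluate the likelihood at M = 0, 1, ..., T.
candidates <- 0:40
lik <- dhyper(1, m = candidates, n = 40 - candidates, k = 5)
m_brute <- candidates[which.max(lik)]

print(c(formula = m_hat, brute_force = m_brute))
```

For these inputs (T + 1) x / K = 8.2, so both values are 8. When (T + 1) x / K is an integer, for example x = 1, T = 39, K = 5, the likelihood is the same at M = 7 and M = 8, and the helper returns 7 as in equation (2).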
The minimum variance unbiased estimator (mvue) of M is given by (Forbes et al., 2011):
\hat{M}_{mvue} = (T x / K) \;\;\;\; (3)
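The unbiasedness of equation (3) can be illustrated by computing the exact expectation of T x / K over all possible values of x. This is a minimal base R sketch; the parameter values are taken from the Examples section and the variable names are chosen here for illustration only.

```r
# Illustrative values: M = 10 white balls, T = 40 balls in total, K = 5 drawn.
m <- 10
total <- 40
k <- 5

# All possible numbers of white balls in the sample and their probabilities.
x <- 0:k
p <- dhyper(x, m = m, n = total - m, k = k)

# Expected value of the estimator T x / K when the true value is M = 10.
expected_mvue <- sum(p * total * x / k)
print(expected_mvue)  # equals m up to floating-point error
```

Unlike the mle, the value T x / K need not be an integer; for example x = 1, T = 41, K = 5 gives 8.2.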
Estimating T, given M and K are known
When M and K are known, the maximum likelihood estimator (mle) of T is given by (Forbes et al., 2011):
\hat{T}_{mle} = floor(K M / x) \;\;\;\; (4)
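Equation (4) is a one-line computation in base R. The helper name mle_total below is hypothetical and the code is only a sketch of the formula, not the implementation used by ehyper.

```r
# Hypothetical helper (not part of EnvStats): mle of T from equation (4).
mle_total <- function(x, m, k) {
  floor(k * m / x)
}

# Values from the Examples section: x = 1, M = 10, K = 5.
t_hat <- mle_total(x = 1, m = 10, k = 5)

# Once T is estimated, the number of black balls follows by subtraction.
n_hat <- t_hat - 10

print(c(total = t_hat, n = n_hat))
```

For these inputs the estimate of T is 50 and the implied number of black balls is 40. With x = 0 the ratio K M / x is infinite, so the formula yields no finite estimate; how ehyper treats that case is not covered here.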
Value
a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.
Note
The hypergeometric distribution can be described by an urn model with M white balls and N black balls. If K balls are drawn with replacement, then the number of white balls in the sample of size K follows a binomial distribution with parameters size=K and prob=M/(M+N). If K balls are drawn without replacement, then the number of white balls in the sample of size K follows a hypergeometric distribution with parameters m=M, n=N, and k=K.
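The two sampling schemes can be compared side by side with the base R functions dbinom and dhyper. The values M = 10, N = 30, and K = 5 below are illustrative choices taken from the Examples section.

```r
# Illustrative urn: M = 10 white balls, N = 30 black balls, K = 5 draws.
m <- 10
n <- 30
k <- 5
x <- 0:k

# Sampling with replacement: binomial with size = K and prob = M / (M + N).
p_binom <- dbinom(x, size = k, prob = m / (m + n))

# Sampling without replacement: hypergeometric with m = M, n = N, k = K.
p_hyper <- dhyper(x, m = m, n = n, k = k)

print(round(cbind(x, binomial = p_binom, hypergeometric = p_hyper), 4))

# Both distributions have mean K M / (M + N); the variances differ.
mean_binom <- sum(x * p_binom)
mean_hyper <- sum(x * p_hyper)
var_binom <- sum(x^2 * p_binom) - mean_binom^2
var_hyper <- sum(x^2 * p_hyper) - mean_hyper^2
print(c(var_binomial = var_binom, var_hypergeometric = var_hyper))
```

The means agree, while the hypergeometric variance is smaller because drawing without replacement removes balls from a finite urn.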
The name “hypergeometric” comes from the fact that the probabilities associated with this distribution can be written as successive terms in the expansion of a function of a Gaussian hypergeometric series.
The hypergeometric distribution is applied in a variety of fields, including quality control and estimation of animal population size. It is also the distribution used to compute probabilities for Fisher's exact test for a 2x2 contingency table.
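The link to Fisher's exact test can be seen with the base R functions fisher.test and phyper. The 2x2 table below is a made-up illustration (3 white balls among 5 drawn from an urn with 10 white and 30 black balls), not data from the source.

```r
# Hypothetical 2x2 table: rows are drawn / not drawn, columns are white / black.
tab <- matrix(c(3, 7, 2, 28), nrow = 2,
              dimnames = list(c("drawn", "not drawn"), c("white", "black")))

# One-sided p-value from Fisher's exact test.
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# The same probability from the hypergeometric distribution:
# P(X >= 3) with m = 10 white, n = 30 black, k = 5 drawn.
p_hyper <- phyper(3 - 1, m = 10, n = 30, k = 5, lower.tail = FALSE)

print(c(fisher = p_fisher, hypergeometric = p_hyper))
```

The two printed values coincide, since the one-sided test is an upper-tail hypergeometric probability.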
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 6.
See Also
Examples
# Generate an observation from a hypergeometric distribution with
# parameters m=10, n=30, and k=5, then estimate the parameter m.
# Note: the call to set.seed simply allows you to reproduce this example.
# Also, the only parameter actually estimated is m; once m is estimated,
# n is computed by subtracting the estimated value of m (8 in this example)
# from the given value of m+n (40 in this example). The parameters
# n and k are shown in the output in order to provide information on
# all of the parameters associated with the hypergeometric distribution.
set.seed(250)
dat <- rhyper(nn = 1, m = 10, n = 30, k = 5)
dat
#[1] 1
ehyper(dat, total = 40, k = 5)
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Hypergeometric
#
#Estimated Parameter(s): m = 8
# n = 32
# k = 5
#
#Estimation Method: mle for 'm'
#
#Data: dat
#
#Sample Size: 1
#----------
# Use the same data as in the previous example, but estimate m+n instead.
# Note: The only parameter estimated is m+n. Once this is estimated,
# n is computed by subtracting the given value of m (10 in this case)
# from the estimated value of m+n (50 in this example).
ehyper(dat, m = 10, k = 5)
#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution: Hypergeometric
#
#Estimated Parameter(s): m = 10
# n = 40
# k = 5
#
#Estimation Method: mle for 'm+n'
#
#Data: dat
#
#Sample Size: 1
#----------
# Clean up
#---------
rm(dat)