R: Estimate Parameter of a Hypergeometric Distribution

ehyper {EnvStats}

R Documentation

Estimate Parameter of a Hypergeometric Distribution

Description

Estimate m, the number of white balls in the urn, or m+n, the total number of balls in the urn, for a hypergeometric distribution.

Usage

  ehyper(x, m = NULL, total = NULL, k, method = "mle")

Arguments

`x`	non-negative integer indicating the number of white balls out of a sample of size `k` drawn without replacement from the urn. Missing (`NA`), undefined (`NaN`), and infinite (`Inf`, `-Inf`) values are not allowed.
`m`	non-negative integer indicating the number of white balls in the urn. You must supply `m` or `total`, but not both. Missing values (`NA`s) are not allowed.
`total`	positive integer indicating the total number of balls in the urn (i.e., `m+n`). You must supply `m` or `total`, but not both. Missing values (`NA`s) are not allowed.
`k`	positive integer indicating the number of balls drawn without replacement from the urn. Missing values (`NA`s) are not allowed.
`method`	character string specifying the method of estimation. Possible values are `"mle"` (maximum likelihood; the default) and `"mvue"` (minimum variance unbiased). The mvue method is only available when you are estimating `m` (i.e., when you supply the argument `total`). See the DETAILS section for more information on these estimation methods.

Details

Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.

Let x be an observation from a hypergeometric distribution with parameters m=M, n=N, and k=K. In R nomenclature, x represents the number of white balls drawn out of a sample of K balls drawn without replacement from an urn containing M white balls and N black balls. The total number of balls in the urn is thus M+N. Denote the total number of balls by T = M+N.

Estimation

Estimating M, given T and K are known
When T and K are known, the maximum likelihood estimator (mle) of M is given by (Forbes et al., 2011):

\hat{M}_{mle} = floor[(T + 1) x / K] \;\;\;\; (1)

where floor() represents the floor function. That is, floor(y) is the largest integer less than or equal to y.

If the quantity (T + 1) x / K is an integer, then the mle of M is also given by (Johnson et al., 1992, p.263):

\hat{M}_{mle} = [(T + 1) x / K] - 1 \;\;\;\; (2)

which is what the function ehyper uses for this case.
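
As a rough check of Equations (1) and (2), the following base R sketch evaluates the hypergeometric likelihood at every admissible value of M and compares the maximizer with the closed form. The helper names `mle_m_closed_form` and `mle_m_grid` are hypothetical and are not part of EnvStats; the sketch does not attempt to reproduce the internals of `ehyper`.

```r
# Closed-form mle of M from Equation (1); hypothetical helper, not an EnvStats function.
mle_m_closed_form <- function(x, total, k) {
  floor((total + 1) * x / k)
}

# Brute-force alternative: evaluate the likelihood at every M in 0..total
# and return all values of M that attain the maximum (up to a small
# relative tolerance, so that exact ties are both reported).
mle_m_grid <- function(x, total, k) {
  m_grid <- 0:total
  lik <- dhyper(x, m = m_grid, n = total - m_grid, k = k)
  m_grid[lik >= max(lik) * (1 - 1e-9)]
}

mle_m_closed_form(1, total = 40, k = 5)  # (T + 1) x / K = 8.2, not an integer
mle_m_grid(1, total = 40, k = 5)
mle_m_grid(1, total = 39, k = 5)         # (T + 1) x / K = 8, an integer
```

With x = 1, T = 40, and K = 5, both helpers return 8. With T = 39 the quantity (T + 1) x / K equals 8 exactly, and the grid search returns both 7 and 8, which correspond to Equations (2) and (1) respectively.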

The minimum variance unbiased estimator (mvue) of M is given by (Forbes et al., 2011):

\hat{M}_{mvue} = (T x / K) \;\;\;\; (3)
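
The unbiasedness of Equation (3) can be checked numerically by summing T x / K over the support of x, weighted by the hypergeometric probabilities. The sketch below uses base R only; the parameter values (M = 10, T = 40, K = 5) are chosen for illustration, and the variable names are hypothetical.

```r
# Illustrative parameter values (assumed, matching the urn used elsewhere on this page).
total  <- 40
m_true <- 10
k      <- 5

# Support of x and the corresponding hypergeometric probabilities.
x_support <- 0:k
prob <- dhyper(x_support, m = m_true, n = total - m_true, k = k)

# Expected value of the estimator T * x / K; this should equal m_true.
expected_mvue <- sum(prob * total * x_support / k)
all.equal(expected_mvue, m_true)
```

Note that, unlike the mle, T x / K need not be an integer for arbitrary T, x, and K.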

Estimating T, given M and K are known
When M and K are known, the maximum likelihood estimator (mle) of T is given by (Forbes et al., 2011):

\hat{T}_{mle} = floor(K M / x) \;\;\;\; (4)
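
Equation (4) can be checked in the same spirit by maximizing the likelihood over a grid of candidate totals. This is only a sketch: the helper names are hypothetical, the upper limit of the grid (`upper`) is an arbitrary assumption, and the code assumes x > 0, since K M / x is undefined for x = 0.

```r
# Closed-form mle of T from Equation (4); hypothetical helper, assumes x > 0.
mle_total_closed_form <- function(x, m, k) {
  floor(k * m / x)
}

# Brute-force alternative: evaluate the likelihood for each candidate total
# and return every total that attains the maximum. The smallest admissible
# total is m + (k - x), since the urn must hold at least k - x black balls.
mle_total_grid <- function(x, m, k, upper = 1000) {
  t_grid <- (m + k - x):upper
  lik <- dhyper(x, m = m, n = t_grid - m, k = k)
  t_grid[lik >= max(lik) * (1 - 1e-9)]
}

mle_total_closed_form(1, m = 10, k = 5)
mle_total_grid(3, m = 10, k = 5)
```

With x = 1, M = 10, and K = 5, the closed form gives 50. With x = 3 the ratio K M / x is not an integer and the grid search returns the single value 16 = floor(50 / 3).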

Value

a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.

Note

The hypergeometric distribution can be described by an urn model with M white balls and N black balls. If K balls are drawn with replacement, then the number of white balls in the sample of size K follows a binomial distribution with parameters size=K and prob=M/(M+N). If K balls are drawn without replacement, then the number of white balls in the sample of size K follows a hypergeometric distribution with parameters m=M, n=N, and k=K.
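
To make the contrast between the two sampling schemes concrete, the following base R sketch compares the binomial and hypergeometric distributions for the same urn. The values M = 10, N = 30, and K = 5 are illustrative assumptions. Both distributions have the same mean, but sampling without replacement yields a smaller variance.

```r
# Illustrative urn: 10 white balls, 30 black balls, 5 draws (assumed values).
m <- 10
n <- 30
k <- 5
x <- 0:k

# Probabilities of drawing x white balls under each sampling scheme.
p_with    <- dbinom(x, size = k, prob = m / (m + n))  # with replacement
p_without <- dhyper(x, m = m, n = n, k = k)           # without replacement

# Means and variances computed directly from the probabilities.
mean_with    <- sum(x * p_with)
mean_without <- sum(x * p_without)
var_with     <- sum((x - mean_with)^2 * p_with)
var_without  <- sum((x - mean_without)^2 * p_without)

c(mean_with = mean_with, mean_without = mean_without)
c(var_with = var_with, var_without = var_without)
```

The two variances differ by the finite population correction factor (M + N - K) / (M + N - 1).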

The name “hypergeometric” comes from the fact that the probabilities associated with this distribution can be written as successive terms in the expansion of a function of a Gaussian hypergeometric series.

The hypergeometric distribution is applied in a variety of fields, including quality control and estimation of animal population size. It is also the distribution used to compute probabilities for Fisher's exact test for a 2x2 contingency table.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.

Johnson, N. L., S. Kotz, and A. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 6.

Examples

  # Generate an observation from a hypergeometric distribution with 
  # parameters m=10, n=30, and k=5, then estimate the parameter m. 
  # Note: the call to set.seed simply allows you to reproduce this example. 
  # Also, the only parameter actually estimated is m; once m is estimated, 
  # n is computed by subtracting the estimated value of m (8 in this example) 
  # from the given value of m+n (40 in this example).  The parameters 
  # n and k are shown in the output in order to provide information on 
  # all of the parameters associated with the hypergeometric distribution.

  set.seed(250) 
  dat <- rhyper(nn = 1, m = 10, n = 30, k = 5) 
  dat 
  #[1] 1   

  ehyper(dat, total = 40, k = 5) 

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Hypergeometric
  #
  #Estimated Parameter(s):          m =  8
  #                                 n = 32
  #                                 k =  5
  #
  #Estimation Method:               mle for 'm'
  #
  #Data:                            dat
  #
  #Sample Size:                     1

  #----------

  # Use the same data as in the previous example, but estimate m+n instead. 
  # Note: The only parameter estimated is m+n. Once this is estimated, 
  # n is computed by subtracting the given value of m (10 in this case) 
  # from the estimated value of m+n (50 in this example).

  ehyper(dat, m = 10, k = 5)

  #Results of Distribution Parameter Estimation
  #--------------------------------------------
  #
  #Assumed Distribution:            Hypergeometric
  #
  #Estimated Parameter(s):          m = 10
  #                                 n = 40
  #                                 k =  5
  #
  #Estimation Method:               mle for 'm+n'
  #
  #Data:                            dat
  #
  #Sample Size:                     1


  #----------

  # Clean up
  #---------
  rm(dat)

[Package EnvStats version 2.8.1 Index]