Hyperintersection {hint} | R Documentation |
The Hypergeometric Intersection Family of Distributions
Description
The Hypergeometric Intersection Family of Distributions
Usage
dhint(n, A, q = 0, range = NULL, approx = FALSE, log = FALSE, verbose = TRUE)
phint(n, A, q = 0, vals, upper.tail = TRUE, log.p = FALSE)
qhint(p, n, A, q = 0, upper.tail = TRUE, log.p = FALSE)
rhint(num = 5, n, A, q = 0)
Arguments
n |
An integer specifying the number of categories in the urns. |
A |
A vector of integers specifying the numbers of balls drawn from each urn. The length of the vector equals the number of urns. |
q |
An integer specifying the number of categories in the second urn which have duplicate members. If q is 0 (default) then the symmetrical, singleton case is computed, otherwise the asymmetrical, duplicates case is computed (see Details). |
range |
A vector of integers specifying the intersection sizes for which probabilities (dhint) or cumulative probabilities (phint) should be computed (can be a single number). If range is NULL (default) then probabilities will be returned over the entire range of possible values. |
approx |
Logical. If TRUE, a binomial approximation will be used to generate the distribution. |
log |
Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE. |
verbose |
Logical. If TRUE, progress of calculation in the asymmetric, duplicates case is printed to the screen. |
vals |
A vector of integers specifying the intersection sizes for which cumulative probabilities (phint) should be computed (can be a single number). |
upper.tail |
Logical. If TRUE, probabilities are P(X >= c), else P(X <= c). Defaults to TRUE. |
log.p |
Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE. |
p |
A probability between 0 and 1. |
num |
An integer specifying the number of random numbers to generate. Defaults to 5. |
Details
The hypergeometric intersection distributions describe the distribution of intersection sizes when sampling without replacement from two separate urns, each containing balls that belong to the same n object categories. In the simplest case, when each urn holds exactly one ball per category (the symmetrical, singleton case), the intersection size v follows a hypergeometric distribution, where a and b are the numbers of balls drawn from the first and second urn:
P(X=v)=\frac{{a \choose v}{n-a \choose b-v}}{{n \choose b}}
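As a sanity check on the two-urn formula, the following sketch evaluates it directly with base R's choose() and compares the result with dhyper(). It does not call the hint package; the values n = 20, a = 10, b = 12 and the variable names are illustrative assumptions.

```r
## Two-urn (symmetrical, singleton) case evaluated with base R only.
## n, a, b are illustrative values, not taken from the package internals.
n <- 20   # number of categories
a <- 10   # balls drawn from the first urn
b <- 12   # balls drawn from the second urn

## Possible intersection sizes: at least a + b - n categories must overlap.
v <- max(0, a + b - n):min(a, b)

## Direct evaluation of P(X = v) from the formula above.
p <- choose(a, v) * choose(n - a, b - v) / choose(n, b)

## Reference: hypergeometric density with a "white" and n - a "black"
## categories, drawing b of them.
p_ref <- dhyper(v, a, n - a, b)

check <- data.frame(v = v, p = p, p_ref = p_ref)
```

The two probability columns in check should agree up to floating-point error, and p should sum to 1 over the listed values of v.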
When there are three urns, with c balls drawn from the third urn, the distribution is given by:
P(X=v) = \frac{ {a \choose v} \sum_{i} {a-v \choose i} {n-a \choose b-v-i} {n-v-i \choose c-v} }{ {n \choose b} {n \choose c} }
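The three-urn formula can be evaluated the same way. The sketch below is a direct base R transcription, not the package's implementation; the helper name p3 and the argument name cc (used for c to avoid confusion with R's c() function) are hypothetical. The sum over i is taken over 0, ..., a - v, on the assumption that terms outside the valid range vanish because choose() returns 0 for a negative lower argument.

```r
## Three-urn intersection probability, transcribed from the formula above.
## p3 and cc are hypothetical names introduced for this illustration.
p3 <- function(v, n, a, b, cc) {
  i <- 0:(a - v)  # assumed summation range; out-of-range terms are zero
  inner <- sum(choose(a - v, i) *
               choose(n - a, b - v - i) *
               choose(n - v - i, cc - v))
  choose(a, v) * inner / (choose(n, b) * choose(n, cc))
}

n <- 20; a <- 10; b <- 12; cc <- 8   # illustrative values
v <- 0:min(a, b, cc)
p <- sapply(v, p3, n = n, a = a, b = b, cc = cc)

## If the third sample takes every category (cc = n), the three-way
## intersection reduces to the two-urn hypergeometric case.
v2 <- 0:min(a, b)
p_full <- sapply(v2, p3, n = n, a = a, b = b, cc = n)
p_two  <- dhyper(v2, a, n - a, b)
```

Here p should sum to 1, and p_full should match p_two, which gives some confidence that the transcription is consistent with the two-urn formula.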
If, however, we allow duplicates in q \leq n of the categories in the second urn, then the distribution of intersection sizes is described by the following variant of the hypergeometric:
P(X=v) = \sum_{m=0}^{\alpha} \sum_{l=0}^{\beta} \sum_{j=0}^{l} \frac{ {n-q \choose v-l} {q \choose l} {q-l \choose m} {n-v-q+l \choose a-v-m} {l \choose j} {n+q-a-m-j \choose b-v} }{ {n \choose a} {n+q \choose b} }
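One way to build intuition for the duplicates case is to simulate the sampling scheme rather than evaluate the formula. The following Monte Carlo sketch uses base R only and rests on an assumed reading of the model: the first urn holds one ball per category, the second urn holds one ball per category plus a second ball for q of the categories (n + q balls in total), and the intersection size counts categories present in both samples. The function name sim_intersection and the reps argument are hypothetical.

```r
## Monte Carlo sketch of the intersection size with duplicates.
## Assumption: q categories in the second urn have exactly two balls each.
sim_intersection <- function(n, a, b, q = 0, reps = 10000) {
  urn1 <- seq_len(n)                  # one ball per category
  urn2 <- c(seq_len(n), seq_len(q))   # categories 1..q appear twice
  replicate(reps, {
    s1 <- urn1[sample.int(n, a)]      # draw a balls without replacement
    s2 <- urn2[sample.int(n + q, b)]  # draw b balls without replacement
    length(intersect(s1, s2))         # categories seen in both samples
  })
}

set.seed(1)
sims <- sim_intersection(n = 35, a = 15, b = 11, q = 22, reps = 2000)

## Empirical distribution over the possible intersection sizes 0..min(a, b).
emp <- table(factor(sims, levels = 0:11)) / length(sims)
```

With q = 0 the simulated frequencies should approach the hypergeometric probabilities of the symmetrical case; with q > 0 they give only a rough empirical picture to compare against the exact distribution.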
Value
'dhint', 'phint', and 'qhint' return a data frame with two columns: v, the intersection size, and p, the associated p-values. 'rhint' returns an integer vector of random samples based on the hypergeometric intersection distribution.
References
Kalinka, A. T. (2013). The probability of drawing intersections: extending the hypergeometric distribution. arXiv:1305.0717
Examples
## Generate the distribution of intersection sizes without duplicates:
dd <- dhint(20, c(10, 12))
## Restrict the range of intersection sizes:
dd <- dhint(20, c(10, 12), range = 0:5)
## Allow duplicates in q of the categories in the second urn:
dd <- dhint(35, c(15, 11), 22, verbose = FALSE)
## Generate cumulative probabilities:
pp <- phint(29, c(15, 8), vals = 5)
pp <- phint(29, c(15, 8), vals = 2, upper.tail = FALSE)
pp <- phint(29, c(15, 8), 23, vals = 2)
## Extract quantiles:
qq <- qhint(0.15, 23, c(12, 10))
qq <- qhint(0.15, 23, c(12, 10), 18)
## Generate random samples from hypergeometric intersection distributions:
rr <- rhint(num = 10, 18, c(9, 14))
rr <- rhint(num = 10, 22, c(11, 17), 12)