Hyperintersection {hint} | R Documentation |
The Hypergeometric Intersection Family of Distributions
Description
Density, distribution function, quantile function and random generation for the hypergeometric intersection family of distributions.
Usage
dhint(n, A, q = 0, range = NULL, approx = FALSE, log = FALSE, verbose = TRUE)
phint(n, A, q = 0, vals, upper.tail = TRUE, log.p = FALSE)
qhint(p, n, A, q = 0, upper.tail = TRUE, log.p = FALSE)
rhint(num = 5, n, A, q = 0)
Arguments
n |
An integer specifying the number of categories in the urns. |
A |
A vector of integers specifying the numbers of balls drawn from each urn. The length of the vector equals the number of urns. |
q |
An integer specifying the number of categories in the second urn which have duplicate members. If q is 0 (default) then the symmetrical, singleton case is computed, otherwise the asymmetrical, duplicates case is computed (see Details). |
range |
A vector of integers specifying the intersection sizes for which probabilities should be computed by dhint (can be a single number). If range is NULL (default) then probabilities will be returned over the entire range of possible values. |
approx |
Logical. If TRUE, a binomial approximation will be used to generate the distribution. |
log |
Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE. |
verbose |
Logical. If TRUE, progress of calculation in the asymmetric, duplicates case is printed to the screen. |
vals |
A vector of integers specifying the intersection sizes for which cumulative probabilities should be computed by phint (can be a single number). |
upper.tail |
Logical. If TRUE, probabilities are P(X >= c), else P(X <= c). Defaults to TRUE. |
log.p |
Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE. |
p |
A probability between 0 and 1. |
num |
An integer specifying the number of random numbers to generate. Defaults to 5. |
Details
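The hypergeometric intersection distributions describe the distribution of intersection sizes when sampling without replacement from two separate urns, each containing balls that belong to the same n object categories. In the simplest case, when there is exactly one ball in each category in each urn (symmetrical, singleton case), the distribution is hypergeometric. Writing a and b for the two sample sizes in A and v for the intersection size (notation adopted here), the standard hypergeometric form is:
$$P(X = v) = \frac{\binom{a}{v}\binom{n-a}{b-v}}{\binom{n}{b}}$$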
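As a rough cross-check of the symmetrical, singleton case, the same probabilities can be computed with base R's dhyper, treating the categories drawn from the first urn as "white" balls and the rest as "black". This is a minimal sketch that does not call the hint package; the names n_cat, a, b and two_urn are illustrative assumptions, not part of the package.

```r
# Base-R sketch of the two-urn, singleton case (does not use hint).
# n_cat, a and b are illustrative names, not arguments of the hint package.
n_cat <- 20   # number of categories
a <- 10       # balls drawn from the first urn
b <- 12       # balls drawn from the second urn

v <- 0:min(a, b)                  # candidate intersection sizes
p <- dhyper(v, a, n_cat - a, b)   # P(X = v); zero for impossible sizes

# Same column layout (v, p) as described under Value.
two_urn <- data.frame(v = v, p = p)
```

With these inputs, intersection sizes below a + b - n_cat = 2 are impossible, so dhyper assigns them probability zero.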
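When there are three urns, with sample sizes A = c(a, b, c), the distribution is obtained by summing over the possible sizes i of the intersection between the first two samples:
$$P(X = v) = \sum_{i=v}^{\min(a,b)} \frac{\binom{a}{i}\binom{n-a}{b-i}}{\binom{n}{b}} \cdot \frac{\binom{i}{v}\binom{n-i}{c-v}}{\binom{n}{c}}$$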
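The three-urn, singleton case can be sketched in base R by conditioning on the size i of the overlap between the first two samples and then drawing the third sample against that overlap. This is an illustration of the reasoning under that assumption, not the hint package's implementation; three_urn_pmf, n_cat and c3 are hypothetical names.

```r
# Base-R sketch of the three-urn, singleton case (does not use hint).
# three_urn_pmf, n_cat and c3 are hypothetical names for illustration.
three_urn_pmf <- function(v, n_cat, a, b, c3) {
  i <- 0:min(a, b)                             # overlap sizes of the first two samples
  p_i <- dhyper(i, a, n_cat - a, b)            # P(first two samples share i categories)
  p_v_given_i <- dhyper(v, i, n_cat - i, c3)   # P(third sample hits v of those i)
  sum(p_i * p_v_given_i)                       # marginalise over i
}

# Probabilities for every candidate intersection size of samples 10, 12 and 8.
sizes <- 0:min(10, 12, 8)
probs <- sapply(sizes, three_urn_pmf, n_cat = 20, a = 10, b = 12, c3 = 8)
```

The probabilities in probs should sum to one, which is a quick sanity check on the conditioning argument.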
If, however, we allow duplicates in q of the categories in the second urn (asymmetrical, duplicates case), then the distribution of intersection sizes is described by a variant of the hypergeometric distribution; see Kalinka (2013) in References for its form.
Value
'dhint', 'phint', and 'qhint' return a data frame with two columns: v, the intersection size, and p, the associated probabilities. 'rhint' returns an integer vector of random samples drawn from the hypergeometric intersection distribution.
References
Kalinka, A. T. (2013). The probability of drawing intersections: extending the hypergeometric distribution. arXiv:1305.0717
Examples
## Generate the distribution of intersection sizes without duplicates:
dd <- dhint(20, c(10, 12))
## Restrict the range of intersections.
dd <- dhint(20, c(10, 12), range = 0:5)
## Allow duplicates in q of the categories in the second urn:
dd <- dhint(35, c(15, 11), 22, verbose = FALSE)
## Generate cumulative probabilities.
pp <- phint(29, c(15, 8), vals = 5)
pp <- phint(29, c(15, 8), vals = 2, upper.tail = FALSE)
pp <- phint(29, c(15, 8), 23, vals = 2)
## Extract quantiles:
qq <- qhint(0.15, 23, c(12, 10))
qq <- qhint(0.15, 23, c(12, 10), 18)
## Generate random samples from Hypergeometric intersection distributions.
rr <- rhint(num = 10, 18, c(9, 14))
rr <- rhint(num = 10, 22, c(11, 17), 12)