bed_jaccard {valr}    R Documentation
Calculate the Jaccard statistic for two sets of intervals.
Description
Quantifies the extent of overlap between two sets of intervals in terms of base-pairs. Groups that are shared between the inputs are used to calculate the statistic for subsets of the data.
Usage
bed_jaccard(x, y)
Arguments
x | first set of intervals (ivl_df)
y | second set of intervals (ivl_df)
Details
The Jaccard statistic takes values in [0, 1] and is measured as:

J(x,y) = \frac{\mid x \cap y \mid}{\mid x \cup y \mid} = \frac{\mid x \cap y \mid}{\mid x \mid + \mid y \mid - \mid x \cap y \mid}

where \mid \cdot \mid denotes length in base-pairs.
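The formula above can be sketched in base R for two interval sets on a single chromosome. This is an illustration of the formula only, not valr's implementation; the helpers merge_intervals(), intersect_bp() and jaccard_bp() are hypothetical. Intervals are treated as half-open [start, end), as in BED, and each set is merged first so that overlapping intervals within one input are not counted twice (an assumption made for this sketch). Inputs are assumed to be non-empty.

```r
# Hypothetical helper: merge overlapping or book-ended intervals so that
# no base pair is counted twice. Intervals are half-open [start, end).
merge_intervals <- function(start, end) {
  o <- order(start, end)
  start <- start[o]
  end <- end[o]
  ms <- start[1]
  me <- end[1]
  res_s <- c()
  res_e <- c()
  for (i in seq_along(start)[-1]) {
    if (start[i] <= me) {
      me <- max(me, end[i])  # overlaps or touches the current block: extend it
    } else {
      res_s <- c(res_s, ms)  # close the current block and start a new one
      res_e <- c(res_e, me)
      ms <- start[i]
      me <- end[i]
    }
  }
  data.frame(start = c(res_s, ms), end = c(res_e, me))
}

# Hypothetical helper: base pairs shared by two merged interval sets.
# Because each set is internally disjoint, summing pairwise overlaps
# counts every shared base exactly once.
intersect_bp <- function(a, b) {
  total <- 0
  for (i in seq_len(nrow(a))) {
    ov <- pmin(a$end[i], b$end) - pmax(a$start[i], b$start)
    total <- total + sum(pmax(ov, 0))
  }
  total
}

# Hypothetical helper: |x n y| / (|x| + |y| - |x n y|) in base pairs.
jaccard_bp <- function(x, y) {
  x <- merge_intervals(x$start, x$end)
  y <- merge_intervals(y$start, y$end)
  len_i <- intersect_bp(x, y)
  len_u <- sum(x$end - x$start) + sum(y$end - y$start) - len_i
  c(len_i = len_i, len_u = len_u, jaccard = len_i / len_u)
}

x <- data.frame(start = c(10, 30), end = c(20, 40))
y <- data.frame(start = 15, end = 20)
res <- jaccard_bp(x, y)
res
```

For this toy input, x covers 20 bp and y covers 5 bp lying entirely inside x, so the intersection is 5 bp, the union is 20 + 5 - 5 = 20 bp, and J = 0.25.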
Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
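The idea of computing one statistic per subset can be sketched in base R by splitting both inputs on a shared grouping column (here chrom) and applying the formula within each group. This is a hypothetical illustration, not how valr groups data internally; the helpers covered() and jaccard_positions() are invented for the sketch. To keep it short, each interval is expanded to its individual base positions, which is only practical for toy coordinates.

```r
# Hypothetical helper: base positions covered by half-open [start, end)
# intervals; unique() removes positions shared by overlapping intervals.
covered <- function(df) {
  unique(unlist(Map(function(s, e) s + seq_len(e - s) - 1, df$start, df$end)))
}

# Hypothetical helper: Jaccard statistic from covered positions.
jaccard_positions <- function(x, y) {
  px <- covered(x)
  py <- covered(y)
  len_i <- length(intersect(px, py))
  len_u <- length(union(px, py))
  data.frame(len_i = len_i, len_u = len_u, jaccard = len_i / len_u)
}

x <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                start = c(10, 30, 0), end = c(20, 40, 50))
y <- data.frame(chrom = c("chr1", "chr2"),
                start = c(15, 25), end = c(20, 75))

# Only groups present in both inputs are compared.
shared <- intersect(unique(x$chrom), unique(y$chrom))
res <- do.call(rbind, lapply(shared, function(ch) {
  data.frame(chrom = ch,
             jaccard_positions(x[x$chrom == ch, ], y[y$chrom == ch, ]))
}))
res
```

The result has one row per shared chromosome: on chr1 the intersection is 5 bp out of a 20 bp union (J = 0.25), and on chr2 it is 25 bp out of a 75 bp union (J = 1/3).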
Value
A tibble with the following columns:
- len_i: length of the intersection in base-pairs
- len_u: length of the union in base-pairs
- jaccard: value of the Jaccard statistic
- n_int: number of intersecting intervals between x and y
If inputs are grouped, the return value will contain one set of values per group.
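The n_int column can be illustrated with a small base R count of overlapping interval pairs. This sketch assumes that n_int counts each overlapping (x, y) pair once; the page does not state the exact counting rule, so valr's value may differ. The helper count_overlapping_pairs() is hypothetical, and intervals are treated as half-open [start, end).

```r
# Hypothetical helper: count overlapping (x, y) interval pairs on the same
# chromosome. With half-open intervals, book-ended pairs (one ends exactly
# where the other starts) are not counted as overlapping.
count_overlapping_pairs <- function(x, y) {
  n <- 0
  for (i in seq_len(nrow(x))) {
    hit <- y$chrom == x$chrom[i] &
      y$start < x$end[i] &
      x$start[i] < y$end
    n <- n + sum(hit)
  }
  n
}

x <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                start = c(10, 30, 0), end = c(20, 40, 50))
y <- data.frame(chrom = c("chr1", "chr2", "chr2"),
                start = c(15, 25, 100), end = c(35, 75, 110))

n_int <- count_overlapping_pairs(x, y)
n_int
```

Here both chr1 intervals in x overlap the single chr1 interval in y, and one chr2 pair overlaps, giving 3 pairs; the chr2 interval at 100 to 110 in y overlaps nothing.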
See Also
https://bedtools.readthedocs.io/en/latest/content/tools/jaccard.html
Other interval statistics: bed_absdist(), bed_fisher(), bed_projection(), bed_reldist()
Examples
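library(valr)

# load chromosome sizes for the hg19 genome bundled with valr
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

# two sets of random intervals, seeded for reproducibility
x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

# genome-wide jaccard statistic
bed_jaccard(x, y)

# calculate jaccard per chromosome
bed_jaccard(
  dplyr::group_by(x, chrom),
  dplyr::group_by(y, chrom)
)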