genome_join {fuzzyjoin} | R Documentation |
Join two tables based on overlapping genomic intervals
Description
This is an extension of interval_join
specific to genomic intervals.
Genomic intervals include both a chromosome ID and an interval: items are only
considered matching if the chromosome ID matches and the interval overlaps.
Note that by must name exactly three columns, in the order
c("chromosome", "start", "end").
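As a rough illustration of the matching rule only (this is not how fuzzyjoin implements it; the package delegates the overlap test to IRanges), the base R sketch below pairs rows that share a chromosome and keeps the pairs whose closed intervals have at least one position in common. The data mirror the tables in the Examples section, and the names pairs and overlapping are illustrative.

```r
# Two small tables of genomic intervals (same values as in the Examples).
x1 <- data.frame(id1 = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id2 = 1:4,
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))

# Step 1: pair every row of x1 with every row of x2 on the same chromosome.
pairs <- merge(x1, x2, by = "chromosome", suffixes = c(".x", ".y"))

# Step 2: keep pairs whose intervals overlap, i.e. each interval starts
# no later than the other one ends.
overlapping <- pairs[pairs$start.x <= pairs$end.y &
                     pairs$start.y <= pairs$end.x, ]

# Two pairs remain: (id1 = 1, id2 = 1) and (id1 = 4, id2 = 3).
overlapping[, c("id1", "id2")]
```

The intervals 300-350 (chr2) and 300-320 (chr1) overlap numerically but are never paired, because step 1 only combines rows with equal chromosome IDs.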
Usage
genome_join(x, y, by = NULL, mode = "inner", ...)
genome_inner_join(x, y, by = NULL, ...)
genome_left_join(x, y, by = NULL, ...)
genome_right_join(x, y, by = NULL, ...)
genome_full_join(x, y, by = NULL, ...)
genome_semi_join(x, y, by = NULL, ...)
genome_anti_join(x, y, by = NULL, ...)
Arguments
x | A tbl |
y | A tbl |
by | Names of columns to join on, in order c("chromosome", "start", "end"). A match will be counted only if the chromosomes are equal and the start/end pairs overlap. |
mode | One of "inner", "left", "right", "full", "semi", or "anti" |
... | Extra arguments passed on to findOverlaps |
Details
All the extra arguments to interval_join, which are passed on to findOverlaps, work for genome_join as well. These include maxgap and minoverlap.
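A minimal sketch of passing one of these extra arguments, assuming the fuzzyjoin, dplyr, and IRanges packages are installed. The value maxgap = 50 is an arbitrary choice for illustration; in IRanges, maxgap is the largest number of positions that may separate two ranges for them to still be treated as overlapping.

```r
if (requireNamespace("fuzzyjoin", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  x1 <- dplyr::tibble(id1 = 1:4,
                      chromosome = c("chr1", "chr1", "chr2", "chr2"),
                      start = c(100, 200, 300, 400),
                      end = c(150, 250, 350, 450))
  x2 <- dplyr::tibble(id2 = 1:4,
                      chromosome = c("chr1", "chr2", "chr2", "chr1"),
                      start = c(140, 210, 400, 300),
                      end = c(160, 240, 415, 320))
  by_cols <- c("chromosome", "start", "end")

  # Default behaviour: intervals must actually overlap.
  strict <- fuzzyjoin::genome_inner_join(x1, x2, by = by_cols)

  # Relaxed: intervals separated by up to 50 positions also match.
  relaxed <- fuzzyjoin::genome_inner_join(x1, x2, by = by_cols, maxgap = 50)

  # Compare how many pairs each setting returns.
  print(nrow(strict))
  print(nrow(relaxed))
}
```

Relaxing maxgap can only add matches on the same chromosome; the chromosome ID must still be equal for a pair to be considered.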
Examples
library(fuzzyjoin)
library(dplyr)
x1 <- tibble(id1 = 1:4,
             chromosome = c("chr1", "chr1", "chr2", "chr2"),
             start = c(100, 200, 300, 400),
             end = c(150, 250, 350, 450))
x2 <- tibble(id2 = 1:4,
             chromosome = c("chr1", "chr2", "chr2", "chr1"),
             start = c(140, 210, 400, 300),
             end = c(160, 240, 415, 320))
if (requireNamespace("IRanges", quietly = TRUE)) {
  # note that the third item of x1 and the fourth item of x2 don't join
  # (even though 300-350 and 300-320 overlap) since the chromosomes differ:
  genome_inner_join(x1, x2, by = c("chromosome", "start", "end"))
  # other functions:
  genome_full_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_left_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_right_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_semi_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_anti_join(x1, x2, by = c("chromosome", "start", "end"))
}