genome_join {fuzzyjoin}    R Documentation

Join two tables based on overlapping genomic intervals

Description

This is an extension of interval_join specific to genomic intervals. A genomic interval consists of a chromosome ID and an interval; two items are considered matching only if their chromosome IDs are equal and their intervals overlap. Note that by must name three columns, and that they must be in the order c("chromosome", "start", "end").
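The matching rule described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name intervals_match is hypothetical, and it assumes closed intervals (both endpoints inclusive), which is how IRanges treats integer ranges.

```r
# Hypothetical helper, not part of fuzzyjoin: TRUE when two genomic
# intervals match, i.e. the chromosomes are equal AND the closed
# intervals [start_x, end_x] and [start_y, end_y] share a position.
intervals_match <- function(chr_x, start_x, end_x, chr_y, start_y, end_y) {
  chr_x == chr_y & start_x <= end_y & start_y <= end_x
}

# Same chromosome, overlapping intervals
intervals_match("chr1", 100, 150, "chr1", 140, 160)  # TRUE

# Overlapping intervals, but different chromosomes
intervals_match("chr2", 300, 350, "chr1", 300, 320)  # FALSE

# Same chromosome, disjoint intervals
intervals_match("chr1", 200, 250, "chr1", 300, 320)  # FALSE
```

The helper is vectorized, so it can also be applied to whole columns of equal length.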

Usage

genome_join(x, y, by = NULL, mode = "inner", ...)

genome_inner_join(x, y, by = NULL, ...)

genome_left_join(x, y, by = NULL, ...)

genome_right_join(x, y, by = NULL, ...)

genome_full_join(x, y, by = NULL, ...)

genome_semi_join(x, y, by = NULL, ...)

genome_anti_join(x, y, by = NULL, ...)

Arguments

x

A tbl

y

A tbl

by

Names of columns to join on, in order c("chromosome", "start", "end"). A match will be counted only if the chromosomes are equal and the start/end pairs overlap.

mode

One of "inner", "left", "right", "full", "semi", or "anti"

...

Extra arguments passed on to findOverlaps

Details

All the extra arguments to interval_join, which are passed on to findOverlaps, work for genome_join as well. These include maxgap and minoverlap.
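As a minimal sketch of how these extra arguments might be used, the example below passes maxgap and minoverlap through genome_inner_join. The tables a and b and their values are made up for illustration, and the comments describe the behaviour expected from the findOverlaps semantics (maxgap as the largest allowed gap between intervals, minoverlap as the smallest allowed number of shared positions) rather than verified output.

```r
# Illustrative data (not from the package): three intervals per table,
# all on the same chromosome.
a <- data.frame(id1 = 1:3,
                chromosome = c("chr1", "chr1", "chr1"),
                start = c(100, 300, 500),
                end = c(150, 350, 600),
                stringsAsFactors = FALSE)

b <- data.frame(id2 = 1:3,
                chromosome = c("chr1", "chr1", "chr1"),
                start = c(140, 380, 520),
                end = c(160, 400, 580),
                stringsAsFactors = FALSE)

if (requireNamespace("fuzzyjoin", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  cols <- c("chromosome", "start", "end")

  # default: only intervals that actually overlap are joined
  overlapping <- fuzzyjoin::genome_inner_join(a, b, by = cols)

  # maxgap: also join intervals separated by a gap of at most 50 positions
  # (300-350 and 380-400 do not overlap, but lie within that distance)
  nearby <- fuzzyjoin::genome_inner_join(a, b, by = cols, maxgap = 50L)

  # minoverlap: require at least 20 shared positions
  # (100-150 and 140-160 share fewer than that)
  strict <- fuzzyjoin::genome_inner_join(a, b, by = cols, minoverlap = 20L)
}
```

The two arguments are shown in separate calls because how they interact when combined depends on findOverlaps itself.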

Examples


library(dplyr)

x1 <- tibble(id1 = 1:4,
             chromosome = c("chr1", "chr1", "chr2", "chr2"),
             start = c(100, 200, 300, 400),
             end = c(150, 250, 350, 450))

x2 <- tibble(id2 = 1:4,
             chromosome = c("chr1", "chr2", "chr2", "chr1"),
             start = c(140, 210, 400, 300),
             end = c(160, 240, 415, 320))

if (requireNamespace("IRanges", quietly = TRUE)) {
  # note that the third item of x1 and the fourth item of x2 don't join
  # (even though 300-350 and 300-320 overlap) since the chromosomes are different:
  genome_inner_join(x1, x2, by = c("chromosome", "start", "end"))

  # other functions:
  genome_full_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_left_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_right_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_semi_join(x1, x2, by = c("chromosome", "start", "end"))
  genome_anti_join(x1, x2, by = c("chromosome", "start", "end"))
}


[Package fuzzyjoin version 0.1.6 Index]