matchDatasets {shiftR}	R Documentation
Match Two Data Sets by Location
Description
This function matches records in two data sets for subsequent enrichment analysis.
For each record in the primary data set (data1), it finds the records in the auxiliary data set (data2) that overlap with it or lie within the flanking distance (flank).
If multiple such auxiliary records are found, it selects the one whose center is closest to the center of the primary record.
If no such record is available, no match is made for the primary record.
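The matching rule above can be sketched in plain R as a naive loop over primary records. This is not the shiftR implementation: the function name matchDatasetsNaive is hypothetical, and the sketch assumes columns named chr, start, and end (as in the examples), closed intervals [start, end], and that ties between equally close centers go to the first candidate.

```r
# Hypothetical naive version of the matching rule; NOT the shiftR code.
# Assumes columns chr, start, end and closed intervals [start, end].
matchDatasetsNaive = function(data1, data2, flank = 0) {
  chr1 = as.character(data1$chr)
  chr2 = as.character(data2$chr)
  center1 = (data1$start + data1$end) / 2
  center2 = (data2$start + data2$end) / 2
  keep1 = integer(0)
  keep2 = integer(0)
  for (i in seq_len(nrow(data1))) {
    # Candidates: same chromosome and a gap of at most `flank` between intervals
    cand = which(chr2 == chr1[i] &
                 data2$start <= data1$end[i] + flank &
                 data2$end >= data1$start[i] - flank)
    if (length(cand) == 0) next  # no match for this primary record
    # Keep the candidate whose center is closest to the primary record's center
    # (which.min breaks ties by taking the first such candidate)
    best = cand[which.min(abs(center2[cand] - center1[i]))]
    keep1 = c(keep1, i)
    keep2 = c(keep2, best)
  }
  list(data1 = data1[keep1, , drop = FALSE],
       data2 = data2[keep2, , drop = FALSE])
}
```

On the example data, this sketch leaves the last three primary records unmatched with flank = 0 and pairs each of them with the auxiliary record at position 900 with flank = 100.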
Usage
matchDatasets(data1, data2, flank = 0)
Arguments
data1: A data frame with the primary data set; it must have at least 4 columns (the examples below use chr, start, end, and stat).
data2: A data frame with the auxiliary data set.
flank: Allowed distance between matched records; with flank = 0, records must overlap to be matched.
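The flank argument can be read as a bound on the gap between two intervals. A minimal sketch, assuming closed intervals and a hypothetical helper intervalGap (not part of shiftR): the gap is 0 for overlapping intervals, and a pair on the same chromosome qualifies when the gap is at most flank.

```r
# Hypothetical helper: distance between closed intervals [s1, e1] and [s2, e2].
# Returns 0 when the intervals overlap, otherwise the size of the gap between them.
intervalGap = function(s1, e1, s2, e2) {
  pmax(0, s2 - e1, s1 - e2)
}

# Under this reading, a pair qualifies when intervalGap(...) <= flank.
intervalGap(100, 200, 165, 165)  # overlapping intervals: gap 0
intervalGap(997, 997, 900, 900)  # gap 97, so flank must be at least 97
```

In the examples, the last primary records (positions 997 to 999) are 97 to 99 positions away from the auxiliary record at 900, which is consistent with them matching at flank = 100 but not at flank = 0.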
Value
A list with the matched data sets:
data1: The primary data set without unmatched records.
data2: The auxiliary data set records matching those in data1.
Note
For technical reasons, chromosome positions are assumed to be no greater than 1e9.
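Because of this limit, it may be worth validating the inputs before matching. A minimal sketch with a hypothetical helper checkPositionLimit (not part of shiftR), assuming the position columns are named start and end as in the examples:

```r
# Hypothetical pre-flight check before calling matchDatasets():
# stops if any start/end position exceeds the documented 1e9 limit.
# Assumes the position columns are named start and end, as in the examples.
checkPositionLimit = function(data, limit = 1e9) {
  pos = c(data$start, data$end)
  bad = !is.na(pos) & pos > limit
  if (any(bad)) {
    stop(sprintf("%d position(s) exceed %g", sum(bad), limit))
  }
  invisible(TRUE)
}
```

For example, call checkPositionLimit(data1) and checkPositionLimit(data2) before matchDatasets(data1, data2, flank).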
Author(s)
Andrey A Shabalin andrey.shabalin@gmail.com
Examples
library(shiftR)
data1 = read.csv(text =
"chr,start,end,stat
chr1,100,200,1
chr1,150,250,2
chr1,200,300,3
chr1,300,400,4
chr1,997,997,5
chr1,998,998,6
chr1,999,999,7")
data2 = read.csv(text =
"chr,start,end,stat
chr1,130,130,1
chr1,140,140,2
chr1,165,165,3
chr1,200,200,4
chr1,240,240,5
chr1,340,340,6
chr1,350,350,7
chr1,360,360,8
chr1,900,900,9")
# Match overlapping records only (no flank).
matchDatasets(data1, data2, 0)
# Match data sets with a flank of 100.
# The last three records of data1 are now matched.
matchDatasets(data1, data2, 100)