set.genomic.region {Ravages} | R Documentation |
Variants annotation based on gene positions
Description
Assigns each variant to one or more genomic regions based on the given region positions
Usage
set.genomic.region(x, regions = genes.b37, flank.width = 0L, split = TRUE)
Arguments
x |
A bed.matrix |
regions |
A data frame in bed format (start is 0-based and end is 1-based) containing the fields Chr, Start, End and Name |
flank.width |
An integer: width, in base pairs, of the flanking regions upstream and downstream of the regions. |
split |
Whether to split variants attributed to multiple regions by duplicating these variants (TRUE by default) |
Details
Warnings: regions$Name
should be a factor containing UNIQUE names of the regions, ORDERED in the genome order.
We provide two data sets of autosomal human genes, genes.b37
and genes.b38
.
If x@snps$chr
is not a vector of integers, it should be a factor with same levels as regions$Chr
.
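The requirements on regions can be sketched with a small hand-built data frame. This is only an illustration in base R: the gene names and coordinates are hypothetical, and the column names are taken from the fields referenced above (regions$Chr, regions$Start, regions$End, regions$Name).

```r
# Hypothetical regions in bed format (values invented for illustration).
regions <- data.frame(
  Chr   = c(1L, 1L, 2L),
  Start = c(999L, 4999L, 199L),   # 0-based start
  End   = c(2000L, 6000L, 800L),  # 1-based end
  Name  = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Sort in genome order, then make Name a factor whose levels follow that order.
regions <- regions[order(regions$Chr, regions$Start), ]
regions$Name <- factor(regions$Name, levels = unique(regions$Name))

# Region names must be unique.
stopifnot(!anyDuplicated(regions$Name))
```

Building the factor after sorting keeps its levels in genome order, which is the ordering the warning above asks for.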
If flank.width
is 0, only variants positioned between the regions$Start
and the regions$End
of a gene will be attributed to the corresponding gene.
When two regions overlap, variants in the overlapping zone are assigned to both regions, with the region names separated by a comma.
If flank.width
is a positive number, variants within flank.width
base pairs upstream or downstream of a gene will be annotated to this gene. You can use flank.width = Inf
to have each variant attributed to the nearest gene.
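The assignment rules described above can be illustrated with a minimal base R sketch. It is not the implementation used by Ravages: regions_at is a hypothetical helper, the regions are invented, and the boundary handling (strict at Start, inclusive at End) is an assumption derived from the bed convention of a 0-based start and a 1-based end. The flank.width = Inf case is not covered.

```r
# Hypothetical helper, not part of Ravages: names of the regions that
# contain position `pos` on chromosome `chr`, allowing `flank` extra bp.
regions_at <- function(chr, pos, regions, flank = 0) {
  hit <- regions$Chr == chr &
    pos > regions$Start - flank &   # Start is 0-based, so strict inequality
    pos <= regions$End + flank      # End is 1-based, so inclusive
  paste(regions$Name[hit], collapse = ",")
}

# Two overlapping hypothetical regions on chromosome 1.
regions <- data.frame(
  Chr = c(1L, 1L), Start = c(999L, 1499L), End = c(2000L, 3000L),
  Name = factor(c("geneA", "geneB"), levels = c("geneA", "geneB"))
)

in_a     <- regions_at(1L, 1200L, regions)               # inside geneA only
in_both  <- regions_at(1L, 1800L, regions)               # overlapping zone
outside  <- regions_at(1L, 3200L, regions)               # outside both regions
flanking <- regions_at(1L, 3200L, regions, flank = 500)  # within 500 bp of geneB
```

Here in_both holds both region names joined by a comma, and the position at 3200 is only annotated once a 500 bp flank is allowed.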
If a variant is attributed to multiple genomic regions, it will be duplicated in the bed matrix with one row per genomic region if split = TRUE
. Variants will have new IDs of the form CHR:POS:A1:A2:genomic.region.
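The effect of split = TRUE on the variant table can be pictured with a plain data frame. This is only an analogy under stated assumptions: the snps table below is invented, its column names are hypothetical, and the real function operates on a bed matrix (duplicating genotypes as well), which this sketch does not attempt.

```r
# Hypothetical annotated variants; the second lies in two overlapping regions.
snps <- data.frame(
  chr = c(1L, 1L), pos = c(1200L, 1800L),
  A1 = c("A", "G"), A2 = c("C", "T"),
  genomic.region = c("geneA", "geneA,geneB"),
  stringsAsFactors = FALSE
)

# Split the comma-separated annotations into one entry per region.
parts <- strsplit(snps$genomic.region, ",", fixed = TRUE)

# Repeat each row once per region it was assigned to.
split_snps <- snps[rep(seq_len(nrow(snps)), lengths(parts)), ]
split_snps$genomic.region <- unlist(parts)
rownames(split_snps) <- NULL

# Build IDs of the form CHR:POS:A1:A2:genomic.region.
split_snps$id <- with(split_snps,
                      paste(chr, pos, A1, A2, genomic.region, sep = ":"))
```

The variant at position 1800 ends up on two rows, one per region, each with its own ID.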
Value
The same bed matrix as x with an additional column x@snps$genomic.region
containing the annotation of each variant.
See Also
Examples
library(Ravages)
# Import 1000 Genomes data from the region around the LCT gene
x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)
# Group variants within known genes (default regions: genes.b37)
x <- set.genomic.region(x)
# Group variants within known genes +/- 500 bp
x <- set.genomic.region(x, flank.width=500)