R: Query fasta sequence

get.fasta {bedr}

R Documentation

Query fasta sequence

Description

Query fasta sequence using bedtools get.fasta

Usage

get.fasta(
	x,
	fasta = NULL,
	bed12 = FALSE,
	strand = FALSE,
	output.fasta = FALSE,
	use.name.field = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

`x`	region or index
`fasta`	a fasta file defaults to mini example hg19 human
`bed12`	should bed12 format be used
`strand`	strand specific i.e. reverse complement negative
`output.fasta`	output a fasta defaults to a data.frame for easier parsing
`use.name.field`	should the name field be used for
`check.zero.based`	check for zero based region
`check.chr`	check for "chr" prefix
`check.valid`	check for valid regions i.e. start < end
`check.sort`	check if region is sorted
`check.merge`	check if region is merged
`verbose`	print progress

Details

Uses bedtools getFasta to query a fasta file and load into R as a data.frame for easy parsing.

Note that the hg19 reference genome fasta is large and requires on the order of 4 GB RAM to avoid a segfault happens.

Value

A data.frame or fasta. The data.frame has is two columns corresponding to the region and the sequence.

Author(s)

Daryl Waggott, Syed Haider

References

http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html

Examples



if (check.binary("bedtools")) {

## Not run: 

  # get the sequence for a set of regions as a data.frame
  index <- get.example.regions();
  a <- index[[1]];
  b <- get.fasta(bedr.sort.region(a));

  # this time output a fasta
  d <- get.fasta(b, output.fasta = TRUE);
  writeLines(d[[1]], con = "test.fasta");

  # get the region adjacent to a set of mutations in a vcf
  clinvar.vcf.example <- system.file(
    "extdata/clinvar_dbSNP138_example.vcf.gz", 
    package = "bedr"
    );
  clinvar <- read.vcf(clinvar.vcf.example);

  # note that clinvar uses ncbi fasta which does not use "chr" and codes chrM as MT
  clinvar.bed <- data.frame(
    chr = paste0("chr", gsub("MT", "M", clinvar$vcf$CHROM)),
    start = clinvar$vcf$POS - 2,
    end = clinvar$vcf$POS + 1,
    stringsAsFactors = FALSE
    );

  # get trinucleotide sequences of variants on chr M only
  mutation.triplet <- get.fasta(
  	clinvar.bed[which(clinvar.bed$chr == "chrM"), ], 
  	fasta = system.file("extdata/ucsc.hg19.chrM.fasta", package = "bedr"),
  	check.chr = FALSE
  	);
  
## End(Not run)
}

[Package bedr version 1.0.7 Index]