R: Retrieving annotated sequences

gff2fasta {microseq}

R Documentation

Retrieving annotated sequences

Description

Retrieves from a genome the sequences specified in a gff.table.

Usage

gff2fasta(gff.table, genome)

Arguments

`gff.table`	A `gff.table` (`tibble`) with genomic feature information.
`genome`	A fasta object (`tibble`) with the genome sequence(s).

Details

Each row in gff.table (see readGFF) describes a genomic feature in the genome. The genome is a tibble with columns ‘Header’ and ‘Sequence’. The information in the columns Seqid, Start, End and Strand is used to retrieve the sequences from the ‘Sequence’ column of genome. Every Seqid in the gff.table must match the first token in one of the ‘Header’ texts, so that each feature is retrieved from the correct ‘Sequence’.

Value

A fasta object with one row for each row in gff.table. The Header for each sequence is a summary of the information in the corresponding row of gff.table.

Author(s)

Lars Snipen and Kristian Hovde Liland.

Examples

# Using two files in this package
gff.file <- file.path(path.package("microseq"),"extdata","small.gff")
genome.file <- file.path(path.package("microseq"),"extdata","small.fna")

# Reading the genome first
genome <- readFasta(genome.file)

# Retrieving sequences
gff.table <- readGFF(gff.file)
fa.tbl <- gff2fasta(gff.table, genome)

# Alternative, using piping
readGFF(gff.file) %>% gff2fasta(genome) -> fa.tbl
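
# Illustrating the retrieval described in Details, using base R only.
# This is a sketch under stated assumptions, not the package's implementation:
# the toy data and the helper revcomp() are hypothetical, Header tokens are
# assumed to be separated by whitespace, coordinates are assumed to be 1-based
# and inclusive, and features on the negative strand are assumed to be
# reverse-complemented.
toy.genome <- data.frame(Header = c("contig1 some description", "contig2"),
                         Sequence = c("ATGCCGTAAGGC", "TTGACCATGA"),
                         stringsAsFactors = FALSE)
toy.gff <- data.frame(Seqid = c("contig1", "contig2"),
                      Start = c(1, 3),
                      End = c(9, 8),
                      Strand = c("+", "-"),
                      stringsAsFactors = FALSE)

# First token of each Header
first.token <- sub("[[:space:]].*$", "", toy.genome$Header)

# Every Seqid must match the first token of some Header
idx <- match(toy.gff$Seqid, first.token)
stopifnot(!anyNA(idx))

# Cutting out Start..End from the matching Sequence
toy.seq <- substr(toy.genome$Sequence[idx], toy.gff$Start, toy.gff$End)

# Reverse-complementing the features on the negative strand
revcomp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}
neg <- toy.gff$Strand == "-"
toy.seq[neg] <- vapply(toy.seq[neg], revcomp, character(1), USE.NAMES = FALSE)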

[Package microseq version 2.1.6 Index]