R: Read and write FASTA files

readFasta {microseq}

R Documentation

Read and write FASTA files

Description

Reads and writes biological sequences (DNA, RNA, protein) in the FASTA format.

Usage

readFasta(in.file)
writeFasta(fdta, out.file, width = 0)

Arguments

`in.file`	url/directory/name of (gzipped) FASTA file to read.
`fdta`	A `tibble` with sequence data, see ‘Details’ below.
`out.file`	Name of (gzipped) FASTA file to create.
`width`	Number of characters per line, or 0 for no linebreaks.

Details

These functions handle input/output of sequences in the commonly used FASTA format. Each sequence is presumed to have exactly one header line starting with a ‘>’. If a filename (in.file or out.file) has the extension .gz, the file is automatically uncompressed when read and compressed when written.
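
The .gz handling can be illustrated with a small round trip. This is a minimal sketch, assuming the example file small.ffn that ships with the package (the same file used in the Examples section); the output name small.fasta.gz and the use of tempdir() are arbitrary choices for illustration.

```r
library(microseq)

# Example file shipped with the package (same file as in the Examples section)
fa.file <- file.path(path.package("microseq"), "extdata", "small.ffn")
fdta <- readFasta(fa.file)

# The .gz extension asks writeFasta to compress the output
gz.file <- file.path(tempdir(), "small.fasta.gz")
writeFasta(fdta, out.file = gz.file)

# readFasta uncompresses on the fly; the sequences should come back unchanged
fdta.gz <- readFasta(gz.file)
identical(fdta.gz$Sequence, fdta$Sequence)
```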

The sequences are stored in a tibble, opening up all the possibilities in R for fast and easy manipulations. The content of the file is stored as two columns, ‘Header’ and ‘Sequence’. If other columns are added, these will be ignored by writeFasta.
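
As a sketch of what this allows, the pipeline below adds a helper column, uses it to pick sequences, and writes the result. The column name Length, the choice of the three longest sequences, and the output file name are hypothetical; only Header and Sequence end up in the written file.

```r
library(microseq)
library(dplyr)
library(stringr)

fa.file <- file.path(path.package("microseq"), "extdata", "small.ffn")
long.file <- file.path(tempdir(), "longest.fasta")

readFasta(fa.file) %>%
  mutate(Length = str_length(Sequence)) %>%  # extra column, used only inside R
  arrange(desc(Length)) %>%                  # longest sequences first
  slice(1:3) %>%                             # keep the three longest
  writeFasta(out.file = long.file)           # the Length column is not written
```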

The default width = 0 in writeFasta writes each sequence on a single line, with no linebreaks inside the sequence.
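
The effect of width can be checked by writing the same data twice and comparing the files. This is a sketch only; the value 60 and the file names are arbitrary, and the comparisons are left for the reader to evaluate rather than stated as results.

```r
library(microseq)

fa.file <- file.path(path.package("microseq"), "extdata", "small.ffn")
fdta <- readFasta(fa.file)

one.line <- file.path(tempdir(), "oneline.fasta")
wrapped  <- file.path(tempdir(), "wrapped.fasta")
writeFasta(fdta, out.file = one.line)             # width = 0, the default
writeFasta(fdta, out.file = wrapped, width = 60)  # sequence lines of at most 60 characters

# Without wrapping, each sequence takes two lines: header plus sequence
length(readLines(one.line)) == 2 * nrow(fdta)

# Wrapping changes the layout of the file, not the sequences read back from it
identical(readFasta(wrapped)$Sequence, fdta$Sequence)
```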

Value

readFasta returns a tibble with the contents of the (gzipped) FASTA file stored in two columns of text. The first, named ‘Header’, contains the header lines and the second, named ‘Sequence’, contains the sequences.

writeFasta produces a (gzipped) FASTA file.

Author(s)

Lars Snipen and Kristian Hovde Liland.

Examples

## Not run: 
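# Attach the packages used below (dplyr for %>% and filter, stringr for str_detect)
library(microseq)
library(dplyr)
library(stringr)

# We need a FASTA file to read; here is an example file shipped with the package:
fa.file <- file.path(path.package("microseq"), "extdata", "small.ffn")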

# Read and write
fdta <- readFasta(fa.file)
ok <- writeFasta(fdta[4:5,], out.file = "delete_me.fasta")

# Use dplyr to copy only the sequences ending in the stop codon TGA to another file
ok <- readFasta(fa.file) %>%
  filter(str_detect(Sequence, "TGA$")) %>%
  writeFasta(out.file = "TGAstop.fasta", width = 80)

## End(Not run)

Package microseq version 2.1.6