readFasta2 {wrProteo}    R Documentation

Read file of protein sequences in fasta format

Description

Read a fasta-formatted file (e.g. from UniProt) to extract (protein) sequences and their names. If tableOut=TRUE, the output is organized as a matrix in which the meta-annotation (e.g. uniqueIdentifier, entryName, proteinName, GN) is separated into individual columns.
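
To make the basic (tableOut=FALSE) case more concrete, here is a minimal base-R sketch of turning fasta records into a named character vector of sequences. The helper readFastaSketch() and the toy entries are hypothetical illustrations, not the wrProteo implementation; in particular, this sketch keeps the full header line (without '>') as name, whereas readFasta2 extracts the UniProt ID.

```r
## Hypothetical helper, NOT the wrProteo implementation: read fasta records
## into a character vector of sequences named by their full header line.
readFastaSketch <- function(filename) {
  lines <- readLines(filename)
  lines <- lines[nzchar(trimws(lines))]   # ignore blank lines
  isHead <- startsWith(lines, ">")
  entry <- cumsum(isHead)                  # record index for every line
  headers <- sub("^>", "", lines[isHead])
  seqs <- vapply(seq_along(headers), function(i) {
    ## concatenate the (possibly wrapped) sequence lines of record i
    paste(lines[entry == i & !isHead], collapse = "")
  }, character(1))
  names(seqs) <- headers
  seqs
}

## toy input (hypothetical entries) written to a temporary file
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">sp|A0A000|TOY1_HUMAN Toy protein 1 OS=Homo sapiens OX=9606 GN=TOY1 PE=1 SV=1",
             "MKWVTFISLL", "FLFSSAYS",
             ">sp|A0A001|TOY2_PIG Toy protein 2 OS=Sus scrofa OX=9823 PE=1 SV=1",
             "IVGGYTCAAN"), tmp)
fa <- readFastaSketch(tmp)
unname(fa)   # "MKWVTFISLLFLFSSAYS" "IVGGYTCAAN"
```

Sequence lines wrapped over several lines are simply concatenated until the next '>' header starts a new record.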

Usage

readFasta2(
  filename,
  delim = "|",
  databaseSign = c("sp", "tr", "generic", "gi"),
  removeEntries = NULL,
  tableOut = FALSE,
  UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV="),
  cleanCols = TRUE,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)

Arguments

filename

(character) name of the fasta-file to be read

delim

(character) delimiter used in the header-line

databaseSign

(character) characters at the beginning of the header, right after the '>' (typically specifying the database of origin); they will be excluded from the sequence-header

removeEntries

(character) set to 'empty' to remove entries without any sequence; set to 'duplicated' to remove duplicate entries (same sequence and same header)

tableOut

(logical) toggle to return either a named character-vector or a matrix with enhanced parsing of the fasta-header. The resulting matrix will contain the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on argument UniprSep

UniprSep

(character) separators used to further split the header into entry-fields if tableOut=TRUE; see also UniProt-FASTA-headers

cleanCols

(logical) remove columns in which all entries are NA (only used if tableOut=TRUE)

silent

(logical) suppress messages

callFrom

(character) allows easier tracking of the messages produced

debug

(logical) supplemental messages for debugging
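
The two removeEntries options described above can be pictured with a minimal base-R sketch operating on a named character vector (names = headers). removeEntriesSketch() is a hypothetical helper written only for illustration; it is not part of wrProteo and does not claim to reproduce its internal logic.

```r
## Hypothetical helper, NOT part of wrProteo: mimics the two documented
## removeEntries options on a named character vector (names = headers).
removeEntriesSketch <- function(seqs, removeEntries = NULL) {
  if ("empty" %in% removeEntries) {
    ## drop entries that have no sequence at all
    seqs <- seqs[!is.na(seqs) & nzchar(seqs)]
  }
  if ("duplicated" %in% removeEntries) {
    ## an entry counts as duplicate only if header AND sequence are identical
    dupl <- duplicated(data.frame(header = names(seqs), sequence = unname(seqs)))
    seqs <- seqs[!dupl]
  }
  seqs
}

x <- c(toyA = "MKV", toyB = "", toyA = "MKV", toyA = "MKL")
removeEntriesSketch(x, c("empty", "duplicated"))
## keeps toyA = "MKV" and toyA = "MKL"
```

Note that the last toyA entry survives: it shares the header but not the sequence, so it is not a duplicate in the sense used here.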
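
The effect of cleanCols (dropping matrix columns that contain only NA) can be illustrated with a short base-R sketch on a hypothetical annotation matrix; the matrix and its values are made up for illustration and are not output of readFasta2.

```r
## Hypothetical illustration of the cleanCols idea (not the package code):
## drop matrix columns in which every entry is NA.
m <- cbind(uniqueIdentifier = c("A0A000", "A0A001"),
           GN = c(NA, NA),
           OX = c("9606", "9823"))
mClean <- m[, colSums(!is.na(m)) > 0, drop = FALSE]
colnames(mClean)   # "uniqueIdentifier" "OX"
```

Using drop = FALSE keeps the result a matrix even if only one column remains.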

Value

Depending on the argument tableOut, this function returns either a) a simple character vector of sequences with the UniProt ID as name, or b) a matrix with the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on argument UniprSep
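
UniProt fasta headers follow the layout db|UniqueIdentifier|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=... . A minimal base-R sketch of splitting one such header at delim and at the default UniprSep tags could look as follows. parseUniprotHeader() is a hypothetical helper, not the wrProteo implementation; it assumes every tag is preceded by a space and that tag values never contain a pattern like ' XX='.

```r
## Hypothetical helper, NOT the wrProteo implementation: split one UniProt
## header (leading '>' already removed) into named fields.
parseUniprotHeader <- function(header, delim = "|",
                               UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV=")) {
  parts <- strsplit(header, delim, fixed = TRUE)[[1]]
  rest <- paste(parts[-(1:2)], collapse = delim)  # "EntryName ProteinName OS=..."
  entryName <- sub(" .*$", "", rest)
  descr <- sub("^[^ ]+ ?", "", rest)              # "ProteinName OS=... SV=..."
  ## protein name = text before the first ' XX=' tag
  proteinName <- sub(" [[:upper:]]{2}=.*$", "", descr)
  ## value of each tag = text up to the next ' XX=' tag or the end of the line
  tagVal <- vapply(UniprSep, function(tag) {
    pat <- paste0("(?:^| )", tag, "(.*?)(?= [[:upper:]]{2}=|$)")
    m <- regmatches(descr, regexec(pat, descr, perl = TRUE))[[1]]
    if (length(m) == 0) NA_character_ else m[2]   # absent tag -> NA
  }, character(1))
  names(tagVal) <- sub("=$", "", UniprSep)
  c(database = parts[1], uniqueIdentifier = parts[2],
    entryName = entryName, proteinName = proteinName, tagVal)
}

h <- parseUniprotHeader(
  "sp|A0A000|TOY1_HUMAN Toy protein 1 OS=Homo sapiens OX=9606 GN=TOY1 PE=1 SV=1")
h[c("entryName", "proteinName", "OS", "GN")]
```

Tags missing from a header (GN= is optional in UniProt) come back as NA; columns that end up NA for every entry are what cleanCols=TRUE would remove.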

See Also

writeFasta2 for writing in fasta format; for reading, see also scan or read.fasta from the package seqinr

Examples

## Tiny example with common contaminants (file shipped with wrProteo)
path1 <- system.file('extdata', package='wrProteo')
fiNa <- "conta1.fasta.gz"
## read as named character vector of sequences
fasta1 <- readFasta2(file.path(path1, fiNa))
str(fasta1)
## now let's read and further separate the annotation-fields into a matrix
fasta2 <- readFasta2(file.path(path1, fiNa), tableOut=TRUE)
str(fasta2)

[Package wrProteo version 1.11.0.1 Index]