readFasta2 {wrProteo} | R Documentation |
Read File Of Protein Sequences In Fasta Format
Description
Read a fasta-formatted file (e.g. from UniProt) to extract (protein) sequences and their names.
If tableOut=TRUE, the output may be organized as a matrix, with the meta-annotation (e.g. uniqueIdentifier, entryName, proteinName, GN) separated into distinct columns.
Usage
readFasta2(
filename,
delim = "|",
databaseSign = c("sp", "tr", "generic", "gi"),
removeEntries = NULL,
tableOut = FALSE,
UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV="),
strictSpecPattern = TRUE,
cleanCols = TRUE,
silent = FALSE,
callFrom = NULL,
debug = FALSE
)
Arguments
filename |
(character) name of the fasta file to be read |
delim |
(character) delimiter used in the header line |
databaseSign |
(character) characters at the beginning, right after the '>' (typically specifying the database of origin); they will be excluded from the sequence header |
removeEntries |
(character) if |
tableOut |
(logical) toggle to return either a named character vector or a matrix with enhanced parsing of the fasta header.
The resulting matrix will contain the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on argument |
UniprSep |
(character) separators for further separating entry-fields if |
strictSpecPattern |
(logical or character) pattern for recognizing EntryName, which typically precedes ProteinName (separated by ' '); if |
cleanCols |
(logical) remove columns with all entries NA, if |
silent |
(logical) suppress messages |
callFrom |
(character) allows easier tracking of messages produced |
debug |
(logical) supplemental messages for debugging |
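To illustrate how delim, databaseSign and UniprSep relate to a UniProt-style header line, here is a minimal base-R sketch. It is not the wrProteo implementation; the helper parseUniprotHeader and the header line are hypothetical and serve only as an illustration.

```r
# Hypothetical UniProt-style header line (illustration only)
header <- ">sp|P02768|ALBU_HUMAN Albumin OS=Homo sapiens OX=9606 GN=ALB PE=1 SV=2"

# Hypothetical helper, NOT part of wrProteo: split one header into named fields
parseUniprotHeader <- function(header, delim = "|",
                               uniprSep = c("OS=", "OX=", "GN=", "PE=", "SV=")) {
  # drop the leading '>' and split at the delimiter (fixed=TRUE: '|' is taken literally)
  parts <- strsplit(sub("^>", "", header), delim, fixed = TRUE)[[1]]
  out <- c(database = parts[1], uniqueIdentifier = parts[2])
  # the first word of the last part is the entry name, the remainder is annotation
  out["entryName"] <- sub(" .*$", "", parts[3])
  annot <- sub("^[^ ]+ +", "", parts[3])
  # position of each separator tag in the annotation (-1 if absent)
  pos <- vapply(uniprSep, function(s) as.integer(regexpr(s, annot, fixed = TRUE)),
                integer(1))
  found <- sort(pos[pos > 0])
  if (length(found) < 1) {
    out["proteinName"] <- trimws(annot)
    return(out)
  }
  # everything before the first tag is the protein name
  out["proteinName"] <- trimws(substr(annot, 1, found[1] - 1))
  for (i in seq_along(found)) {
    tag <- names(found)[i]
    start <- found[i] + nchar(tag)
    end <- if (i < length(found)) found[i + 1] - 1 else nchar(annot)
    out[sub("=$", "", tag)] <- trimws(substr(annot, start, end))
  }
  out
}

res <- parseUniprotHeader(header)
print(res)
```

The field names used here ('database', 'uniqueIdentifier', 'entryName', 'proteinName') follow the column names listed for tableOut=TRUE; the names given to the tagged fields (OS, OX, GN, PE, SV) are an assumption of this sketch.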
Value
This function returns (depending on argument tableOut) either
a) a simple character vector (of sequences) with the (entire) UniProt annotation as names, or
b) a matrix with the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on argument UniprSep
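The shape of the simple (vector) output can be pictured with a small base-R sketch that reads a fasta file into a named character vector. This only mimics the general idea and is not how readFasta2 is implemented; the two entries (accessions, names and sequences) are invented for illustration, and the whole header after '>' is kept as the name.

```r
# Write a tiny two-entry fasta file (hypothetical entries, illustration only)
tmp <- tempfile(fileext = ".fasta")
writeLines(c(
  ">sp|P00001|TEST1_HUMAN Test protein one OS=Homo sapiens OX=9606 GN=TST1 PE=1 SV=1",
  "MKWVTFISLL",
  "FLFSSAYS",
  ">sp|P00002|TEST2_HUMAN Test protein two OS=Homo sapiens OX=9606 GN=TST2 PE=1 SV=1",
  "MAGTRQ"
), tmp)

lin <- readLines(tmp)
isHead <- grepl("^>", lin)
# running count of header lines assigns every line to its entry
entry <- factor(cumsum(isHead)[!isHead], levels = seq_len(sum(isHead)))
# concatenate the (possibly multi-line) sequence of each entry
seqs <- vapply(split(lin[!isHead], entry), paste, character(1), collapse = "")
# use the header (without the leading '>') as name
names(seqs) <- sub("^>", "", lin[isHead])
unlink(tmp)

str(seqs)
nchar(seqs)   # sequence lengths, named by header
```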
See Also
writeFasta2 for writing as fasta;
for reading: scan, or read.fasta from the package seqinr
Examples
## Tiny example with common contaminants
path1 <- system.file('extdata', package='wrProteo')
fiNa <- "conta1.fasta.gz"
fasta1 <- readFasta2(file.path(path1, fiNa))
## now let's read and further separate annotation-fields
fasta2 <- readFasta2(file.path(path1, fiNa), tableOut=TRUE)
str(fasta1)
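As a possible follow-up to the example above, the matrix returned with tableOut=TRUE can be inspected in the same way. This sketch assumes that wrProteo is installed and that the result carries a 'sequence' column as described under Value; the variable seqLen is introduced here only for illustration.

```r
## Optional follow-up: inspect the matrix output (runs only if wrProteo is available)
if (requireNamespace("wrProteo", quietly = TRUE)) {
  path1 <- system.file("extdata", package = "wrProteo")
  fasta2 <- wrProteo::readFasta2(file.path(path1, "conta1.fasta.gz"), tableOut = TRUE)
  print(dim(fasta2))        # number of entries and of annotation columns
  print(colnames(fasta2))   # which annotation fields were separated
  ## sequence lengths, computed from the 'sequence' column (assumed present)
  seqLen <- nchar(fasta2[, "sequence"])
  print(summary(seqLen))
}
```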