readWombatNormFile {wrProteo} | R Documentation |
Read (Normalized) Quantitation Data Files Produced By Wombat At Protein Level
Description
Protein quantification results from Wombat-P using the Bioconductor package Normalizer can be read using this function and the relevant information extracted. Input files compressed as .gz can be read as well. Protein abundance values (XIC) and peptide counts get extracted. Since protein annotation is not very extensive with this format of data, the function allows reading the initial fasta files (from the directory above the quantitation results) to extract more protein annotation (like species). Sample annotation (if available) can be extracted from sdrf files, which are typically part of the Wombat output, too. The protein abundance values may be normalized using multiple methods (see argument normalizeMeth; the default shown under Usage is "none"); the determination of normalization factors can be restricted to specific proteins (normalization to bait protein(s), or to the invariable matrix of spike-in experiments). The protein annotation data gets parsed to extract specific fields (ID, name, description, species ...). In addition, a graphical display of the distribution of protein abundance values may be generated before and after normalization.
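The idea of restricting the determination of normalization factors to specific proteins can be illustrated with a minimal base-R sketch. This is not the package's implementation (the See Also section points to normalizeThis for that); the matrix values, the object names and the choice of reference rows are invented for illustration only.

```r
# Toy log2 abundance matrix: 5 proteins (rows) x 3 samples (columns); values are invented
log2Abund <- matrix(c(10, 11, 12, 13, 20,
                      11, 12, 13, 14, 25,
                       9, 10, 11, 12, 18),
                    nrow = 5,
                    dimnames = list(paste0("prot", 1:5), c("s1", "s2", "s3")))

# Hypothetical choice of reference rows (e.g. invariable background proteins)
refRows <- 1:4

# Per-sample median computed on the reference rows only
refMedians <- apply(log2Abund[refRows, , drop = FALSE], 2, median, na.rm = TRUE)

# On log2 scale normalization is a shift: align each sample to the mean of the medians
shift <- mean(refMedians) - refMedians
normAbund <- sweep(log2Abund, 2, shift, FUN = "+")

# After normalization the reference medians coincide across samples
apply(normAbund[refRows, , drop = FALSE], 2, median)
```

The shift is estimated from the reference rows but applied to all rows, so proteins outside the reference set (here prot5) keep their differences relative to the reference.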
Usage
readWombatNormFile(
fileName,
path = NULL,
quantSoft = "(quant software not specified)",
fasta = NULL,
isLog2 = TRUE,
normalizeMeth = "none",
quantCol = "abundance_",
contamCol = NULL,
pepCountCol = c("number_of_peptides"),
read0asNA = TRUE,
refLi = NULL,
sampleNames = NULL,
extrColNames = c("protein_group"),
specPref = NULL,
remRev = TRUE,
remConta = FALSE,
separateAnnot = TRUE,
gr = NULL,
sdrf = NULL,
suplAnnotFile = NULL,
groupPref = list(lowNumberOfGroups = TRUE),
titGraph = NULL,
wex = 1.6,
plotGraph = TRUE,
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
Arguments
fileName |
(character) name of file to be read (default 'proteinGroups.txt' as typically generated by Compomics in txt folder). Gz-compressed files can be read, too. |
path |
(character) path of file to be read |
quantSoft |
(character) quantification software used inside Wombat-P |
fasta |
(logical or character) if |
isLog2 |
(logical) typically data read from Wombat are expected to be |
normalizeMeth |
(character) normalization method, defaults to |
quantCol |
(character or integer) exact col-names, or if length=1 content of |
contamCol |
(character or integer, length=1) which columns should be used for contaminants |
pepCountCol |
(character) pattern to search among column-names for count data (1st entry for 'Razor + unique peptides', 2nd for 'Unique peptides', 3rd for 'MS.MS.count' (PSM)) |
read0asNA |
(logical) decide if initial quantifications at 0 should be transformed to NA (thus avoiding -Inf in log2 results) |
refLi |
(character or integer) custom specification of which lines of data should be used for normalization, i.e. which lines belong to the main species; if character (e.g. 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given |
sampleNames |
(character) custom column-names for quantification data; this argument has priority over |
extrColNames |
(character) column names to be read (1st position: prefix for LFQ quantitation, default 'LFQ.intensity'; 2nd: column name for protein-IDs, default 'Majority.protein.IDs'; 3rd: column names of fasta-headers, default 'Fasta.headers', 4th: column name for number of protein IDs matching, default 'Number.of.proteins') |
specPref |
(character) prefix to identifiers allowing to distinguish i) the contamination database, ii) species of main identifications and iii) spike-in species |
remRev |
(logical) option to remove all protein-identifications based on reverse-peptides |
remConta |
(logical) option to remove all proteins identified as contaminants |
separateAnnot |
(logical) if |
gr |
(character or factor) custom defined pattern of replicate association, will override final grouping of replicates from |
sdrf |
(logical, character, list or data.frame) optional extraction and adding of experimental meta-data:
if |
suplAnnotFile |
(logical or character) optional reading of supplemental files produced by Compomics; if |
groupPref |
(list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to |
titGraph |
(character) custom title to plot of distribution of quantitation values |
wex |
(numeric) relative expansion factor of the violin in plot |
plotGraph |
(logical) optional plot vioplot of initial and normalized data (using |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
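How arguments like quantCol and read0asNA act on a table can be sketched in base R. The column names follow the defaults shown under Usage (protein_group, abundance_, number_of_peptides); the values are invented, the input is assumed to be on linear scale, and the code does not reproduce the function's internal logic.

```r
# Toy table mimicking the default column names from Usage; values are invented
tab <- data.frame(
  protein_group = c("P1", "P2", "P3"),
  abundance_A1 = c(1024, 0, 256),
  abundance_A2 = c(2048, 512, 0),
  number_of_peptides = c(5, 2, 3)
)

# Select quantitation columns by prefix, in the spirit of quantCol = "abundance_"
quantCols <- grep("^abundance_", colnames(tab))
quant <- as.matrix(tab[, quantCols])
rownames(quant) <- tab$protein_group

# Treat 0 as missing before taking log2, so that no -Inf values arise
quant[quant == 0] <- NA
log2Quant <- log2(quant)
log2Quant
```

Without the replacement step, log2(0) would give -Inf, which distorts medians and plots downstream; NA values can instead be handled explicitly (e.g. with na.rm = TRUE).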
Details
In the standard workflow, Wombat-P writes the results of each analysis method/quantification algorithm as .csv files. Meta-data describing the proteins may be available from two sources: a) the 1st column of the Wombat/Normalizer output; b) the .fasta file in the directory above the analysis/quantification results of the Wombat workflow.
Meta-data describing the samples and experimental setup may be available from an sdrf file (from the directory above the analysis/quantification results).
If available, the meta-data will be examined for determining groups of replicates and
the results thereof can be found in $sampleSetup$levels.
Alternatively, a data frame formatted like sdrf files (i.e. a separate line for each sample, see also function readSdrf) may be given, too.
This import-function has been developed using Wombat-P version 1.x.
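The notion of examining sample meta-data to determine groups of replicates can be sketched with a small sdrf-like data frame (one line per sample). The column names source.name and condition are purely illustrative assumptions, not the headers required by sdrf files or by this function, and the grouping shown is a simplified stand-in for what the package does.

```r
# Hypothetical sdrf-like table: one line per sample; column names are illustrative only
sdrfLike <- data.frame(
  source.name = paste0("sample", 1:6),
  condition = rep(c("ctrl", "treatA", "treatB"), each = 2),
  stringsAsFactors = FALSE
)

# Derive replicate groups from the annotation column, keeping order of first appearance
grp <- factor(sdrfLike$condition, levels = unique(sdrfLike$condition))

# Integer level index per sample, named by sample
lev <- as.integer(grp)
names(lev) <- sdrfLike$source.name
lev

# Number of replicates per group
table(grp)
```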
The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup, $quantNotes, $notes, or (if separateAnnot=FALSE) a data.frame with annotation and main quantification content. If sdrf information has been found, an additional list element setup will be added containing the entire meta-data as setup$meta and the suggested organization as setup$lev.
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfRazorPeptides', $quantNotes, $notes and optional setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE
See Also
read.table, normalizeThis, readProteomeDiscovererFile, readProlineFile (and other import-functions), matrixNAinspect
Examples
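library(wrProteo)
path1 <- system.file("extdata", package="wrProteo")
# Here we'll load a short/trimmed example file (originating from Compomics)
fiNa <- "tinyWombCompo1.csv.gz"
# Use full argument names (fileName, titGraph) rather than relying on partial matching
dataWB <- readWombatNormFile(fileName=fiNa, path=path1, titGraph="tiny Wombat/Compomics, Normalized ")
# Summarize the quantitation values per sample
summary(dataWB$quant)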
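Since the example file is gz-compressed, it may help to see how such a file can be inspected with base R alone, independently of this package. The following sketch writes a toy table (invented values, column names borrowed from the Usage defaults) to a temporary .csv.gz file and reads it back; it makes no claim about how readWombatNormFile reads files internally.

```r
# Write a small toy table to a gz-compressed csv in a temporary location
tmpFile <- tempfile(fileext = ".csv.gz")
toy <- data.frame(protein_group = c("P1", "P2"),
                  abundance_A1 = c(10.5, 11.2),
                  number_of_peptides = c(4L, 7L))
con <- gzfile(tmpFile, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a gzfile() connection (opened and closed by read.csv)
toyIn <- read.csv(gzfile(tmpFile))
str(toyIn)

# Clean up the temporary file
unlink(tmpFile)
```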