readMaxQuantPeptides {wrProteo} | R Documentation |
Read Peptide Identification and Quantitation Data-Files (peptides.txt) Produced By MaxQuant
Description
This function reads peptide-level identification and quantification data produced by MaxQuant and extracts the relevant information. Input files compressed as .gz can be read as well. The peptide abundance values (XIC), peptide counting information and sample-annotation (if available) are extracted, too.
Usage
readMaxQuantPeptides(
path,
fileName = "peptides.txt",
normalizeMeth = "median",
quantCol = "Intensity",
contamCol = "Potential.contaminant",
pepCountCol = "Experiment",
refLi = NULL,
sampleNames = NULL,
extrColNames = c("Sequence", "Proteins", "Leading.razor.protein", "Start.position",
"End.position", "Mass", "Missed.cleavages", "Unique..Groups.", "Unique..Proteins.",
"Charges"),
specPref = c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "HUMAN"),
remRev = TRUE,
remConta = FALSE,
separateAnnot = TRUE,
gr = NULL,
sdrf = NULL,
suplAnnotFile = NULL,
groupPref = list(lowNumberOfGroups = TRUE),
titGraph = NULL,
wex = 1.6,
plotGraph = TRUE,
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
Arguments
path |
(character) path of file to be read |
fileName |
(character) name of file to be read (default 'peptides.txt' as typically generated by MaxQuant in txt folder). Gz-compressed files can be read, too. |
normalizeMeth |
(character) normalization method (for details see |
quantCol |
(character or integer) exact col-names, or if length=1 content of |
contamCol |
(character or integer, length=1) which columns should be used for contaminants |
pepCountCol |
(character) pattern to search among column-names for count data (defaults to 'Experiment') |
refLi |
(character or integer) custom specification of which line of data should be used for normalization, i.e. which line is main species; if character (e.g. 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given |
sampleNames |
(character) custom column-names for quantification data; this argument has priority over |
extrColNames |
(character) column names to be read (1st position: prefix for quantitation, default 'intensity'; 2nd: column name for peptide-IDs, default ) |
specPref |
(character) prefixes of identifiers used to recognize i) the contamination database, ii) the species of main identifications and iii) spike-in species |
remRev |
(logical) option to remove all peptide-identifications based on reverse-peptides |
remConta |
(logical) option to remove all peptides identified as contaminants |
separateAnnot |
(logical) if |
gr |
(character or factor) custom defined pattern of replicate association, will override final grouping of
replicates from |
sdrf |
(character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange,
the second & third elements may give further indications for automatic organization of groups of replicates.
Besides, the output from |
suplAnnotFile |
(logical or character) optional reading of supplemental files produced by MaxQuant; if |
groupPref |
(list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to |
titGraph |
(character) custom title to plot |
wex |
(numeric) relative expansion factor of the violin in plot |
plotGraph |
(logical) optionally plot a vioplot of initial and normalized data (using |
silent |
(logical) suppress messages |
debug |
(logical) additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
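The effect of the arguments remRev, remConta and specPref can be pictured with a small base-R sketch. It assumes that decoy and contaminant entries are flagged with '+' in the columns Reverse and Potential.contaminant (names as they appear after import with read.delim). The toy table, the helper variable names and the tagging loop are made up for illustration; this is not the code used inside readMaxQuantPeptides.

```r
# Toy peptide table (invented values) mimicking a few columns of peptides.txt
pep <- data.frame(
  Sequence = c("AAAK", "CCCR", "DDDK", "EEEK"),
  Leading.razor.protein = c("P1_HUMAN", "CON__P2", "REV__P3_HUMAN", "P4_YEAST"),
  Reverse = c("", "", "+", ""),
  Potential.contaminant = c("", "+", "", ""),
  stringsAsFactors = FALSE
)

remRev <- TRUE     # drop peptides identified via reverse (decoy) sequences
remConta <- FALSE  # keep contaminants, but tag them below

keep <- rep(TRUE, nrow(pep))
if (remRev) keep <- keep & pep$Reverse != "+"
if (remConta) keep <- keep & pep$Potential.contaminant != "+"
pepFilt <- pep[keep, ]

# Tag each remaining peptide using the regular expressions given in specPref;
# the first matching pattern wins, unmatched entries stay NA
specPref <- c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "HUMAN")
specType <- rep(NA_character_, nrow(pepFilt))
for (i in seq_along(specPref)) {
  hit <- grepl(specPref[i], pepFilt$Leading.razor.protein)
  specType[hit & is.na(specType)] <- names(specPref)[i]
}
pepFilt$SpecType <- specType
pepFilt[, c("Sequence", "Leading.razor.protein", "SpecType")]
```

With remConta = TRUE the contaminant row would be dropped as well, leaving only the two rows that are neither decoys nor contaminants.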
Details
The peptide annotation data gets parsed to extract specific fields (ID, name, description, species ...). Besides, a graphical display of the distribution of peptide abundance values may be generated before and after normalization.
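As a rough illustration of what median normalization does to the abundance distributions, the following base-R sketch aligns the column medians of a small matrix. The values and sample names are invented, and the shift-to-common-median approach shown here is an assumption about the general technique, not the code this function runs internally.

```r
# Invented log2 abundance values: 5 peptides (rows) x 3 samples (columns)
raw <- matrix(c(20, 21, 22, 23, 24,
                21, 22, 23, 24, 25,
                19, 20, 21, 22, 23),
              ncol = 3,
              dimnames = list(paste0("pep", 1:5), c("s1", "s2", "s3")))

# Median of each sample (column), ignoring missing values
colMed <- apply(raw, 2, median, na.rm = TRUE)

# Shift every column so that its median equals the mean of all column medians
quant <- sweep(raw, 2, colMed - mean(colMed), FUN = "-")

# Compare the medians before and after
rbind(before = colMed, after = apply(quant, 2, median, na.rm = TRUE))
```

Plotting raw and quant side by side (for example with boxplot) gives a simplified view of the kind of before/after display described above.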
MaxQuant is proteomics quantification software provided by the Max Planck Institute.
By default MaxQuant writes the results of each run to the path combined/txt; from there, only the files 'peptides.txt' (main quantitation at peptide level), 'summary.txt' and 'parameters.txt' will be used by this function.
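The folder layout described above can be explored with base R before calling the function. The sketch below builds a throw-away folder in tempdir() that mimics combined/txt, writes an invented gz-compressed peptide table, checks which of the relevant files are present and reads the table back. All file contents and variable names are hypothetical.

```r
# Throw-away folder mimicking MaxQuant's default output location
txtDir <- file.path(tempdir(), "combined", "txt")
dir.create(txtDir, recursive = TRUE, showWarnings = FALSE)

# Write a tiny, invented peptide table as gz-compressed tab-delimited text
toy <- data.frame(Sequence = c("AAAK", "CCCR"), Intensity = c(1000, 2500))
con <- gzfile(file.path(txtDir, "peptides.txt.gz"), "w")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Check which of the files of interest exist in the folder
wanted <- c("peptides.txt.gz", "summary.txt", "parameters.txt")
present <- file.exists(file.path(txtDir, wanted))
names(present) <- wanted
present

# read.delim() decompresses gz files transparently
pepTab <- read.delim(file.path(txtDir, "peptides.txt.gz"))
str(pepTab)
```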
Meta-data describing the samples and experimental setup may be available from two sources:
a) The file summary.txt, which gets produced by MaxQuant in the same folder as the main quantification data.
b) Furthermore, meta-data deposited as sdrf at Pride can be retrieved (via the respective github page) when giving the accession number in argument sdrf.
Then, the meta-data will be examined for determining groups of replicates and
the results thereof can be found in $sampleSetup$levels.
Alternatively, a data.frame formatted like sdrf-files (i.e. for each sample a separate line, see also function readSdrf) may be given.
In tricky cases it is also possible to specify the column-name to use for defining the groups of replicates, or the method for automatically choosing the most suited column, via the 2nd value of the argument sdrf; see also the function defineSamples (which gets used internally).
Please note that sdrf is still experimental and only a small fraction of proteomics-data on Pride have been annotated accordingly.
If a valid sdrf is furnished, its information has priority over the information extracted from the MaxQuant produced file summary.txt.
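To make the idea of an sdrf-like table more concrete, the sketch below builds a small data.frame with one line per sample and derives groups of replicates from one of its columns. The column names and values are invented for illustration, and the grouping shown is a plain base-R factor, not the logic of defineSamples.

```r
# Invented sdrf-like table: one row per sample (column names are hypothetical)
sdrfLike <- data.frame(
  source.name = paste0("sample", 1:6),
  comment.data.file. = paste0("run", 1:6, ".raw"),
  factor.value.treatment. = rep(c("ctrl", "drugA", "drugB"), each = 2),
  stringsAsFactors = FALSE
)

# Column chosen to define the groups of replicates
grpCol <- "factor.value.treatment."

# Factor with levels in order of first appearance
gr <- factor(sdrfLike[[grpCol]], levels = unique(sdrfLike[[grpCol]]))
table(gr)

# Integer group index per sample
lev <- as.integer(gr)
lev
```

A vector or factor of this kind has the general shape described for the argument gr, which can be used to set the grouping of replicates by hand.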
This function has been developed using MaxQuant versions 1.6.10.x to 2.0.x; the format of the resulting file 'peptides.txt' is typically well conserved between versions.
The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup, $quantNotes, $notes, or (if separateAnnot=FALSE) a data.frame with annotation- and main quantification-content. If sdrf information has been found, an additional list-element setup will be added containing the entire meta-data as setup$meta and the suggested organization as setup$lev.
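The relation between the list output and the single-table output can be sketched with a mock object. The list below only imitates part of the documented structure ($raw, $quant, $annot); all values are invented, the log2 transform used to fill $quant is an arbitrary choice for the mock, and treating $annot as a character matrix is an assumption.

```r
# Mock object imitating part of the documented list structure (invented values)
res <- list(
  raw = matrix(c(1000, 2000, 4000, 8000), ncol = 2,
               dimnames = list(c("AAAK", "CCCR"), c("s1", "s2"))),
  annot = cbind(Sequence = c("AAAK", "CCCR"),
                SpecType = c("mainSpe", "conta"))
)
rownames(res$annot) <- res$annot[, "Sequence"]

# Stand-in for normalized quantitation (log2 is an arbitrary choice here)
res$quant <- log2(res$raw)

# Annotation and quantitation share row order, so one can index the other
mainQuant <- res$quant[res$annot[, "SpecType"] == "mainSpe", , drop = FALSE]
mainQuant

# Single-table layout, similar in spirit to separateAnnot = FALSE
oneTab <- data.frame(res$annot, res$quant, stringsAsFactors = FALSE)
oneTab
```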
Value
This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfRazorPeptides', $quantNotes, $notes and optional setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE
See Also
read.table, normalizeThis; for reading protein level data: readMaxQuantFile, readProlineFile
Examples
# Here we'll load a short/trimmed example file (thus not the MaxQuant default name)
MQpepFi1 <- "peptides_tinyMQ.txt.gz"
path1 <- system.file("extdata", package="wrProteo")
specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="YEAST", spec2="HUMAN")
# Argument names are written out in full (fileName, titGraph) rather than abbreviated
dataMQpep <- readMaxQuantPeptides(path1, fileName=MQpepFi1, specPref=specPref1,
titGraph="Tiny MaxQuant Peptides")
summary(dataMQpep$quant)