readMaxQuantFile {wrProteo}

R Documentation

Read Quantitation Data-Files (proteinGroups.txt) Produced From MaxQuant At Protein Level

Description

Protein quantification results from MaxQuant can be read using this function, and the relevant information extracted. Input files compressed as .gz can be read as well. The protein abundance values (XIC), peptide counting information such as the number of unique razor peptides or PSM values, and sample annotation (if available) can be extracted, too. The protein abundance values may be normalized using multiple methods (median normalization is the default); the determination of normalization factors can be restricted to specific proteins (normalization to bait protein(s), or to the invariable matrix of spike-in experiments). The protein annotation data is parsed to extract specific fields (ID, name, description, species ...). In addition, a graphical display of the distribution of protein abundance values may be generated before and after normalization.
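The idea of median normalization, with the normalization factors optionally restricted to a subset of proteins, can be illustrated with a minimal base-R sketch. It assumes log2-transformed abundances in a matrix (proteins as rows, samples as columns). The helper `medianNormalize()` and its `refLines` argument are hypothetical; this is not the code used inside wrProteo (see `normalizeThis` for the package's own methods).

```r
# Hypothetical helper, for illustration only (not wrProteo's implementation):
# shift each sample (column) so that the median of the reference rows is
# identical across samples.
medianNormalize <- function(mat, refLines = NULL) {
  ref <- if (is.null(refLines)) mat else mat[refLines, , drop = FALSE]
  colMed <- apply(ref, 2, median, na.rm = TRUE)  # per-sample median of reference rows
  # subtract each column's offset from the average of all reference medians
  sweep(mat, 2, colMed - mean(colMed), FUN = "-")
}

set.seed(1)
log2Abund <- matrix(rnorm(12, mean = 20), nrow = 4,
                    dimnames = list(paste0("prot", 1:4), paste0("S", 1:3)))
log2Abund[, "S2"] <- log2Abund[, "S2"] + 1     # simulate S2 measured 2-fold higher

normAll <- medianNormalize(log2Abund)                  # all proteins as reference
normRef <- medianNormalize(log2Abund, refLines = 1:2)  # e.g. bait or main-species rows only
apply(normAll, 2, median)                              # identical column medians
```

Using the mean of the reference medians as the common target keeps the overall level of the data unchanged; aligning all samples to, say, the first sample would serve equally well for this illustration.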

Usage

readMaxQuantFile(
  path,
  fileName = "proteinGroups.txt",
  normalizeMeth = "median",
  quantCol = "LFQ.intensity",
  contamCol = "Potential.contaminant",
  pepCountCol = c("Razor + unique peptides", "Unique peptides", "MS.MS.count"),
  read0asNA = TRUE,
  refLi = NULL,
  sampleNames = NULL,
  extrColNames = c("Majority.protein.IDs", "Fasta.headers", "Number.of.proteins"),
  specPref = c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  remRev = TRUE,
  remConta = FALSE,
  separateAnnot = TRUE,
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = NULL,
  groupPref = list(lowNumberOfGroups = TRUE),
  titGraph = NULL,
  wex = 1.6,
  plotGraph = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

`path`	(character) path of file to be read
`fileName`	(character) name of file to be read (default 'proteinGroups.txt' as typically generated by MaxQuant in txt folder). Gz-compressed files can be read, too.
`normalizeMeth`	(character) normalization method, defaults to `median`; for more details see `normalizeThis`
`quantCol`	(character or integer) exact column names; or, if of length 1, the content of `quantCol` will be used as a pattern to search among column names for $quant using `grep`
`contamCol`	(character or integer, length=1) column to be used for identifying contaminants
`pepCountCol`	(character) patterns to search among column names for count data (1st entry for 'Razor + unique peptides', 2nd for 'Unique peptides', 3rd for 'MS.MS.count' (PSM))
`read0asNA`	(logical) decide if initial quantifications at 0 should be transformed to NA (thus avoiding -Inf in log2 results)
`refLi`	(character or integer) custom specification of which lines of data should be used for normalization, i.e. which lines correspond to the main species; if character (e.g. 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given
`sampleNames`	(character) custom column-names for quantification data; this argument has priority over `suplAnnotFile`
`extrColNames`	(character) column names to be read (1st position: column name for protein IDs, default 'Majority.protein.IDs'; 2nd: column name for FASTA headers, default 'Fasta.headers'; 3rd: column name for the number of protein IDs matching, default 'Number.of.proteins'); the quantification columns are specified separately via `quantCol`
`specPref`	(character) prefixes to identifiers allowing to i) recognize the contamination database, ii) identify the species of the main identifications and iii) identify spike-in species
`remRev`	(logical) option to remove all protein-identifications based on reverse-peptides
`remConta`	(logical) option to remove all proteins identified as contaminants
`separateAnnot`	(logical) if `TRUE`, the output will be organized as a list with `$annot`, `$raw` for initial/raw abundance values and `$quant` with final normalized quantitations
`gr`	(character or factor) custom defined pattern of replicate association, will override final grouping of replicates from `sdrf` and/or `suplAnnotFile` (if provided)
`sdrf`	(character, list or data.frame) optional extraction and addition of experimental meta-data: if character, this may be the ID at ProteomeXchange; the second and third elements may give further indications for automatic organization of groups of replicates. Besides, the output from `readSdrf` or a list from `defineSamples` may be provided; if `gr` is provided, `gr` gets priority for grouping of replicates; if `sdrfOrder=TRUE` the output will be put in the order of the sdrf
`suplAnnotFile`	(logical or character) optional reading of supplemental files produced by MaxQuant; if `gr` is provided, it gets priority for grouping of replicates. If `TRUE`, defaults to the files 'summary.txt' (needed to match information of `sdrf`) and 'parameters.txt', which can be found in the same folder as the main quantitation results; if `character`, the respective file names (relative or absolute path), where the 1st is expected to correspond to 'summary.txt' (tabulated text, the samples as given to MaxQuant) and the 2nd to 'parameters.txt' (tabulated text, all parameters given to MaxQuant)
`groupPref`	(list) additional parameters for interpreting meta-data to identify the structure of groups (replicates), will be passed to `readSampleMetaData`. May contain `lowNumberOfGroups=FALSE` for automatically choosing a rather elevated number of groups if possible (defaults to a low number of groups, i.e. a higher number of samples per group)
`titGraph`	(character) custom title for the plot of the distribution of quantitation values
`wex`	(numeric) relative expansion factor of the violins in the plot
`plotGraph`	(logical) optionally plot violin plots (vioplot) of initial and normalized data (using `normalizeMeth`); alternatively, the argument may contain numeric details that will be passed to `layout` when plotting
`silent`	(logical) suppress messages
`debug`	(logical) additional messages for debugging
`callFrom`	(character) allow easier tracking of messages produced
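How a length-1 `quantCol` acts as a pattern, and what `read0asNA=TRUE` prevents, can be sketched in base R. The toy column names below only mimic those of a 'proteinGroups.txt' file after `read.delim()` (which converts spaces to dots); they and all values are assumptions, not read from a real file, and this is not wrProteo's internal code.

```r
# Toy table mimicking column names of proteinGroups.txt (hypothetical values)
pg <- data.frame(Majority.protein.IDs = c("P1", "P2", "P3"),
                 LFQ.intensity.A = c(1024, 0, 256),
                 LFQ.intensity.B = c(2048, 512, 0),
                 Intensity.A     = c(900, 900, 900))

quantCol <- "LFQ.intensity"
# a single character string is used as a grep() pattern on the column names;
# otherwise it is taken as exact column names or indices
qCols <- if (is.character(quantCol) && length(quantCol) == 1) {
  grep(quantCol, colnames(pg))
} else quantCol
quant <- as.matrix(pg[, qCols])

quant[which(quant == 0)] <- NA   # read0asNA = TRUE: 0 becomes NA ...
log2(quant)                      # ... so log2() gives NA instead of -Inf
```

Note that in a regular expression the dot matches any character, so the pattern is slightly more permissive than the literal name; for these toy columns it still selects only the two LFQ columns.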

Details

MaxQuant is proteomics quantification software provided by the Max Planck Institute. By default, MaxQuant writes the results of each run to the path combined/txt; from there, only the files 'proteinGroups.txt' (main quantitation at protein level), 'summary.txt' and 'parameters.txt' will be used.
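A minimal base-R sketch of how the three files could be located in the default output folder, and how reverse hits and contaminants could be dropped in the spirit of `remRev` and `remConta`. It assumes that MaxQuant flags such rows with '+' in the columns 'Reverse' and 'Potential contaminant' (read as `Potential.contaminant`, matching the default `contamCol`); the helper `filterMQ()` and the toy data are hypothetical and not part of wrProteo.

```r
# Default MaxQuant output folder and the three files used (paths only, not read here)
txtDir  <- file.path("combined", "txt")
mqFiles <- file.path(txtDir, c("proteinGroups.txt", "summary.txt", "parameters.txt"))
# pg <- read.delim(mqFiles[1])  # read.delim() also accepts '.txt.gz' files directly

# Hypothetical helper: drop rows flagged with "+" (assumed MaxQuant convention)
filterMQ <- function(pg, remRev = TRUE, remConta = FALSE,
                     contamCol = "Potential.contaminant") {
  flagged <- function(col) {
    if (col %in% colnames(pg)) pg[[col]] %in% "+" else rep(FALSE, nrow(pg))
  }
  keep <- !(remRev & flagged("Reverse")) & !(remConta & flagged(contamCol))
  pg[keep, , drop = FALSE]
}

toy <- data.frame(Majority.protein.IDs  = c("P1", "REV__P2", "CON__P3", "P4"),
                  Reverse               = c("", "+", "", ""),
                  Potential.contaminant = c("", "", "+", ""))
filterMQ(toy)                   # removes the reverse hit only (remConta = FALSE)
filterMQ(toy, remConta = TRUE)  # removes the reverse hit and the contaminant
```

Using `%in% "+"` rather than `== "+"` treats missing values (e.g. an all-empty flag column read as NA) as "not flagged" instead of propagating NA into the row selection.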

Meta-data describing the samples and experimental setup may be available from two sources: a) the file summary.txt, which is produced by MaxQuant in the same folder as the main quantification data; b) meta-data deposited as sdrf at PRIDE, which can be retrieved (via the respective GitHub page) when giving the accession number in argument sdrf. The meta-data will then be examined to determine groups of replicates, and the results can be found in $sampleSetup$levels. Alternatively, a data.frame formatted like sdrf files (i.e. a separate line for each sample, see also function readSdrf) may be given. In tricky cases it is also possible to specify the column name to use for defining the groups of replicates, or the method for automatically choosing the most suitable column, via the 2nd value of the argument sdrf. Please note that sdrf is still experimental and only a small fraction of proteomics data on PRIDE have been annotated accordingly. If a valid sdrf is furnished, its information has priority over the information extracted from the MaxQuant-produced file summary.txt.
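The priority rule for grouping (a custom `gr` overrides groups derived from sdrf-like meta-data) can be sketched with a toy sdrf-like data.frame holding one line per sample. The column names, values and the helper `groupsFromMeta()` are hypothetical; real sdrf files should be read with `readSdrf`, and wrProteo's automatic choice of the grouping column is more elaborate than picking a single named column.

```r
# Toy sdrf-like table: one line per sample (hypothetical content)
sdrfLike <- data.frame(
  "source name" = paste0("sample", 1:6),
  "factor value[spiked compound]" = rep(c("UPS1 50 fmol", "UPS1 500 fmol"), each = 3),
  check.names = FALSE)

# Hypothetical helper: derive replicate groups from one meta-data column,
# letting a custom 'gr' take priority (as described for readMaxQuantFile)
groupsFromMeta <- function(meta, column, gr = NULL) {
  g <- if (!is.null(gr)) as.character(gr) else as.character(meta[[column]])
  factor(g, levels = unique(g))   # keep levels in order of first appearance
}

grp <- groupsFromMeta(sdrfLike, "factor value[spiked compound]")
table(grp)                        # 2 groups of 3 replicates
groupsFromMeta(sdrfLike, "factor value[spiked compound]", gr = gl(3, 2))
```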

This import function has been developed using MaxQuant versions 1.6.10.x to 2.0.x; the format of the resulting file 'proteinGroups.txt' is typically well conserved between versions. The final output is a list containing these elements: $raw, $quant, $annot, $counts, $sampleSetup, $quantNotes, $notes, or (if separateAnnot=FALSE) a data.frame with annotation and main quantification content. If sdrf information has been found, an additional list element setup will be added, containing the entire meta-data as setup$meta and the suggested organization as setup$lev.

Value

This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (protein annotation), $counts (an array with 'PSM' and 'NoOfRazorPeptides'), $quantNotes, $notes and optionally setup for meta-data from sdrf; or a data.frame with quantitation and annotation if separateAnnot=FALSE
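Working with the list output can be sketched on a small toy object shaped like it. Apart from 'SpecType' (the $annot column searched by `refLi`), the column names, all values and the layout of the combined data.frame are assumptions for illustration and may differ from what readMaxQuantFile actually returns.

```r
# Toy object shaped like the list output (hypothetical values and columns)
res <- list(
  annot = data.frame(Accession = c("P1", "P2", "P3"),
                     SpecType  = c("mainSpe", "conta", "mainSpe")),
  quant = matrix(c(20.1, 18.3, 21.0, 20.4, 18.0, 21.2), nrow = 3,
                 dimnames = list(c("P1", "P2", "P3"), c("S1", "S2"))))

# Rows matching a character refLi such as "mainSpe" (exact match on SpecType)
mainRows <- which(res$annot$SpecType == "mainSpe")
res$quant[mainRows, , drop = FALSE]

# One data.frame combining annotation and quantitation, similar in spirit to
# the separateAnnot = FALSE output (exact column layout is an assumption)
flat <- data.frame(res$annot, res$quant)
```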

Examples

path1 <- system.file("extdata", package="wrProteo")
# Here we'll load a short/trimmed example file (thus not the MaxQuant default name)
fiNa <- "proteinGroupsMaxQuant1.txt.gz"
specPr <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="YEAST", spike="HUMAN_UPS")
dataMQ <- readMaxQuantFile(path1, fileName=fiNa, specPref=specPr, titGraph="tiny MaxQuant")
summary(dataMQ$quant)
matrixNAinspect(dataMQ$quant, gr=gl(3,3))

[Package wrProteo version 1.12.0 Index]