R: Read Tabulated Files Exported By ProteomeDiscoverer At...

readProteomeDiscovererFile {wrProteo}

R Documentation

Read Tabulated Files Exported By ProteomeDiscoverer At Protein Level

Description

Protein identification and quantification results from Thermo ProteomeDiscoverer which were exported as tabulated text can be imported and relevant information extracted.

Usage

readProteomeDiscovererFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "^Abundance",
  annotCol = NULL,
  contamCol = "Contaminant",
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = list(c("^Protein.FDR.Confidence", "High"), c("^Found.in.Sample.", "High")),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = TRUE,
  groupPref = list(lowNumberOfGroups = TRUE),
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  plotGraph = TRUE,
  wex = 1.6,
  titGraph = "Proteome Discoverer",
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

`fileName`	(character) name of file to be read
`path`	(character) path of file to be read
`normalizeMeth`	(character) normalization method, defaults to `median` (for more details see `normalizeThis`)
`sampleNames`	(character) custom column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); this argument has priority over `suplAnnotFile`
`read0asNA`	(logical) decide if initial quantifications at 0 should be transformed to NA
`quantCol`	(character or integer) define which columns should be extracted as quantitation data: the argument may be the exact column-names to be used, or, if of length 1, the content of `quantCol` will be used as a pattern to search among the column-names for $quant using `grep`; if `quantCol='allAfter_calc.pI'`, all columns to the right of the column 'calc.pI' will be interpreted as quantitation data (may be useful with files that have been manually edited before passing to wrProteo)
`annotCol`	(character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") )
`contamCol`	(character or integer, length=1) which column should be used for contaminants marked by ProteomeDiscoverer. If a column named `contamCol` is found, the data will later be filtered to remove all contaminants; set to `NULL` to keep all contaminants
`refLi`	(character or integer) custom specification of which lines of data belong to the main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given
`separateAnnot`	(logical) if `TRUE` output will be organized as list with `$annot`, `$abund` for initial/raw abundance values and `$quant` with final log2 (normalized) quantitations
`FDRCol`	(list) optional indication to search for protein FDR information
`gr`	(character or factor) custom defined pattern of replicate association, will override final grouping of replicates from `sdrf` and/or `suplAnnotFile` (if provided)
`sdrf`	(character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeExchange, the second & third elements may give further indications for automatic organization of groups of replicates. Besides, the output from `readSdrf` or a list from `defineSamples` may be provided; if `gr` is provided, `gr` gets priority for grouping of replicates; if `sdrfOrder=TRUE` the output will be put in order of sdrf
`suplAnnotFile`	(logical or character) optional reading of supplemental files produced by ProteomeDiscoverer; however, if `gr` is provided, `gr` gets priority for grouping of replicates; if `TRUE` defaults to file '*InputFiles.txt' (needed to match information of `sdrf`) which can be exported next to main quantitation results; if `character` the respective file-name (relative or absolute path)
`groupPref`	(list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to `readSampleMetaData`. May contain `lowNumberOfGroups=FALSE` for automatically choosing a rather elevated number of groups if possible (defaults to low number of groups, ie higher number of samples per group)
`specPref`	(character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe', and optional following ones for supplemental tags/species - marked as 'species2','species3',...); if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument `annotCol`)
`plotGraph`	(logical or integer) optional plot of type vioplot of initial and normalized data (using `normalizeMeth`); if integer, it will be passed to `layout` when plotting
`wex`	(numeric) relative expansion factor of the violin-plot (will be passed to `vioplotW`)
`titGraph`	(character) custom title to plot of distribution of quantitation values
`silent`	(logical) suppress messages
`debug`	(logical) additional messages for debugging
`callFrom`	(character) allow easier tracking of messages produced
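
As a rough illustration of what `quantCol` and `read0asNA` describe, the following base R sketch selects quantitation columns by pattern, turns 0 into NA and takes log2. It is not the package's internal code: the toy table, its values and the object names (`toy`, `quantCols`, `abund`, `logAbund`) are assumptions made up for this example, and normalization is left out.

```r
# Toy table mimicking a protein-level export (hypothetical values, not real data)
toy <- data.frame(
  Accession = c("P1", "P2", "P3"),
  Abundance.F1.Sample = c(100, 0, 400),
  Abundance.F2.Sample = c(200, 800, 0),
  Coverage = c(10, 20, 30)
)

# A single pattern such as "^Abundance" is searched among the column names with grep()
quantCols <- grep("^Abundance", colnames(toy))
abund <- as.matrix(toy[, quantCols])
rownames(abund) <- toy$Accession

# Counterpart of read0asNA=TRUE: initial quantifications at 0 become NA
abund[abund == 0] <- NA

# log2 transformation (the documented $quant holds log2 values; normalization not shown)
logAbund <- log2(abund)
print(logAbund)
```

Treating 0 as NA before the log2 step avoids `-Inf` values, which is one plausible reason for `read0asNA=TRUE` being the default.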

Details

This function has been developed using Thermo ProteomeDiscoverer versions 2.2 to 2.5. The format of the resulting files at export also depends on which columns are chosen as visible inside ProteomeDiscoverer and subsequently get chosen for export. Using the argument suplAnnotFile it is possible to specify a specific file (or search for the default file) to read for extracting file-names as sample-names and other experiment-related information. If a column named contamCol is found, the data will later be filtered to remove all contaminants; set to NULL to keep all contaminants.
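
The contaminant filtering described above could look roughly like the base R sketch below. It is only an illustration under stated assumptions: the toy table is invented, and it is assumed (not confirmed by this page) that the contaminant column holds the text values "True"/"False". The package's own filtering may differ.

```r
# Toy table; the "True"/"False" coding of the Contaminant column is an assumption
toy <- data.frame(
  Accession = c("P1", "P2", "P3"),
  Contaminant = c("False", "True", "False"),
  Abundance.F1 = c(100, 50, 400),
  stringsAsFactors = FALSE
)

contamCol <- "Contaminant"   # set to NULL to keep all contaminants

if (!is.null(contamCol) && contamCol %in% colnames(toy)) {
  # Column found: drop the lines marked as contaminants
  isConta <- toupper(toy[[contamCol]]) == "TRUE"
  filtered <- toy[!isConta, , drop = FALSE]
} else {
  # No such column (or contamCol=NULL): keep everything
  filtered <- toy
}
print(filtered)
```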

The final output is a list containing as (main) elements: $annot, $raw and optional $quant, or returns data.frame with entire content of file if separateAnnot=FALSE.

This function replaces the deprecated function readProtDiscovFile.

Value

This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts (an array with the number of peptides), $quantNotes and $notes; or, if separateAnnot=FALSE, the function returns a data.frame with annotation and quantitation only.

Examples

path1 <- system.file("extdata", package="wrProteo")
fiNa <- "tinyPD_allProteins.txt.gz"
dataPD <- readProteomeDiscovererFile(fileName=fiNa, path=path1, suplAnnotFile=FALSE)
summary(dataPD$quant)
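
To look at the other elements of the returned list, a guarded variant of the example might be useful. This is a sketch that assumes wrProteo and its example file are installed; the `hasPkg` guard is an addition for this illustration, and `plotGraph=FALSE` is used only to skip the plot.

```r
# Runs only when wrProteo is installed; otherwise the block is skipped
hasPkg <- requireNamespace("wrProteo", quietly = TRUE)
if (hasPkg) {
  path1 <- system.file("extdata", package = "wrProteo")
  fiNa <- "tinyPD_allProteins.txt.gz"
  dataPD <- wrProteo::readProteomeDiscovererFile(fileName = fiNa, path = path1,
    suplAnnotFile = FALSE, plotGraph = FALSE)
  print(names(dataPD))       # list elements actually returned
  print(dim(dataPD$quant))   # dimensions of the normalized quantitation data
  print(head(dataPD$annot))  # first lines of the annotation
}
```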

[Package wrProteo version 1.12.0 Index]