readSampleMetaData {wrProteo}

R Documentation

Read Sample Meta-data from Quantification-Software And/Or Sdrf And Align To Experimental Data

Description

Sample/experimental annotation meta-data from MaxQuant, ProteomeDiscoverer, FragPipe, Proline or similar can be read using this function, and the relevant information is extracted. Furthermore, annotation in sdrf-format can be added (the order of the sdrf will be adjusted automatically, if possible). This function returns a list with the grouping of samples into replicates and the additional information gathered. Input files compressed as .gz can be read as well.

Usage

readSampleMetaData(
  quantMeth,
  sdrf = NULL,
  suplAnnotFile = NULL,
  path = ".",
  abund = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, sampleNames = NULL, gr = NULL),
  chUnit = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

`quantMeth`	(character, length=1) quantification method used; 2-letter abbreviations like 'MQ','PD','PL','FP' etc may be used
`sdrf`	(character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange or a similarly formatted local file. `sdrf` will get priority over `suplAnnotFile`, if provided.
`suplAnnotFile`	(logical or character) optional reading of supplemental files produced by MaxQuant; if `gr` is provided, it gets priority for grouping of replicates; if `TRUE` in case of `method=='MQ'` (MaxQuant), defaults to the files 'summary.txt' (needed to match information of `sdrf`) and 'parameters.txt', which can be found in the same folder as the main quantitation results; if `character`, the respective file-names (relative or absolute path), where the 1st is expected to correspond to 'summary.txt' (tabulated text, the samples as given to MaxQuant) and the 2nd to 'parameters.txt' (tabulated text, all parameters given to MaxQuant); in case of `method=='PL'` (Proline), this argument should contain the initial file-name (for the identification and quantification data) in the first position
`path`	(character) optional path of file(s) to be read
`abund`	(matrix or data.frame) experimental quantitation data; only column-names will be used for aligning order of annotated samples
`groupPref`	(list) additional parameters for interpreting meta-data to identify the structure of groups (replicates); may contain `lowNumberOfGroups=FALSE` for automatically choosing a rather elevated number of groups if possible (defaults to a low number of groups, i.e. a higher number of samples per group). A vector of custom sample-names may be provided via `sampleNames=...` (must be of correct length); if it contains `sampleNames="sdrf"`, sample-names will be taken from trimmed file-names.
`chUnit`	(logical or character) optional adjusting of group-labels from sample meta-data to a single common prefix in case multiple different unit-prefixes are used (e.g. adjust '100pMol' and '1nMol' to '100pMol' and '1000pMol') for better downstream analysis. This option will call `adjustUnitPrefix` and `checkUnitPrefix` from package `wrMisc`. If `character`, exactly this/these unit-names will be searched in the sample-names and checked for use of multiple different decimal prefixes; if `TRUE`, the default set of unit-names ('Mol','mol', 'days','day','m','sec','s','h') will be checked in the sample-names for different decimal prefixes
`silent`	(logical) suppress messages if `TRUE`
`debug`	(logical) additional messages for debugging
`callFrom`	(character) allow easier tracking of messages produced

Details

When initially reading/importing quantitation data, typically very little is known about the setup of the different samples in the underlying experiment. The overall aim is to read and mine the corresponding sample-annotation documented by the quantitation-software and/or from an sdrf repository and to attach it to the experimental data. This way, in subsequent steps of analysis (e.g. PCA, statistical tests) the user does not have to study the experimental setup to figure out which samples should be considered replicates of each other.

Sample annotation meta-data can be obtained from two sources: a) from additional files produced (and exported) by the initial quantitation software (so far MaxQuant and ProteomeDiscoverer have been implemented) or b) from the universal sdrf-format (from Pride or user-supplied). Both types can be imported and checked in the same run; if valid sdrf-information is found, it will be given priority. For more information about the sdrf format please see sdrf on github.

Value

This function returns a list with $level (grouping of samples given as integer) and $meth (method by which the grouping was determined). If a valid sdrf was given, the resultant list contains in addition $sdrfDat (data.frame of annotation). Alternatively it may contain $sdrfExport if sufficient information has been gathered (so far only for MaxQuant) for a draft sdrf for export (which should be revised and completed by the user). If software annotation has been found, it will be shown in $annotBySoft. If all entries are invalid or entries do not pass the tests, this function returns an empty list.
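
As a rough sketch of how the returned grouping might be used downstream, the base-R snippet below averages replicate columns according to `$level` and first checks for the empty-list case. The list `setup`, its `meth` value and the matrix `abund` are hand-made stand-ins (assumptions for illustration), not the output of an actual call.

```r
# Hypothetical stand-in for the list returned by readSampleMetaData()
setup <- list(level = c(1L, 1L, 2L, 2L), meth = "hypothetical")

# Toy abundance matrix: 2 proteins x 4 samples (made-up values)
abund <- matrix(c(10, 12, 20, 22,
                  11, 13, 21, 23),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("protA", "protB"), paste0("s", 1:4)))

# An empty list signals that no valid annotation was found
if (length(setup) > 0) {
  # Column indices per replicate group, then row-wise means within each group
  colsByGroup <- split(seq_len(ncol(abund)), setup$level)
  grpMeans <- sapply(colsByGroup, function(i) rowMeans(abund[, i, drop = FALSE]))
  print(grpMeans)
}
```

Here the columns of `abund` are assumed to be in the same order as `setup$level`, which is the alignment the `abund` argument is meant to provide.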

Examples

sdrf001819Setup <- readSampleMetaData(quantMeth=NA, sdrf="PXD001819")
str(sdrf001819Setup)
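
The idea behind `chUnit` (bringing group-labels with mixed decimal prefixes to one common prefix) can be illustrated with a few lines of base R. This is only a sketch of the concept under simplifying assumptions (a single unit 'Mol', prefixes p, n, u and m); it is not the code of `adjustUnitPrefix` or `checkUnitPrefix` from `wrMisc`, and all object names are made up for the illustration.

```r
# Illustration only: base-R sketch of harmonising decimal unit-prefixes,
# NOT the wrMisc implementation
grpLabels <- c("100pMol", "1nMol")
prefExp <- c(p = -12, n = -9, u = -6, m = -3)   # exponent of 10 per prefix

# Split each label into number, prefix and unit
parts <- regmatches(grpLabels, regexec("^([0-9.]+)([pnum])(Mol)$", grpLabels))
value <- as.numeric(sapply(parts, `[`, 2))
pref  <- sapply(parts, `[`, 3)
unit  <- sapply(parts, `[`, 4)

# Use the smallest prefix present as the common one and rescale the numbers
target   <- pref[which.min(prefExp[pref])]
rescaled <- value * 10^(prefExp[pref] - prefExp[target])
adjusted <- paste0(format(unname(rescaled), scientific = FALSE, trim = TRUE),
                   target, unit)
print(adjusted)
```

With the two labels above this gives '100pMol' and '1000pMol', matching the example given for the `chUnit` argument.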

[Package wrProteo version 1.12.0 Index]