readSampleMetaData {wrProteo}    R Documentation

Read Sample Meta-data from Quantification-Software And/Or Sdrf And Align To Experimental Data

Description

Sample/experimental annotation meta-data from MaxQuant, ProteomeDiscoverer, FragPipe, Proline or similar can be read using this function, and the relevant information extracted. Furthermore, annotation in sdrf-format can be added (the order of the sdrf will be adjusted automatically, if possible). This function returns a list with the grouping of samples into replicates and additional information gathered. Input files compressed as .gz can be read as well.

Usage

readSampleMetaData(
  quantMeth,
  sdrf = NULL,
  suplAnnotFile = NULL,
  path = ".",
  abund = NULL,
  groupPref = list(lowNumberOfGroups = TRUE, sampleNames = NULL, gr = NULL),
  chUnit = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

quantMeth

(character, length=1) quantification method used; 2-letter abbreviations like 'MQ','PD','PL','FP' etc may be used

sdrf

(character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange or a similarly formatted local file. sdrf will get priority over suplAnnotFile, if provided.

suplAnnotFile

(logical or character) optional reading of supplemental files produced by MaxQuant; if gr is provided, it gets priority for grouping of replicates. If TRUE in case of method=='MQ' (MaxQuant), defaults to the files 'summary.txt' (needed to match information of sdrf) and 'parameters.txt', which can be found in the same folder as the main quantitation results. If character, the respective file-names (relative or absolute path); the 1st is expected to correspond to 'summary.txt' (tabulated text, the samples as given to MaxQuant) and the 2nd to 'parameters.txt' (tabulated text, all parameters given to MaxQuant). In case of method=='PL' (Proline), this argument should contain the initial file-name (for the identification and quantification data) in the first position.

path

(character) optional path of file(s) to be read

abund

(matrix or data.frame) experimental quantitation data; only column-names will be used for aligning order of annotated samples

groupPref

(list) additional parameters for interpreting meta-data to identify the structure of groups (replicates). May contain lowNumberOfGroups=FALSE for automatically choosing a rather elevated number of groups if possible (defaults to a low number of groups, i.e. a higher number of samples per group). A vector of custom sample-names may be provided via sampleNames=... (must be of correct length); if sampleNames="sdrf" is given, sample-names will be taken from trimmed file-names.

chUnit

(logical or character) optional adjusting of group-labels from sample meta-data in case multiple different unit-prefixes are used, converting them to a single common prefix (eg adjust '100pMol' and '1nMol' to '100pMol' and '1000pMol') for better downstream analysis. This option will call adjustUnitPrefix and checkUnitPrefix from package wrMisc. If character, exactly this/these unit-name(s) will be searched in sample-names and checked for use of multiple different decimal prefixes; if TRUE, the default set of unit-names ('Mol','mol', 'days','day','m','sec','s','h') will be checked in the sample-names for different decimal prefixes.

silent

(logical) suppress messages if TRUE

debug

(logical) additional messages for debugging

callFrom

(character) allow easier tracking of messages produced
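The prefix harmonization described for chUnit can be pictured with a small base-R sketch. The helper harmonizePrefix() below is hypothetical and written only for illustration; it is not the implementation of adjustUnitPrefix or checkUnitPrefix in wrMisc, and it assumes labels of the simple form number-prefix-unit.

```r
# Hypothetical helper (NOT part of wrProteo or wrMisc): rewrite all labels that
# carry the given unit so that they share the smallest decimal prefix in use.
harmonizePrefix <- function(x, unit = "Mol") {
  expo <- c(f = -15, p = -12, n = -9, u = -6, m = -3)   # prefixes as powers of 10
  pat <- paste0("^([0-9.]+)([fpnum])", unit, "$")
  hit <- grepl(pat, x)
  if (sum(hit) < 2) return(x)                            # nothing to harmonize
  val <- as.numeric(sub(pat, "\\1", x[hit]))             # numeric part
  pre <- sub(pat, "\\2", x[hit])                         # decimal prefix
  if (length(unique(pre)) < 2) return(x)                 # already one common prefix
  target <- names(expo)[expo == min(expo[pre])]          # smallest prefix found
  scaled <- val * 10^(expo[pre] - expo[target])
  x[hit] <- paste0(vapply(scaled, format, character(1), scientific = FALSE),
                   target, unit)
  x
}

labels <- c("100pMol", "1nMol", "blank")
harmonizePrefix(labels)
# "100pMol" "1000pMol" "blank"
```

Labels that do not match the pattern (here "blank") are left untouched, so the result keeps the length and order of the input.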

Details

When initially reading/importing quantitation data, typically very little is known about the setup of the different samples in the underlying experiment. The overall aim is to read and mine the corresponding sample-annotation documented by the quantitation-software and/or from an sdrf repository and to attach it to the experimental data. This way, in subsequent steps of analysis (eg PCA, statistical tests) the user does not have to study the experimental setup to figure out which samples should be considered as replicates of each other.

Sample annotation meta-data can be obtained from two sources: a) from additional files produced (and exported) by the initial quantitation software (so far MaxQuant and ProteomeDiscoverer have been implemented) or b) from the universal sdrf-format (from Pride or user-supplied). Both types can be imported and checked in the same run; if valid sdrf-information is found, this will be given priority. For more information about the sdrf format please see sdrf on github.
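The idea of aligning annotation to the column names of the quantitation data (see argument abund) can be sketched in base R as follows. This is only an illustration with invented toy data and made-up column names (sample, condition); it does not reproduce the matching logic used inside the function.

```r
# Toy quantitation matrix; only its column names matter for the alignment
abund <- matrix(rnorm(12), nrow = 3,
                dimnames = list(NULL, c("ctrl_1", "ctrl_2", "treat_1", "treat_2")))

# Toy annotation given in a different order (column names invented for this sketch)
annot <- data.frame(sample = c("treat_2", "ctrl_1", "treat_1", "ctrl_2"),
                    condition = c("treat", "ctrl", "treat", "ctrl"))

# Reorder annotation rows so that row i describes column i of 'abund'
ord <- match(colnames(abund), annot$sample)
if (anyNA(ord)) stop("some samples lack annotation")
annotAligned <- annot[ord, ]
annotAligned$sample
```

After this step the annotation rows follow the order of the abundance columns, which is the precondition for using the annotation to define groups of replicates.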

Value

This function returns a list with $level (grouping of samples given as integer), and $meth (method by which the grouping was determined). If valid sdrf was given, the resultant list contains in addition $sdrfDat (data.frame of annotation). Alternatively it may contain a $sdrfExport if sufficient information has been gathered (so far only for MaxQuant) for a draft sdrf for export (that should be revised and completed by the user). If software annotation has been found, it will be shown in $annotBySoft. If all entries are invalid or entries do not pass the tests, this function returns an empty list.
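As a minimal sketch of how the returned grouping might be used downstream, the following base-R code works on a mock list that only mimics the documented elements $level and $meth; all values are invented for illustration, and a real result may contain further elements.

```r
# Mock of the documented return structure (values invented for this sketch)
setup <- list(level = c(1L, 1L, 2L, 2L), meth = "mock")

abund <- matrix(1:12, nrow = 3,
                dimnames = list(NULL, c("ctrl_1", "ctrl_2", "treat_1", "treat_2")))

if (length(setup) < 1) {
  # An empty list signals that no valid annotation was found
  message("no usable sample annotation found")
} else {
  # Columns belonging to each group of replicates
  colsByGroup <- split(colnames(abund), setup$level)
  # Mean abundance per group, one column per group
  groupMeans <- sapply(colsByGroup, function(cl) rowMeans(abund[, cl, drop = FALSE]))
  print(groupMeans)
}
```

Checking for an empty list first is useful because the function returns one when no valid annotation could be extracted.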

See Also

This function is used internally by readMaxQuantFile, readProteomeDiscovererFile etc; it uses readSdrf for reading sdrf-files and replicateStructure for mining annotation columns

Examples

sdrf001819Setup <- readSampleMetaData(quantMeth=NA, sdrf="PXD001819")
str(sdrf001819Setup)
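A further sketch shows how MaxQuant supplemental files might be read from a local folder, using only arguments documented above. The folder name in mqDir is a placeholder (an assumption, not a path shipped with the package), so the call is guarded and simply skipped when the package or the folder is not available.

```r
mqDir <- "path/to/MaxQuant/txt"   # placeholder, adapt to your own data

if (requireNamespace("wrProteo", quietly = TRUE) && dir.exists(mqDir)) {
  # suplAnnotFile=TRUE: look for 'summary.txt' and 'parameters.txt' in 'path'
  mqSetup <- wrProteo::readSampleMetaData(quantMeth = "MQ", suplAnnotFile = TRUE,
                                          path = mqDir, silent = TRUE)
  if (length(mqSetup) > 0) str(mqSetup$level) else message("no annotation could be extracted")
} else message("wrProteo or the example folder is not available; nothing was read")
```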


[Package wrProteo version 1.12.0 Index]