R: Read Tabulated Files Exported by DIA-NN At Protein Level

readDiaNNFile {wrProteo}

R Documentation

Read Tabulated Files Exported by DIA-NN At Protein Level

Description

This function imports protein identification and quantification results from DIA-NN. Data should be exported as tabulated text (tsv) at protein-group (pg) level to allow import by this function. Quantification data and other relevant information will be parsed and extracted (similar to the other import functions of this package). The final output is a list containing as (main) elements: $annot, $raw and $quant, or a data.frame with the quantification data and a part of the annotation if argument separateAnnot=FALSE.

Usage

readDiaNNFile(
  fileName,
  path = NULL,
  normalizeMeth = "median",
  sampleNames = NULL,
  read0asNA = TRUE,
  quantCol = "\\.raw$",
  annotCol = NULL,
  refLi = NULL,
  separateAnnot = TRUE,
  FDRCol = NULL,
  groupPref = list(lowNumberOfGroups = TRUE),
  plotGraph = TRUE,
  titGraph = "DiaNN",
  wex = 1.6,
  specPref = c(conta = "CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  gr = NULL,
  sdrf = NULL,
  suplAnnotFile = FALSE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

`fileName`	(character) name of file to be read
`path`	(character) path of file to be read
`normalizeMeth`	(character) normalization method, defaults to `median` (for more details see `normalizeThis`)
`sampleNames`	(character) custom column-names for quantification data; this argument has priority over `suplAnnotFile`
`read0asNA`	(logical) decide if initial quantifications at 0 should be transformed to NA (thus avoiding -Inf in log2 results)
`quantCol`	(character or integer) exact column names of the quantification data; or, if of length 1, the content of `quantCol` will be used as pattern to search among the column names for $quant using `grep`
`annotCol`	(character) column names to be read/extracted for the annotation section (default c("Accession","Description","Gene","Contaminant","Sum.PEP.Score","Coverage....","X..Peptides","X..PSMs","X..Unique.Peptides", "X..AAs","MW..kDa.") )
`refLi`	(character or integer) custom specification of which lines of data belong to the main species; if character (eg 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given
`separateAnnot`	(logical) if `TRUE` output will be organized as list with `$annot`, `$raw` for initial/raw abundance values and `$quant` with final log2 (normalized) quantitations
`FDRCol`	- not used (the argument was kept to retain the same syntax as the other import functions of this package)
`groupPref`	(list) additional parameters for interpreting meta-data to identify structure of groups (replicates), will be passed to `readSampleMetaData`. May contain `lowNumberOfGroups=FALSE` for automatically choosing a rather elevated number of groups if possible (defaults to low number of groups, ie higher number of samples per group)
`plotGraph`	(logical or integer) optional plot of type vioplot of initial and normalized data (using `normalizeMeth`); if integer, it will be passed to `layout` when plotting
`titGraph`	(character) custom title to plot of distribution of quantitation values
`wex`	(numeric) relative expansion factor of the violin-plot (will be passed to `vioplotW`)
`specPref`	(character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe', and optional following ones for supplemental tags/species - marked as 'species2','species3',...); if list and a list-element has multiple values, they will be used for exact matching of accessions (ie 2nd of argument `annotCol`)
`gr`	(character or factor) custom defined pattern of replicate association, will override final grouping of replicates from `sdrf` and/or `suplAnnotFile` (if provided)
`sdrf`	(character, list or data.frame) optional extraction and adding of experimental meta-data: if character, this may be the ID at ProteomeXchange, the second element may give further indications for automatic organization of groups of replicates. Besides, the output from `readSdrf` or a list from `defineSamples` may be provided; if `gr` is provided, `gr` gets priority for grouping of replicates
`suplAnnotFile`	(logical or character) optional reading of supplemental files; however, if `gr` is provided, `gr` gets priority for grouping of replicates; if `character` the respective file-name (relative or absolute path)
`silent`	(logical) suppress messages
`debug`	(logical) additional messages for debugging
`callFrom`	(character) allow easier tracking of messages produced
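
The arguments `quantCol` and `read0asNA` described above can be illustrated with a minimal base-R sketch. It does not reproduce the internal code of `readDiaNNFile`; the column names and abundance values are hypothetical and serve only to show how a single pattern is matched with `grep` and why values of 0 are replaced by NA before taking log2.

```r
## Hypothetical column names (not taken from a real DIA-NN export)
colNa <- c("Protein.Group", "Genes", "sampleA_1.raw", "sampleA_2.raw")

## A single pattern, such as the default of 'quantCol', is matched with grep()
quantCol <- "\\.raw$"
quantInd <- grep(quantCol, colNa)
colNa[quantInd]

## Toy abundance matrix (hypothetical values) containing one value of 0
abund <- matrix(c(1024, 0, 2048, 512), ncol=2,
  dimnames=list(c("prot1", "prot2"), colNa[quantInd]))

## Without any treatment the value 0 gives -Inf after log2
log2(abund)

## Effect comparable to read0asNA=TRUE: set 0 to NA before the log2 transformation
abundNA <- abund
abundNA[abundNA == 0] <- NA
logAbund <- log2(abundNA)
logAbund
```

With NA instead of -Inf, later summary statistics can simply skip the missing value (eg via `na.rm=TRUE`).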

Details

This function has been developed using DIA-NN version 1.8.x. Note, reading gene-group (gg) files is in principle possible, but the resulting files typically lack protein-identifiers, which may be less convenient in later steps of analysis. Thus, it is suggested to rather read protein-group (pg) files.

Using the argument suplAnnotFile it is possible to designate a specific file (or search for the default file) to read for extracting file-names as sample-names and other experiment-related information.

Value

This function returns a list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot, $counts an array with number of peptides, $quantNotes and $notes; or, if separateAnnot=FALSE, the function returns a data.frame with annotation and quantitation only.
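
As a sketch of how the returned list may be inspected, the following assumes that the package wrProteo and its example file `tinyDiaNN1.tsv.gz` are installed; the code is skipped otherwise. The element names follow the description above; the settings `plotGraph=FALSE` and `silent=TRUE` were chosen here only to keep the output short.

```r
## Assumes the package wrProteo (with its example data) is installed
if (requireNamespace("wrProteo", quietly=TRUE)) {
  path1 <- system.file("extdata", package="wrProteo")
  dataNN <- wrProteo::readDiaNNFile(fileName="tinyDiaNN1.tsv.gz", path=path1,
    plotGraph=FALSE, silent=TRUE)
  print(names(dataNN))          # names of the list-elements returned
  print(dim(dataNN$quant))      # proteins (rows) by samples (columns)
  print(head(dataNN$annot))     # first lines of the annotation
}
```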

Examples

diaNNFi1 <- "tinyDiaNN1.tsv.gz"
## This file contains far fewer identifications than one would usually obtain
path1 <- system.file("extdata", package="wrProteo")
## let's define the main species and allow tagging some contaminants
specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="HUMAN")
## arguments are given with their full names (no partial matching needed)
dataNN <- readDiaNNFile(fileName=diaNNFi1, path=path1, specPref=specPref1, titGraph="Tiny DIA-NN Data")
summary(dataNN$quant)
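
The alternative output format with `separateAnnot=FALSE` may be tried in the same way. This is a hedged sketch that again assumes wrProteo and its example data are installed; according to the Value section the result should be a single data.frame combining annotation and quantitation, so the code simply prints its class and dimensions.

```r
## Assumes the package wrProteo (with its example data) is installed
if (requireNamespace("wrProteo", quietly=TRUE)) {
  path1 <- system.file("extdata", package="wrProteo")
  specPref1 <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="HUMAN")
  dataNN2 <- wrProteo::readDiaNNFile(fileName="tinyDiaNN1.tsv.gz", path=path1,
    specPref=specPref1, separateAnnot=FALSE, plotGraph=FALSE)
  print(class(dataNN2))         # type of object returned
  print(dim(dataNN2))           # annotation and quantitation in one table
}
```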

[Package wrProteo version 1.12.0 Index]