R: Group Metabolites based on Pooled Plasma Missing Rate

subset_met {MetProc}

R Documentation

Group Metabolites based on Pooled Plasma Missing Rate

Description

Separates metabolites into groups based on pooled plasma missing rates so that different thresholds of metrics can be applied to each group.

Usage

subset_met(df, miss, numsplit = 5, mincut = 0.02, maxcut = 0.95)

Arguments

`df`	The metabolomics dataset, ideally read from the `read.met` function. Each column represents a sample and each row represents a metabolite. Columns should be labeled with some unique prefix denoting whether the column is from a biological sample or pooled plasma sample. For example, all pooled plasma samples may have columns identified by the prefix “PPP” and all biological samples may have columns identified by the prefix “X”. Missing data must be coded as NA. Columns must be ordered by injection order.
`miss`	Vector of pooled plasma missing rates, one per metabolite. Its length must equal the number of rows in `df`.
`numsplit`	The number of equal-sized sections into which metabolites are divided based on the missing rate of the pooled plasma columns. The range of missing rates between `mincut` and `maxcut` is divided into equal sections. Default is `5`.
`mincut`	A cutoff to specify that any metabolite with pooled plasma missing rate less than or equal to this value should be retained. Default is `0.02`.
`maxcut`	A cutoff to specify that any metabolite with pooled plasma missing rate greater than this value should be removed. Default is `0.95`.
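
The interplay of `miss`, `numsplit`, `mincut` and `maxcut` can be illustrated with a small base R sketch. This is not the MetProc implementation: the toy matrix, the variable names, and the use of `cut()` with right-closed intervals are assumptions made for illustration, chosen to match the "less than or equal to `mincut`" and "greater than `maxcut`" wording above.

```r
# Toy data (hypothetical): 6 metabolites (rows) x 4 pooled plasma columns.
# Missing values are coded as NA; columns carry the "PPP" prefix.
toy <- matrix(c(
   1,  2,  3,  4,
   1, NA,  3,  4,
  NA, NA,  3,  4,
  NA, NA, NA,  4,
  NA, NA, NA, NA,
   1,  2,  3,  4
), nrow = 6, byrow = TRUE,
dimnames = list(paste0("met", 1:6), paste0("PPP", 1:4)))

# Pooled plasma missing rate per metabolite: fraction of NA in PPP columns.
pp_cols <- grep("^PPP", colnames(toy))
miss <- rowMeans(is.na(toy[, pp_cols, drop = FALSE]))

# Divide the range (mincut, maxcut] into numsplit equal-width sections.
numsplit <- 5
mincut <- 0.02
maxcut <- 0.95
breaks <- seq(mincut, maxcut, length.out = numsplit + 1)

# Assign each metabolite to a section. Rates <= mincut or > maxcut fall
# outside every section and come back as NA (assumed boundary handling).
bin <- cut(miss, breaks = breaks, labels = FALSE, right = TRUE)

# Collect metabolite names per section, keeping empty sections in place.
groups <- split(rownames(toy), factor(bin, levels = seq_len(numsplit)))
```

In this sketch the metabolites with no missing pooled plasma values and the one missing in every pooled plasma column are assigned to no section, which mirrors the retain and remove rules described for `mincut` and `maxcut`.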

Value

A list with `numsplit` elements. Each element is a matrix containing the metabolites of one group, as determined by the pooled plasma missing rate. The list keys are simple integers corresponding to the split number.
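
As a minimal sketch of how a return value of this shape might be handled, the snippet below uses a hand-built stand-in list rather than real `subset_met` output. The matrix contents and metabolite names are hypothetical, and elements are accessed by position so the code does not depend on how the keys are stored.

```r
# Hypothetical stand-in for a subset_met() result with numsplit = 3:
# one matrix per group, some of which may have no rows.
subsets <- list(
  "1" = matrix(1:6, nrow = 2, dimnames = list(c("metA", "metB"), NULL)),
  "2" = matrix(1:3, nrow = 1, dimnames = list("metC", NULL)),
  "3" = matrix(numeric(0), nrow = 0, ncol = 3)
)

# Number of metabolites in each group.
group_sizes <- vapply(subsets, nrow, integer(1))

# Keep only groups that contain at least one metabolite, so that
# group-specific thresholds are applied only where there is data.
nonempty <- subsets[group_sizes > 0]

# Access the first group by position.
first_group <- subsets[[1]]
```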

Examples

library(MetProc)

#Read in metabolomics data
metdata <- read.met(system.file("extdata/sampledata.csv", package = "MetProc"),
                    headrow = 3, metidcol = 1, fvalue = 8, sep = ",",
                    ppkey = "PPP", ippkey = "BPP")

#Get indices of pooled plasma and samples
groups <- get_group(metdata,"PPP","X")

#Calculate a pooled plasma missing rate and sample missing rate
#for each metabolite in data
missrate <- get_missing(metdata,groups[['pp']],groups[['sid']])

#Group metabolites into 5 groups based on pooled plasma
#missing rate
subsets <- subset_met(metdata, missrate[['ppmiss']],
                      numsplit = 5, mincut = 0.02, maxcut = 0.95)