met_proc {MetProc} | R Documentation |
Separates Metabolites into Likely True Metabolites and Likely Measurement Artifacts
Description
Takes a metabolomics data matrix and separates metabolites into likely artifacts and likely true metabolites. Biological samples should follow a randomized injection order with pooled plasma samples interspersed. Columns of the data should be samples and rows should be metabolites, with columns ordered by injection order. Metabolites are first grouped by the missing rate of pooled plasma and then processed based on metrics of blocky structure to identify likely artifacts. Specifically, corr_metric and run_metric are used to quantify the degree to which structure is present in the patterns of missing data. A metabolite must pass all thresholds to be considered a true metabolite.
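The expected layout can be illustrated with a small toy data frame. This is only a sketch: the values are simulated, the column names follow the default prefixes from Usage ("PPP" for pooled plasma, "X" for biological samples), and the missing-rate calculation is written in base R for illustration rather than taken from the package internals.

```r
# Toy data only: 3 metabolites (rows) by 11 injections (columns, in injection order).
set.seed(1)
col_names <- c("PPP1", "X1", "X2", "X3", "X4", "PPP2",
               "X5", "X6", "X7", "X8", "PPP3")
toy <- matrix(rnorm(3 * length(col_names), mean = 10),
              nrow = 3,
              dimnames = list(paste0("met", 1:3), col_names))

# Simulate a contiguous block of missing values for one metabolite.
toy["met2", c("X1", "X2", "X3", "X4", "PPP2")] <- NA
toy <- as.data.frame(toy)

# Identify pooled plasma and biological sample columns by their prefixes.
pp_cols  <- grepl("^PPP", colnames(toy))
sid_cols <- grepl("^X", colnames(toy))

# Per-metabolite missing rate within the pooled plasma columns
# (illustrative calculation, not the package's own code).
pp_miss <- rowMeans(is.na(toy[, pp_cols, drop = FALSE]))
print(pp_miss)
```

Here met2 is missing in one of three pooled plasma columns, so its pooled plasma missing rate is 1/3, while the other two metabolites have a rate of 0.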
Usage
met_proc(df, numsplit = 5, cor_rates = c(0.6, 0.65, 0.65, 0.65, 0.6),
runlengths = c(NA, 15, 15, 15, NA), mincut = 0.02, maxcut = 0.95, scut = 0.5,
ppkey = "PPP", sidkey = "X", missratecut=0.01, histcolors=c('white'), plot=TRUE,
outfile='MetProc_output')
Arguments
df |
The metabolomics dataset, ideally read from the |
numsplit |
The number of equal sized sections to divide metabolites into based on missing rate of pooled plasma columns. Divides the range of missing rates between |
cor_rates |
A vector of length equal to |
runlengths |
A vector of length equal to |
mincut |
A cutoff to specify that any metabolite with pooled plasma missing rate less than or equal to this value should be retained. Default is |
maxcut |
A cutoff to specify that any metabolite with pooled plasma missing rate greater than this value should be removed. Default is |
scut |
The cutoff of missingness to consider a metabolite as having data present in a given biological sample block. Relevant only to |
ppkey |
The unique prefix of pooled plasma columns. Default is |
sidkey |
The unique prefix of biological samples columns. Default is |
missratecut |
A parameter for heatmap plots when |
plot |
Indicate whether you would like to obtain plots of missingness patterns and distributions of calculated metrics. Plots will be output as a PDF. Default is |
histcolors |
A vector of length equal to |
outfile |
Name and path of the file to store images if |
Details
The function uses a four-step process:
1. Retain all metabolites with pooled plasma missing rate at or below mincut and remove all metabolites with pooled plasma missing rate above maxcut.
2. Split the remaining metabolites into numsplit groups defined by pooled plasma missing rate. The numsplit groups divide the range of pooled plasma missing rates evenly.
3. For each group of metabolites based on pooled plasma missing rates from step 2, calculate the correlation metric with corr_metric. Any metabolite below the cutoff for that group, defined by cor_rates, will be retained and any metabolite above will be removed.
4. For each group of metabolites based on pooled plasma missing rates from step 2, calculate the longest run metric with run_metric. Any metabolite below the cutoff for that group, defined by runlengths, will be retained and any metabolite above will be removed.
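The four steps can be sketched in base R on toy numbers. This is an illustration under stated assumptions, not the package's implementation: the missing rates and metric values are made up, the groups are assumed to be equal-width intervals between mincut and maxcut, a metric value equal to its cutoff is assumed to pass, and an NA entry in runlengths is assumed to mean the run metric is not applied to that group.

```r
# Hypothetical pooled plasma missing rates for eight metabolites (toy values).
pp_miss <- c(m1 = 0.00, m2 = 0.01, m3 = 0.10, m4 = 0.30,
             m5 = 0.50, m6 = 0.70, m7 = 0.90, m8 = 0.99)
mincut <- 0.02; maxcut <- 0.95; numsplit <- 5
cor_rates  <- c(0.6, 0.65, 0.65, 0.65, 0.6)
runlengths <- c(NA, 15, 15, 15, NA)

# Step 1: retain at or below mincut, remove above maxcut.
keep_now   <- names(pp_miss)[pp_miss <= mincut]
remove_now <- names(pp_miss)[pp_miss > maxcut]
middle     <- pp_miss[pp_miss > mincut & pp_miss <= maxcut]

# Step 2: assign remaining metabolites to numsplit equal-width groups
# (assumption: the range split is mincut to maxcut).
breaks <- seq(mincut, maxcut, length.out = numsplit + 1)
grp <- setNames(cut(middle, breaks = breaks, labels = FALSE,
                    include.lowest = TRUE),
                names(middle))

# Hypothetical metric values standing in for corr_metric and run_metric output.
corr_val <- c(m3 = 0.20, m4 = 0.80, m5 = 0.30, m6 = 0.10, m7 = 0.70)
run_val  <- c(m3 = 40,   m4 = 5,    m5 = 20,   m6 = 3,    m7 = 50)

# Steps 3 and 4: compare each metric with the cutoff for the metabolite's group.
pass_corr <- corr_val[names(middle)] <= cor_rates[grp]
pass_run  <- is.na(runlengths[grp]) | run_val[names(middle)] <= runlengths[grp]

# A metabolite must pass every threshold to be retained.
ok      <- pass_corr & pass_run
kept    <- c(keep_now, names(middle)[ok])
removed <- c(remove_now, names(middle)[!ok])
print(kept)
print(removed)
```

With these toy values, m1 and m2 are retained in step 1, m8 is removed in step 1, and of the five remaining metabolites only m3 and m6 pass both metric cutoffs.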
Value
keep |
A dataframe of the retained metabolites |
remove |
A dataframe of the removed metabolites |
If plot = TRUE, a PDF file will be saved containing the correspondence between pooled plasma missing rate and sample missing rate, the distribution of the correlation metric and longest run metric in each of the groups based on pooled plasma missing rates, and heatmaps displaying the patterns of present/missing data for both the removed and retained metabolites.
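A short sketch of how the returned list might be inspected. To keep it runnable without the package, results here is a hand-built stand-in with the documented keep and remove components. The toy values, and the assumption that rows of the returned data frames are metabolites as in the input, are illustrative only.

```r
# Stand-in for the list returned by met_proc(); toy values only.
results <- list(
  keep   = data.frame(X1 = c(1.2, 3.4), PPP1 = c(1.1, 3.3),
                      row.names = c("metA", "metB")),
  remove = data.frame(X1 = NA_real_, PPP1 = 2.0,
                      row.names = "metC")
)

# Count retained and removed metabolites (rows).
n_keep   <- nrow(results$keep)
n_remove <- nrow(results$remove)

# Fraction of metabolites flagged as likely artifacts.
frac_removed <- n_remove / (n_keep + n_remove)

# Identifiers of the removed metabolites, assuming they are stored as row names.
removed_ids <- rownames(results$remove)
print(removed_ids)
```

The same accessors should apply to the output of a real met_proc call, provided the components are named as documented above.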
See Also
See run_metric for details on the longest run metric.
See corr_metric for details on the correlation metric.
See MetProc-package for examples of running the full process.
Examples
library(MetProc)
#Read in metabolomics data
metdata <- read.met(system.file("extdata/sampledata.csv", package="MetProc"),
headrow=3, metidcol=1, fvalue=8, sep=",", ppkey="PPP", ippkey="BPP")
#Separate likely artifacts from true signal using default settings
results <- met_proc(metdata,plot=FALSE)
#Separate likely artifacts from true signal using custom cutoffs and criteria
#Uses 5 groups of metabolites based on the pooled plasma missing rate, applies
#custom metric thresholds, sets the minimum pooled plasma missing rate to 0.05,
#sets the maximum pooled plasma missing rate to 0.95, sets the missing rate
#to consider a block of samples present at 0.6
results <- met_proc(metdata, numsplit = 5, cor_rates = c(0.4,.7,.75,.7,.4),
runlengths = c(80, 10, 12, 10, 80), mincut = 0.05, maxcut = 0.95, scut = 0.6,
ppkey = 'PPP', sidkey = 'X', plot = FALSE)
#Uses default criteria for running met_proc, but plots the results
#and saves them in a PDF in the current directory.
#Colors of the histograms set by histcolors.
#Adding plots may substantially increase running time if many
#samples are included
results <- met_proc(metdata, plot = TRUE, missratecut = 0.001,
histcolors = c('red','yellow','green','blue','purple'))