binMS {PepSAVIms} | R Documentation |
Consolidate mass spectrometry observations
Description
Combines mass spectrometry observations that are believed to belong to the
same underlying compound into a single observation. In concept, the mass
spectrometer may produce multiple reads for a single compound; thus, binMS
attempts to recover these underlying compounds through a binning procedure,
described in more detail in Details.
Usage
binMS(mass_spec, mtoz, charge, mass = NULL, time_peak_reten,
ms_inten = NULL, time_range, mass_range, charge_range, mtoz_diff, time_diff)
Arguments
mass_spec |
Either a For example, suppose that a collection of mass spectrometry intensity
observations has provided data for 50 fractions across 20,000
mass-to-charge values. Then the input for |
mtoz |
A vector of either length 1 or length equal to the number of mass-to-charge values for which mass spectrometry data was collected, and which helps identify the mass-to-charge values for this data in one of several ways. One way to provide the information is to provide a numeric vector where
each entry provides the mass-to-charge value for a corresponding row of
mass spectrometry data. Then the A second way is to provide a single number which specifies the column
index in the A third way is to provide a single character string which provides the
column name in the |
charge |
The information for the |
mass |
The information for the mass need not be provided, as it can be
derived using the mass-to-charge and charge information; in this case the
parameter should be given its default, i.e. |
time_peak_reten |
The information for the |
ms_inten |
Either |
time_range |
A length-2 numeric vector specifying the lower bound and upper bound (inclusive) of allowed peak retention time occurrence for an observation to be included in the consolidation process. |
mass_range |
A length-2 numeric vector specifying the lower bound and upper bound (inclusive) of allowed mass for an observation to be included in the consolidation process. |
charge_range |
A length-2 numeric vector specifying the lower bound and upper bound (inclusive) of allowed electrical charge state for an observation to be included in the consolidation process. |
mtoz_diff |
A single numerical value such that any two observations with a larger absolute difference between their mass-to-charge values are considered to have originated from different underlying compounds. Two observations with a smaller absolute difference between their mass-to-charge values could potentially be considered to originate from the same underlying compound, contingent on other criteria also being met. Nonnegative values are allowed; a value of 0 has the effect of not consolidating any groups, since no absolute difference can be less than 0, and consequently reduces the function to a filtering routine only. |
time_diff |
A single numerical value such that any two observations with a larger absolute difference between their peak elution times are considered to have originated from different underlying compounds. Two observations with a smaller absolute difference between their peak elution times could potentially be considered to originate from the same underlying compound, contingent on other criteria also being met. Nonnegative values are allowed; a value of 0 has the effect of not consolidating any groups, since no absolute difference can be less than 0, and consequently reduces the function to a filtering routine only. |
Details
The algorithm described in what follows attempts to combine mass spectrometry observations that are believed to belong to the same underlying compound into a single observation for each compound. There are two conceptually separate steps.
The first step is as follows. All observations must satisfy each of the following criteria for inclusion in the binning process.
Each observation must have its peak elution time occur during the interval specified by time_range.
Each observation must have a mass that falls within the interval specified by mass_range.
Each observation must have an electrical charge state that falls within the interval specified by charge_range.
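The inclusion step above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: the helper passes_filter and the toy data frame obs are hypothetical names, and the bounds are treated as inclusive, as the argument descriptions state.

```r
# Hypothetical helper (not part of PepSAVIms): flags the observations that
# pass all three inclusive range checks of the first step.
passes_filter <- function(time, mass, charge,
                          time_range, mass_range, charge_range) {
  in_range <- function(x, rng) x >= rng[1] & x <= rng[2]
  in_range(time, time_range) &
    in_range(mass, mass_range) &
    in_range(charge, charge_range)
}

# Toy data with made-up values: one row per observation
obs <- data.frame(
  time   = c(10, 20, 30, 50),
  mass   = c(2500, 1500, 3000, 4000),
  charge = c(2, 3, 4, 2)
)

keep <- passes_filter(obs$time, obs$mass, obs$charge,
                      time_range   = c(14, 45),
                      mass_range   = c(2000, 15000),
                      charge_range = c(2, 10))

# Only the third observation satisfies all three criteria
obs[keep, ]
```

An observation is dropped as soon as any one of the three checks fails, which is why the first, second, and fourth rows are excluded here.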
Once a set of observations satisfying the above criteria is obtained, a second step attempts to combine observations believed to belong to the same underlying compound. The algorithm considers two observations that satisfy each of the following criteria to belong to the same compound.
The absolute difference in Daltons of the mass-to-charge value between the two observations is less than the value specified by mtoz_diff.
The absolute difference of the peak elution time between the two observations is less than the value specified by time_diff.
The electrical charge state must be the same for the two observations.
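The three pairwise criteria above can be written as a single predicate. The following is a minimal sketch, assuming each observation is represented as a list with hypothetical fields mtoz, time, and charge; the function same_compound is an illustrative name and not something exported by the package.

```r
# Hypothetical predicate: TRUE when two observations meet all three
# criteria for belonging to the same underlying compound.
same_compound <- function(a, b, mtoz_diff, time_diff) {
  abs(a$mtoz - b$mtoz) < mtoz_diff &&
    abs(a$time - b$time) < time_diff &&
    a$charge == b$charge
}

# Toy observations with made-up values
obs_a <- list(mtoz = 500.00, time = 20.0, charge = 3)
obs_b <- list(mtoz = 500.02, time = 20.5, charge = 3)
obs_c <- list(mtoz = 500.02, time = 20.5, charge = 4)

# Close in mass-to-charge and time, same charge state
match_ab <- same_compound(obs_a, obs_b, mtoz_diff = 0.05, time_diff = 1)

# Same distances, but the charge states differ
match_ac <- same_compound(obs_a, obs_c, mtoz_diff = 0.05, time_diff = 1)

# A threshold of 0 can never be satisfied by a strict inequality
match_zero <- same_compound(obs_a, obs_b, mtoz_diff = 0, time_diff = 1)
```

Because the comparisons are strict, setting either threshold to 0 means no pair can match, which is consistent with the function reducing to a filtering routine in that case.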
Then the binning algorithm is defined as follows. Consider an observation that satisfies the inclusion criteria; this observation is compared pairwise with every other observation that satisfies the inclusion criteria. If a pair of observations satisfies the criteria determining them to belong to the same underlying compound, then the two observations are merged into a single observation. The two previous observations are removed from the working set, and the process starts over with the newly created observation. The process repeats until no other observation in the working set meets the criteria determining it to belong to the same underlying compound as that of the current observation; at this point it is considered that all observations belonging to the compound have been found, and the process starts over with a new observation.
The merging process has not yet been defined; it is performed by averaging the mass-to-charge values and peak elution times, and summing the mass spectrometry intensities at each fraction. Although observations are merged pairwise, when multiple observations are combined in a sequence of pairings, all of the original observations are given equal weight in the averages. In other words, if a pair of observations is merged, and then a third observation is merged with the new observation created by combining the original two, then the mass-to-charge value and peak elution time of the new observation are obtained by summing the values for each of the three original observations and dividing by three. The merging process for more than three observations is conducted similarly.
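One way to obtain equal-weight averages from a sequence of pairwise merges is to carry running sums and a count, and divide only at the end. The sketch below illustrates that idea; it is an assumption about how such a merge could be written, not a description of the package internals, and new_obs, merge_obs, and summarize_obs are hypothetical helpers.

```r
# Hypothetical representation: keep running sums and a count so that
# repeated pairwise merges weight every original observation equally.
new_obs <- function(mtoz, time, inten) {
  list(mtoz_sum = mtoz, time_sum = time, n = 1, inten = inten)
}

merge_obs <- function(a, b) {
  list(mtoz_sum = a$mtoz_sum + b$mtoz_sum,
       time_sum = a$time_sum + b$time_sum,
       n        = a$n + b$n,
       inten    = a$inten + b$inten)   # intensities are summed per fraction
}

summarize_obs <- function(x) {
  list(mtoz = x$mtoz_sum / x$n, time = x$time_sum / x$n, inten = x$inten)
}

# Three toy observations, each with intensities for three fractions
o1 <- new_obs(500.00, 20, c(1, 0, 2))
o2 <- new_obs(500.02, 21, c(0, 3, 1))
o3 <- new_obs(500.04, 25, c(4, 0, 0))

# Merge pairwise: (o1 + o2), then the result with o3
merged <- summarize_obs(merge_obs(merge_obs(o1, o2), o3))

# The elution time is (20 + 21 + 25) / 3 = 22, whereas averaging the
# averages would give ((20 + 21) / 2 + 25) / 2 = 22.75
merged$time
```

Dividing only at the end is what keeps the third observation from receiving half of the total weight.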
Having described the binning algorithm, it is apparent that there are scenarios in which the order in which observations are merged affects the outcome of the algorithm. Since it seems that a minimum requirement of any binning algorithm is that the algorithm is invariant to the ordering of the observations in the data, this algorithm abides by the following rules. The observations in the data are sorted in increasing order by mass-to-charge value, peak elution time, and electrical charge state, respectively. Then when choosing an observation to compare to the rest of the set, we start with the observation at the top of the sort ordering, and compare it one-at-a-time to the other elements in the set according to the same ordering. When a consolidated observation is complete in that no other observation left in the working set satisfies the merging criteria, then this consolidated observation can be removed from consideration for all future merges.
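The sorting and merging rules above can be combined into a small end-to-end sketch. It is a simplified illustration written from the prose description, not the code used by binMS: the function bin_sketch, the column names, and the single inten column (standing in for the per-fraction intensities) are all assumptions.

```r
# Hypothetical sketch of the binning loop. `obs` is a data frame with
# columns mtoz, time, charge, and inten (one intensity value per row).
bin_sketch <- function(obs, mtoz_diff, time_diff) {
  # Sort so that the result does not depend on the input row order
  obs <- obs[order(obs$mtoz, obs$time, obs$charge), ]
  used <- rep(FALSE, nrow(obs))
  out <- list()
  for (i in seq_len(nrow(obs))) {
    if (used[i]) next
    used[i] <- TRUE
    members <- i
    repeat {
      # Current consolidated values: equal-weight means over all members
      cur_mtoz <- mean(obs$mtoz[members])
      cur_time <- mean(obs$time[members])
      cand <- which(!used &
                      abs(obs$mtoz - cur_mtoz) < mtoz_diff &
                      abs(obs$time - cur_time) < time_diff &
                      obs$charge == obs$charge[i])
      if (length(cand) == 0) break      # nothing left to merge
      members <- c(members, cand[1])    # first match in the sort ordering
      used[cand[1]] <- TRUE             # then start over with the merged values
    }
    out[[length(out) + 1]] <- data.frame(
      mtoz   = mean(obs$mtoz[members]),
      time   = mean(obs$time[members]),
      charge = obs$charge[i],
      inten  = sum(obs$inten[members]),
      n      = length(members)
    )
  }
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Toy data with made-up values
toy <- data.frame(
  mtoz   = c(500.00, 500.02, 500.04, 500.01, 600.00),
  time   = c(20.0, 20.5, 21.0, 20.2, 30.0),
  charge = c(3, 3, 3, 2, 3),
  inten  = c(1, 2, 4, 8, 16)
)

res      <- bin_sketch(toy, mtoz_diff = 0.05, time_diff = 2)
res_shuf <- bin_sketch(toy[c(5, 3, 1, 4, 2), ], mtoz_diff = 0.05, time_diff = 2)
```

In this toy data the three charge-3 observations near 500 are consolidated into one, while the charge-2 observation and the observation at 600 are left alone. Shuffling the input rows gives the same result because the rows are sorted before any comparison is made.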
Value
Returns an object of class binMS which inherits from msDat. This object is a list
with elements described below. The class is equipped with a print, summary, and
extractMS function.
msDatObj
An object of class msDat that encapsulates the mass spectrometry data for the consolidated data.
summ_info
A list containing information pertaining to the consolidation process; for use by the summary function.
Examples
# Load mass spectrometry data
data(mass_spec)
# Perform consolidation via binMS
bin_out <- binMS(mass_spec = mass_spec,
                 mtoz = "m/z",
                 charge = "Charge",
                 mass = "Mass",
                 time_peak_reten = "Reten",
                 ms_inten = NULL,
                 time_range = c(14, 45),
                 mass_range = c(2000, 15000),
                 charge_range = c(2, 10),
                 mtoz_diff = 0.05,
                 time_diff = 60)
# print, summary function
bin_out
summary(bin_out)
# Extract consolidated mass spectrometry data as a matrix or msDat object
bin_matr <- extractMS(msObj = bin_out, type = "matrix")
bin_msDat <- extractMS(msObj = bin_out, type = "msDat")