
LibrarySearch {mssearchr}

R Documentation

Perform the library search within R

Description

Perform library search using a custom implementation of the Identity (EI Normal) or Similarity (EI Simple) algorithm. Pairwise comparison of two mass spectra is implemented in C.

Usage

LibrarySearch(
  msp_objs_u,
  msp_objs_l,
  algorithm = c("identity_normal", "similarity_simple"),
  n_hits = 100L,
  hitlist_columns = c("formula", "mw", "smiles"),
  mz_min = NULL,
  mz_max = NULL,
  comments = NULL
)

Arguments

`msp_objs_u`, `msp_objs_l`	A list of nested lists. Each nested list is a mass spectrum and must contain at least three elements: (1) `name` (a string) - compound name (or short description); (2) `mz` (a numeric/integer vector) - m/z values of mass spectral peaks; (3) `intst` (a numeric/integer vector) - intensities of mass spectral peaks. The letters 'u' and 'l' stand for unknown and library, respectively. Mass spectra should be pre-processed using the `PreprocessMassSpectra` function.
`algorithm`	A string. Library search algorithm. Either the Identity EI Normal (`identity_normal`) or Similarity EI Simple (`similarity_simple`) algorithm.
`n_hits`	An integer value. The maximum number of hits (i.e., candidates) to display.
`hitlist_columns`	A character vector. Three columns are always present in the returned hitlist: `name`, `mf` (i.e., the match factor), and `idx` (i.e., the index of the respective library mass spectrum in the `msp_objs_l` list). Some additional columns can be added using the `hitlist_columns` argument (e.g., `cas_no`, `formula`, `inchikey`, etc.). Only scalar values (i.e., an atomic vector of unit length) are allowed.
`mz_min`, `mz_max`	An integer value. Boundaries of the m/z range (m/z values outside this range are ignored when the match factor is calculated).
`comments`	Any R object. Some additional information. It is saved as the 'comments' attribute of the returned list.
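
The input structure described for `msp_objs_u` and `msp_objs_l` can be sketched in base R as follows. This is only an illustration: the compound names, peaks, and formulas are hypothetical, the helper `IsScalarField` is not part of mssearchr, and the check that `mz` and `intst` have equal length is an assumption rather than something stated on this page. Real spectra should still be passed through `PreprocessMassSpectra` before searching.

```r
# Two hand-made spectra (hypothetical values, not taken from any library).
msp_objs <- list(
  list(name = "Example compound A",
       mz = c(41L, 43L, 57L, 71L),
       intst = c(300, 999, 850, 400),
       formula = "C5H12"),   # scalar field, so it could be a hitlist column
  list(name = "Example compound B",
       mz = c(43L, 57L, 85L),
       intst = c(999, 700, 250),
       formula = "C6H14")
)

# Every spectrum must carry at least 'name', 'mz', and 'intst'.
required <- c("name", "mz", "intst")
has_required <- vapply(msp_objs,
                       function(x) all(required %in% names(x)),
                       logical(1))

# Assumption: 'mz' and 'intst' are parallel vectors of equal length.
same_length <- vapply(msp_objs,
                      function(x) length(x$mz) == length(x$intst),
                      logical(1))

# Hypothetical helper: only scalar fields (atomic, length one) are
# allowed in 'hitlist_columns'.
IsScalarField <- function(x, field) {
  is.atomic(x[[field]]) && length(x[[field]]) == 1L
}
formula_ok <- vapply(msp_objs, IsScalarField, logical(1), field = "formula")

print(all(has_required) && all(same_length) && all(formula_ok))
```

A field such as `mz` would fail the scalar check, which is why it cannot be requested through `hitlist_columns`.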

Value

A list of data frames. Each data frame is a hitlist (i.e., a list of possible candidates). Each hitlist always contains three columns: `name`, `mf` (i.e., the match factor), and `idx` (i.e., the index of the respective library mass spectrum in the `msp_objs_l` list). Additional columns can be requested using the `hitlist_columns` argument. Library search options are saved as the `library_search_options` attribute.
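
A minimal sketch of how the returned object might be inspected, using a hand-built stand-in rather than a real `LibrarySearch` call. The rows are copied from the printed hitlist in the Examples section; the contents of the two attributes are placeholders, since this page does not show their actual structure.

```r
# Stand-in for a LibrarySearch() result: a list holding one hitlist.
# Rows are copied from the example output in the Examples section.
hitlists <- list(
  data.frame(
    name = c("UNDECANE", "UNDECANE", "DODECANE"),
    mf = c(950.5551, 928.4884, 905.7546),
    idx = c(11L, 72L, 74L),
    formula = c("C11H24", "C11H24", "C12H26")
  )
)

# Placeholder attribute values (assumed for illustration only).
attr(hitlists, "library_search_options") <-
  list(algorithm = "identity_normal", n_hits = 10L)
attr(hitlists, "comments") <- "demo run"

# Best candidate per hitlist: the row with the largest match factor.
best_hits <- do.call(rbind, lapply(hitlists, function(h) {
  h[which.max(h$mf), c("name", "mf", "idx")]
}))
print(best_hits)

# Search options and comments travel with the list as attributes.
opts <- attr(hitlists, "library_search_options")
print(opts$algorithm)
print(attr(hitlists, "comments"))
```

The `idx` column can then be used to look up the full library entry, e.g. `msp_objs_l[[best_hits$idx[1]]]`, when the library list is available.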

Examples

# Reading the 'alkanes.msp' file
msp_file <- system.file("extdata", "alkanes.msp", package = "mssearchr")

# Pre-processing
msp_objs_u <- PreprocessMassSpectra(ReadMsp(msp_file)) # unknown mass spectra
msp_objs_l <- PreprocessMassSpectra(massbank_alkanes)  # library mass spectra

# Searching using the Identity algorithm
hitlists <- LibrarySearch(msp_objs_u, msp_objs_l,
                          algorithm = "identity_normal", n_hits = 10L,
                          hitlist_columns = c("formula", "smiles", "db_no"))

# Printing a hitlist for the first compound from the 'alkanes.msp' file
print(hitlists[[1]][1:5, ])

#>        name       mf idx formula        smiles                db_no
#> 1  UNDECANE 950.5551  11  C11H24   CCCCCCCCCCC MSBNK-{...}-JP006877
#> 2  UNDECANE 928.4884  72  C11H24   CCCCCCCCCCC MSBNK-{...}-JP005760
#> 3  DODECANE 905.7546  74  C12H26  CCCCCCCCCCCC MSBNK-{...}-JP006878
#> 4 TRIDECANE 891.7862  41  C13H28 CCCCCCCCCCCCC MSBNK-{...}-JP006879
#> 5  DODECANE 885.6247  42  C12H26  CCCCCCCCCCCC MSBNK-{...}-JP005756

[Package mssearchr version 0.1.1 Index]