LibrarySearch {mssearchr}    R Documentation

Perform the library search within R

Description

Performs a library search using a custom implementation of the Identity (EI Normal) or Similarity (EI Simple) algorithm. The pairwise comparison of two mass spectra is implemented in C.

Usage

LibrarySearch(
  msp_objs_u,
  msp_objs_l,
  algorithm = c("identity_normal", "similarity_simple"),
  n_hits = 100L,
  hitlist_columns = c("formula", "mw", "smiles"),
  mz_min = NULL,
  mz_max = NULL,
  comments = NULL
)

Arguments

msp_objs_u, msp_objs_l

A list of nested lists, where each nested list is a mass spectrum. The letters 'u' and 'l' stand for unknown and library, respectively. Each nested list must contain at least three elements: (1) name (a string) - compound name (or short description); (2) mz (a numeric/integer vector) - m/z values of mass spectral peaks; (3) intst (a numeric/integer vector) - intensities of mass spectral peaks. Mass spectra should be pre-processed using the PreprocessMassSpectra function.
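The structure described above can be sketched in base R as follows. This is only an illustration: the peak values are made up, HasRequiredFields is a hypothetical helper that is not part of mssearchr, and the check that mz and intst have equal length is an assumption rather than a documented requirement.

```r
# Two hand-built spectra in the documented shape (peak values are illustrative).
msp_objs <- list(
  list(name = "Spectrum A", mz = c(41L, 43L, 57L, 71L), intst = c(30, 100, 85, 40)),
  list(name = "Spectrum B", mz = c(43L, 57L, 85L), intst = c(100, 90, 35))
)

# Hypothetical helper (not part of mssearchr): checks the three required
# elements. Equal length of 'mz' and 'intst' is an assumption.
HasRequiredFields <- function(msp_obj) {
  is.list(msp_obj) &&
    is.character(msp_obj$name) && length(msp_obj$name) == 1L &&
    is.numeric(msp_obj$mz) &&
    is.numeric(msp_obj$intst) &&
    length(msp_obj$mz) == length(msp_obj$intst)
}

all_valid <- all(vapply(msp_objs, HasRequiredFields, logical(1)))
```

A list built this way would still need to be passed through PreprocessMassSpectra before the search.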

algorithm

A string. The library search algorithm: either "identity_normal" (Identity EI Normal) or "similarity_simple" (Similarity EI Simple).

n_hits

An integer value. The maximum number of hits (i.e., candidates) to display.

hitlist_columns

A character vector. Three columns are always present in the returned hitlist: name, mf (i.e., the match factor), and idx (i.e., the index of the respective library mass spectrum in the msp_objs_l list). Additional columns can be added using the hitlist_columns argument (e.g., cas_no, formula, inchikey, etc.). Only fields holding scalar values (i.e., atomic vectors of length one) are allowed.
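As a rough way to see which fields of a library entry satisfy the scalar restriction, the following base R sketch may help. ScalarFieldNames is a hypothetical helper, not part of mssearchr, and the entry itself (including the synonyms field and the peak values) is invented for illustration.

```r
# Hypothetical library entry; 'synonyms' has length two and is not scalar.
msp_obj <- list(
  name = "UNDECANE",
  mz = c(43L, 57L, 71L),
  intst = c(100, 95, 40),
  formula = "C11H24",
  smiles = "CCCCCCCCCCC",
  synonyms = c("n-Undecane", "Hendecane")
)

# Hypothetical helper (not part of mssearchr): names of the fields that hold
# an atomic vector of length one.
ScalarFieldNames <- function(msp_obj) {
  is_scalar <- vapply(
    msp_obj,
    function(x) is.atomic(x) && length(x) == 1L,
    logical(1)
  )
  names(msp_obj)[is_scalar]
}

scalar_fields <- ScalarFieldNames(msp_obj)
```

In this sketch, formula and smiles would be acceptable entries for hitlist_columns, whereas synonyms would not.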

mz_min, mz_max

Integer values. The lower and upper boundaries of the m/z range. Peaks with m/z values outside this range are not taken into account when the match factor is calculated.
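The effect of the m/z boundaries can be pictured with a small base R sketch. It is not the package's implementation (the comparison itself is done in C): FilterMzRange is a hypothetical helper, and it assumes that both boundaries are inclusive and that NULL leaves the corresponding side unrestricted.

```r
# Hypothetical helper (not part of mssearchr): keeps peaks with
# mz_min <= m/z <= mz_max. Inclusive bounds and "NULL means no limit"
# are assumptions made for this sketch.
FilterMzRange <- function(msp_obj, mz_min = NULL, mz_max = NULL) {
  keep <- rep(TRUE, length(msp_obj$mz))
  if (!is.null(mz_min)) keep <- keep & msp_obj$mz >= mz_min
  if (!is.null(mz_max)) keep <- keep & msp_obj$mz <= mz_max
  msp_obj$mz <- msp_obj$mz[keep]
  msp_obj$intst <- msp_obj$intst[keep]
  msp_obj
}

# Illustrative spectrum; the peak values are made up.
spectrum <- list(
  name = "Example",
  mz = c(29L, 43L, 57L, 71L, 156L),
  intst = c(20, 100, 90, 45, 5)
)

# Only the peaks at m/z 43, 57, and 71 remain.
trimmed <- FilterMzRange(spectrum, mz_min = 40L, mz_max = 100L)
```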

comments

Any R object containing additional information. It is saved as the 'comments' attribute of the returned list.

Value

A list of data frames, one per unknown mass spectrum. Each data frame is a hitlist (i.e., a list of possible candidates). Each hitlist always contains three columns: name, mf (i.e., the match factor), and idx (i.e., the index of the respective library mass spectrum in the msp_objs_l list). Additional columns can be added using the hitlist_columns argument. Library search options are saved as the library_search_options attribute.

Examples

# Reading the 'alkanes.msp' file
msp_file <- system.file("extdata", "alkanes.msp", package = "mssearchr")

# Pre-processing
msp_objs_u <- PreprocessMassSpectra(ReadMsp(msp_file)) # unknown mass spectra
msp_objs_l <- PreprocessMassSpectra(massbank_alkanes)  # library mass spectra

# Searching using the Identity algorithm
hitlists <- LibrarySearch(msp_objs_u, msp_objs_l,
                          algorithm = "identity_normal", n_hits = 10L,
                          hitlist_columns = c("formula", "smiles", "db_no"))

# Printing a hitlist for the first compound from the 'alkanes.msp' file
print(hitlists[[1]][1:5, ])

#>        name       mf idx formula        smiles                db_no
#> 1  UNDECANE 950.5551  11  C11H24   CCCCCCCCCCC MSBNK-{...}-JP006877
#> 2  UNDECANE 928.4884  72  C11H24   CCCCCCCCCCC MSBNK-{...}-JP005760
#> 3  DODECANE 905.7546  74  C12H26  CCCCCCCCCCCC MSBNK-{...}-JP006878
#> 4 TRIDECANE 891.7862  41  C13H28 CCCCCCCCCCCCC MSBNK-{...}-JP006879
#> 5  DODECANE 885.6247  42  C12H26  CCCCCCCCCCCC MSBNK-{...}-JP005756
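Because the result is a plain list of data frames, it can be post-processed with base R. The sketch below works on a mock object so that it runs without the package: its three rows copy the first rows of the output printed above, the threshold of 920 is arbitrary, and taking the first row as the best candidate assumes hitlists are sorted by decreasing match factor (which is consistent with the printed output but not stated explicitly).

```r
# Mock of the returned structure: a list holding one hitlist (a data frame).
# The rows copy the first three rows of the printed output above.
hitlists <- list(
  data.frame(
    name = c("UNDECANE", "UNDECANE", "DODECANE"),
    mf = c(950.5551, 928.4884, 905.7546),
    idx = c(11L, 72L, 74L),
    formula = c("C11H24", "C11H24", "C12H26"),
    stringsAsFactors = FALSE
  )
)

# First row of every hitlist, collected into one data frame
# (assumes rows are ordered by decreasing match factor).
top_hits <- do.call(rbind, lapply(hitlists, function(hitlist) hitlist[1L, ]))

# Candidates at or above an arbitrary, illustrative match-factor threshold.
good_hits <- lapply(hitlists, function(hitlist) hitlist[hitlist$mf >= 920, ])
```

With a real result, the idx column could then be used to look up the full library entry, for example msp_objs_l[[top_hits$idx[1]]].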


[Package mssearchr version 0.1.1 Index]