ReadMsp {mssearchr} | R Documentation |
Read mass spectra from an msp-file (NIST format)
Description
Read an msp-file containing mass spectra in the NIST format. The complete description of the format can be found in the NIST Mass Spectral Search Program manual. A summary is presented below in the "Description of the NIST format" section.
Usage
ReadMsp(input_file)
Arguments
input_file
A string. The name of a file.
Details
Data from an msp-file are read without any modification (e.g., the order of mass values is not changed and zero-intensity peaks are preserved).
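Because nothing is reordered or filtered on reading, any clean-up has to be done afterwards. The following is a minimal sketch of one way to do that, not a function provided by mssearchr: `tidy_spectrum` is a hypothetical helper, and `spectrum` is a hand-built stand-in for one element of the returned list with made-up values.

```r
# Hand-built stand-in for one element of the list returned by ReadMsp()
# (the values are invented for illustration).
spectrum <- list(
  name  = "Hypothetical compound",
  mz    = c(57, 43, 71, 85),
  intst = c(999, 850, 0, 320)
)

# Hypothetical helper (not part of mssearchr): drop zero-intensity peaks
# and sort the remaining peaks by increasing m/z.
tidy_spectrum <- function(s) {
  keep <- s$intst > 0
  ord <- order(s$mz[keep])
  s$mz <- s$mz[keep][ord]
  s$intst <- s$intst[keep][ord]
  s
}

tidy <- tidy_spectrum(spectrum)
```

The same helper could be applied to every spectrum with `lapply(msp_objs, tidy_spectrum)`, assuming each element contains the mz and intst vectors described in the "Value" section.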
Value
A list of nested lists. Each nested list is a mass spectrum. Almost all metadata fields (e.g., "Name", "CAS#", "Formula", "MW") are represented as strings. All "Synon" fields are merged into a single character vector. Mass values and intensities are represented as numeric vectors (mz and intst). Names of fields are slightly modified:
names are converted to lowercase;
hash symbols are replaced with _no;
any other special character is replaced with an underscore character.
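The renaming rules can be approximated in a few lines of base R. This is only a sketch of the rules as stated here, not the package's actual implementation: `normalize_field_name` is a hypothetical helper, and treating every character other than a lowercase letter, digit, or underscore as "special" is an assumption.

```r
# Hypothetical helper (not part of mssearchr) approximating the renaming rules.
normalize_field_name <- function(x) {
  x <- tolower(x)                         # names are converted to lowercase
  x <- gsub("#", "_no", x, fixed = TRUE)  # hash symbols are replaced with "_no"
  # Assumption: anything other than a-z, 0-9, or "_" counts as special.
  gsub("[^a-z0-9_]", "_", x)
}

field_names <- normalize_field_name(c("Name", "CAS#", "Num Peaks", "NIST#"))
print(field_names)
```

Under these assumptions, "CAS#" maps to cas_no and "Num Peaks" maps to num_peaks, so the corresponding list elements would be accessed as, for example, `msp_objs[[1]]$cas_no`.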
Description of the NIST format
The summary was prepared using the NIST Mass Spectral Search Program manual v.2.4 (2020).
An msp-file can contain any number of spectra.
Each spectrum must start with the "Name" field. This field must not be empty.
The "Num Peaks" field is also required. It must contain the number of mass/intensity pairs.
Some optional fields (e.g., "Comments", "Formula", "MW") can appear between the "Name" and "Num Peaks" fields.
When a spectrum is exported from the NIST library it also contains the "NIST#" and "DB#" fields. The "NIST#" field is on the same line as the "CAS#" field and separated by a semicolon.
Each field should be on a separate line (the "NIST#" field is an exception to this rule).
The mass/intensity list begins on the line following the "Num Peaks" field. The peaks need not be normalized, and the masses need not be ordered. The exact spacing and delimiters used for the mass/intensity pairs are unimportant. The following characters are accepted as delimiters: 'space', 'tab', ',', ';', ':'. Parentheses, square brackets and curly braces ('(', ')', '[', ']', '{', and '}') are also allowed.
The "Name" field can be up to 511 characters.
The "Comments" field can be up to 1023 characters.
The "Formula" field can be up to 23 characters.
The "Synon" field may be repeated.
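To make the layout concrete, the sketch below builds a small hand-written record that follows the rules above and splits its peak list on the accepted delimiters. It is an illustration in base R only, not how ReadMsp is implemented; the compound name, comment, and peak values are invented, and the parsing approach is an assumption.

```r
# A hand-written record following the rules above (all values are invented).
msp_lines <- c(
  "Name: Hypothetical compound",
  "Comments: hand-written example",
  "Num Peaks: 4",
  "12 30; 13 80",
  "(14 160) [15,890]"
)

# Locate the "Num Peaks" field; the peak list starts on the next line.
i_num <- grep("^Num Peaks:", msp_lines, ignore.case = TRUE)
n_peaks <- as.integer(sub("^[^:]*:\\s*", "", msp_lines[i_num]))
peak_text <- paste(msp_lines[(i_num + 1):length(msp_lines)], collapse = " ")

# Split on the accepted delimiters and brackets, then drop empty tokens.
tokens <- strsplit(peak_text, "[\\[\\]\\s,;:(){}]+", perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]
values <- as.numeric(tokens)

# Values alternate between mass and intensity.
mz <- values[c(TRUE, FALSE)]
intst <- values[c(FALSE, TRUE)]
stopifnot(length(mz) == n_peaks)
```

Mixing delimiters within one record, as done here, is unusual in practice; it is used only to show that the choice of delimiter does not affect the resulting mass/intensity pairs.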
Examples
# Reading the 'alkanes.msp' file
msp_file <- system.file("extdata", "alkanes.msp", package = "mssearchr")
msp_objs <- ReadMsp(msp_file)
# Plotting the first mass spectrum from the 'msp_objs' list
par_old <- par(yaxs = "i")
plot(msp_objs[[1]]$mz, msp_objs[[1]]$intst,
     ylim = c(0, 1000), main = msp_objs[[1]]$name,
     type = "h", xlab = "m/z", ylab = "Intensity", bty = "l")
par(par_old)