rs.makeDB {RxnSim} | R Documentation |
Converts Text File to Reaction Database
Description
Reads and parses an input text file containing reaction SMILES into a reaction database object. The reaction database is used to query the similarity of candidate reactions.
Usage
rs.makeDB (txtFile, header = FALSE, sep = '\t', standardize = TRUE, explicitH = FALSE,
fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
useMask = FALSE, maskStructure, mask, recursive = FALSE)
Arguments
txtFile |
input file containing EC numbers, reaction names and RSMI. See Details for the format of the input file. |
header |
boolean to indicate if the input file contains a header. It is set to FALSE by default. |
sep |
the field separator character to be used while reading the input file. |
standardize |
suppresses all explicit hydrogen if set as TRUE (the default). |
explicitH |
converts all implicit hydrogen to explicit if set as TRUE. |
fp.type |
Fingerprint type to use. Allowed types include: |
fp.mode |
fingerprint mode to be used. It can either be set to |
fp.depth |
search depth for fingerprint construction. This argument is ignored for |
fp.size |
length of the fingerprint bit string. This argument is ignored for |
useMask |
boolean to indicate use of masking. If |
maskStructure |
SMILES or SMARTS of the structure to be searched and masked. |
mask |
SMILES of structure to be used as mask. |
recursive |
if |
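The masking arguments are passed together, so a short sketch may help show how they fit into a call. The SMILES strings below (`OP(=O)O` as the structure to mask and `[Ar]` as the mask) are illustrative assumptions, not values prescribed by this page, and `mask_args` is a hypothetical helper name. The call is guarded so the snippet does nothing if RxnSim is not installed.

```r
# Illustrative masking arguments; the structure and mask strings are assumptions.
mask_args <- list(
  useMask       = TRUE,       # switch masking on
  maskStructure = "OP(=O)O",  # SMILES or SMARTS to search for and mask
  mask          = "[Ar]",     # SMILES used as the mask
  recursive     = TRUE        # passed through as documented above
)

# Build the database only when the package is available.
# txtFile is omitted, so the sample database file is used.
if (requireNamespace("RxnSim", quietly = TRUE)) {
  db_masked <- do.call(RxnSim::rs.makeDB, mask_args)
}
```

Collecting the arguments in a list keeps the masking settings in one place, which makes it easier to reuse them for a second call.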
Details
The parameters used to generate fingerprints are stored in the database object and returned with the parsed data. The same parameter values are used when parsing the input reaction in rs.compute.DB.
The input text file should contain the following three fields, separated by TAB (or any appropriate field separator). A field can be left blank.
[EC Number] | [Reaction Name] | [Reaction SMILES (RSMI)] |
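As a minimal sketch of the three-field layout, the snippet below writes a small TAB-separated file with base R and reads it back to check its shape. The two rows are made-up illustrations (including the EC number and reaction names), and the second row leaves the EC field blank. The names `reactions`, `txt_file` and `check` are hypothetical. The final call is guarded so it only runs if RxnSim is installed.

```r
# Illustrative rows only: EC numbers, names and RSMI are made-up examples.
reactions <- data.frame(
  EC   = c("1.1.1.1", ""),                           # second EC field left blank
  Name = c("ethanol oxidation", "ester formation"),
  RSMI = c("CCO>>CC=O", "CC(=O)O.CO>>CC(=O)OC.O"),
  stringsAsFactors = FALSE
)

# Write the three fields TAB-separated, without header or quotes.
txt_file <- tempfile(fileext = ".txt")
write.table(reactions, file = txt_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm it has three fields per line.
check <- read.delim(txt_file, header = FALSE, sep = "\t",
                    colClasses = "character")

# Parse the file into a reaction database when RxnSim is available.
if (requireNamespace("RxnSim", quietly = TRUE)) {
  db <- RxnSim::rs.makeDB(txt_file, header = FALSE, sep = "\t")
}
```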
The package comes with a sample reaction database file extracted from the Rhea database (Morgat et al., 2017). If no text file is provided, the default sample database file is used:
rs.makeDB()
A larger dataset containing all reactions from the Rhea database (v.83) is also provided with the package.
Value
Returns a list containing the parsed input data and the reaction fingerprints.
Data |
data frame containing EC Numbers, Reaction Names and RSMI as read from the input file. MaskedRSMI are also included if masking is used. |
FP |
list of molecular fingerprints for each reaction in the input file. These fingerprints are further processed based on the reaction similarity algorithm. |
It also contains the parameter values used for generating fingerprints, viz., standardize, explicitH, fp.type, fp.mode, fp.depth and fp.size.
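The returned list can be inspected with ordinary list indexing. The helper below is a hypothetical sketch, not part of RxnSim. It assumes that the list elements are named exactly as on this page (`Data`, `FP`) and that the stored parameters carry the same names as the corresponding arguments.

```r
# Hypothetical helper: summarise a database object returned by rs.makeDB.
# Assumes the element names Data, FP and the parameter names listed above.
describe_db <- function(db) {
  param_names <- c("standardize", "explicitH", "fp.type",
                   "fp.mode", "fp.depth", "fp.size")
  list(
    n_reactions    = nrow(db$Data),   # rows read from the input file
    n_fingerprints = length(db$FP),   # one fingerprint entry per reaction
    params         = db[param_names]  # fingerprint settings stored in the object
  )
}

# Apply it to the sample database when RxnSim is available.
if (requireNamespace("RxnSim", quietly = TRUE)) {
  db_summary <- describe_db(RxnSim::rs.makeDB())
  str(db_summary)
}
```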
Author(s)
Varun Giri varungiri@gmail.com
References
Morgat, A., Lombardot, T., Axelsen, K., Aimo, L., Niknejad, A., Hyka-Nouspikel, N., Coudert, E., Pozzato, M., Pagni, M., Moretti, S., Rosanoff, S., Onwubiko, J., Bougueleret, L., Xenarios, I., Redaschi, N., Bridge, A. (2017) Updates in Rhea - an expert curated resource of biochemical reactions. Nucleic Acids Research, 45:D415-D418; doi: 10.1093/nar/gkw990