rs.makeDB {RxnSim}    R Documentation

Converts Text File to Reaction Database

Description

Reads and parses an input text file containing reaction SMILES into a reaction database object. The reaction database is used to query the reaction similarity of candidate reactions.

Usage

rs.makeDB (txtFile, header = FALSE, sep = '\t', standardize = TRUE, explicitH = FALSE,
          fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
          useMask = FALSE, maskStructure, mask, recursive = FALSE)

Arguments

txtFile

input file containing EC numbers, reaction names and RSMI. See Details for the format of the input file.

header

boolean to indicate if the input file contains a header. It is set to FALSE by default.

sep

the field separator character to be used while reading the input file. It is set to TAB ('\t') by default.

standardize

suppresses all explicit hydrogens if set to TRUE (default).

explicitH

converts all implicit hydrogens to explicit if set to TRUE. It is set to FALSE by default.

fp.type

Fingerprint type to use. Allowed types include:
'standard', 'extended' (default), 'graph', 'estate', 'hybridization', 'maccs', 'pubchem', 'kr', 'shortestpath', 'signature' and 'circular'.

fp.mode

fingerprint mode to be used. It can either be set to 'bit' (default) or 'count'.

fp.depth

search depth for fingerprint construction. This argument is ignored for 'pubchem', 'maccs', 'kr' and 'estate' fingerprints.

fp.size

length of the fingerprint bit string. This argument is ignored for 'pubchem', 'maccs', 'kr', 'estate', 'circular' (count mode) and 'signature' fingerprints.

useMask

boolean to indicate use of masking. If TRUE, each reaction is processed to mask the given substructure. See rs.mask for details.

maskStructure

SMILES or SMARTS of the structure to be searched and masked.

mask

SMILES of structure to be used as mask.

recursive

if TRUE, all occurrences of the input substructure are replaced recursively.
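
The notes on fp.depth and fp.size above can be summarized in a small lookup. The sketch below is a hypothetical helper, ignored_fp_args, which is not part of RxnSim; it only encodes the rules stated in the argument descriptions and uses base R.

```r
# Hypothetical helper (not part of RxnSim): reports which of fp.depth and
# fp.size are documented as ignored for a given fingerprint configuration.
fp_types <- c("standard", "extended", "graph", "estate", "hybridization",
              "maccs", "pubchem", "kr", "shortestpath", "signature",
              "circular")

ignored_fp_args <- function(fp.type = "extended", fp.mode = "bit") {
  # Reject values outside the documented sets.
  fp.type <- match.arg(fp.type, fp_types)
  fp.mode <- match.arg(fp.mode, c("bit", "count"))

  # Fingerprints documented as ignoring both fp.depth and fp.size.
  fixed <- c("pubchem", "maccs", "kr", "estate")

  ignored <- character(0)
  if (fp.type %in% fixed) {
    ignored <- c(ignored, "fp.depth")
  }
  if (fp.type %in% c(fixed, "signature") ||
      (fp.type == "circular" && fp.mode == "count")) {
    ignored <- c(ignored, "fp.size")
  }
  ignored
}

ignored_fp_args("maccs")              # fp.depth and fp.size
ignored_fp_args("circular", "count")  # fp.size only
ignored_fp_args("extended")           # neither
```

A check of this kind can be run before calling rs.makeDB to avoid tuning an argument that has no effect for the chosen fingerprint.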

Details

The parameters used to generate fingerprints are stored in the database object and returned with the parsed data. The same parameter values are used when parsing the input reaction in rs.compute.DB.

The input text file should contain the following three fields, separated by TAB (or any other appropriate field separator). A field can be left blank.

[EC Number] [Reaction Name] [Reaction SMILES (RSMI)]
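
As a minimal sketch of this layout, the code below writes a two-row TAB-separated file and reads it back with base R to confirm the three fields. The reactions are simplified, illustrative rows written for this example (they are not taken from the packaged sample file), and the second row leaves the EC number blank. The call to rs.makeDB is guarded so that it runs only if RxnSim is installed.

```r
# Illustrative rows only: simplified reactions written for this sketch.
rows <- c(
  "1.1.1.1\tethanol oxidation (simplified)\tCCO>>CC=O",
  "\testerification (EC number left blank)\tCC(=O)O.OCC>>CC(=O)OCC.O"
)
txtFile <- tempfile(fileext = ".txt")
writeLines(rows, txtFile)

# Read the file back to check the three-field layout.
# The column names are chosen here for readability only.
parsed <- read.delim(txtFile, header = FALSE, sep = "\t",
                     col.names = c("EC", "Name", "RSMI"),
                     colClasses = "character")
print(parsed)

# Build the reaction database only if RxnSim is available.
if (requireNamespace("RxnSim", quietly = TRUE)) {
  db <- RxnSim::rs.makeDB(txtFile, header = FALSE, sep = "\t")
}
```

Reading every column as character keeps an EC number such as 1.1.1.1 intact and preserves a blank field as an empty string.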

The package comes with a sample reaction database file extracted from the Rhea database (Morgat et al., 2017). If no text file is provided, the default sample database file is used:

rs.makeDB()

A larger dataset containing all reactions from the Rhea database (v.83) is also provided with the package.

Value

Returns a list containing the parsed input data and the reaction fingerprints.

Data

data frame containing EC Numbers, Reaction Names and RSMI as read from the input file. MaskedRSMI are also included if masking is used.

FP

list of molecular fingerprints for each reaction in the input file. These fingerprints are further processed based on the reaction similarity algorithm.

The list also contains the parameter values used for generating fingerprints, viz., standardize, explicitH, fp.type, fp.mode, fp.depth and fp.size.

Author(s)

Varun Giri varungiri@gmail.com

References

Morgat, A., Lombardot, T., Axelsen, K., Aimo, L., Niknejad, A., Hyka-Nouspikel, N., Coudert, E., Pozzato, M., Pagni, M., Moretti, S., Rosanoff, S., Onwubiko, J., Bougueleret, L., Xenarios, I., Redaschi, N., Bridge, A. (2017) Updates in Rhea - an expert curated resource of biochemical reactions. Nucleic Acids Research, 45:D415-D418; doi: 10.1093/nar/gkw990

See Also

rs.compute.DB, rs.mask


[Package RxnSim version 1.0.4 Index]