rs.compute {RxnSim} | R Documentation |
Computes Similarity of Reactions
Description
Computes similarity between two (or more) input reactions.
rs.compute
computes similarity of two reactions.
rs.compute.list
computes similarity of two lists of reactions.
rs.compute.sim.matrix
computes similarity of reactions in a list.
rs.compute.DB
computes similarity of a reaction against a database (parsed from text file).
Usage
rs.compute (rxnA, rxnB, format = 'rsmi', standardize = TRUE, explicitH = FALSE,
reversible = TRUE, algo = 'msim', sim.method = 'tanimoto',
fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
verbose = FALSE, fpCached = FALSE)
rs.compute.list (rxnA, rxnB, format = 'rsmi', standardize = TRUE, explicitH = FALSE,
reversible = TRUE, algo = 'msim', sim.method = 'tanimoto',
fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
clearCache = TRUE)
rs.compute.sim.matrix (rxnA, format = 'rsmi', standardize = TRUE, explicitH = FALSE,
reversible = TRUE, algo = 'msim', sim.method = 'tanimoto',
fp.type = 'extended', fp.mode = 'bit', fp.depth = 6, fp.size = 1024,
clearCache = TRUE)
rs.compute.DB (rxnA, DB, format = 'rsmi', ecrange = '*', reversible = TRUE,
algo = 'msim', sim.method = 'tanimoto', sort = TRUE, fpCached = FALSE)
Arguments
rxnA |
input reaction in RSMI format or name (with path) of MDL RXN file. |
rxnB |
input reaction in RSMI format or name (with path) of MDL RXN file. |
DB |
parsed database object as returned by rs.makeDB. |
format |
specifies the format of the input reaction(s). Reaction(s) can be provided in one of the following formats: 'RSMI' (default) or 'RXN'. |
ecrange |
EC number(s) search pattern used when comparing against the reaction DB. * is used as a wildcard. E.g., 1.2.1.* restricts the search to all reactions with EC numbers starting with 1.2.1.-. |
standardize |
suppresses all explicit hydrogens if set as TRUE (default). |
explicitH |
converts all implicit hydrogens to explicit if set as TRUE. It is set to FALSE by default. |
reversible |
boolean that indicates reversibility of the input reaction(s). If set as TRUE (default), the reactions are also compared with one of them reversed (see Details). |
algo |
reaction similarity algorithm to be used. One of the following algorithms can be used: 'msim' (default), 'msim_max', 'rsim' or 'rsim2'. See Details. |
sim.method |
similarity metric to be used to evaluate reaction similarity. Allowed types are listed in the table of similarity metrics under Details; the default is 'tanimoto'. |
fp.type |
fingerprint type to use. Allowed types are listed in the table of fingerprints under Details; the default is 'extended'. |
fp.mode |
fingerprint mode to be used. It can either be set to 'bit' (default) or 'count'. |
fp.depth |
search depth for fingerprint construction. This argument is ignored for |
fp.size |
length of the fingerprint bit string. This argument is ignored for the |
verbose |
boolean that enables display of detailed molecule pairing and reaction alignment (and respective similarity values). The argument is ignored for |
sort |
boolean that enables |
fpCached |
boolean that enables fingerprint caching. It is set to FALSE by default. |
clearCache |
boolean that resets the cache before (and after) processing reaction lists. It is set to TRUE by default. |
Details
RxnSim implements four algorithms to compute reaction similarity, namely msim, msim_max, rsim and rsim2.
msim
is based on individual similarities of molecules in two reactions. First, each reactant (product) of a reaction is paired with an equivalent (similar) reactant (product) of the other reaction based on pairwise similarity values using hierarchical grouping. A similarity value of 0 is assigned to each unpaired molecule. Reaction similarity is then computed by averaging the similarity values of all pairs of equivalent molecules and of all unpaired molecules. The computed molecule equivalences can be reviewed using verbose mode in rs.compute.
msim_max
reaction similarity is computed in the same way as described for msim, except that the unpaired molecules are not used for computing the average.
rsim
is based on cumulative features of reactant(s) and product(s) of two reactions. Each reaction is represented by two fingerprints, one for the reactants and one for the products. Reaction similarity is computed by averaging the similarity values obtained by comparing the reactant fingerprints and the product fingerprints.
rsim2
is based on cumulative features of all molecules in a reaction, which together form a reaction fingerprint. Reaction similarity is computed from the reaction fingerprints of the two reactions.
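The difference between msim and msim_max can be illustrated with a small base R sketch. It does not call RxnSim. The similarity matrix is made up, and the pairing step uses a simple greedy rule (highest remaining similarity first) as a stand-in for the hierarchical grouping mentioned above, so it only approximates what the package does. All names (sim, pair_molecules, msim, msim_max) are hypothetical.

```r
# Hypothetical pairwise similarities: rows are molecules of reaction A,
# columns are molecules of reaction B (values are made up for illustration).
sim <- matrix(c(0.9, 0.2,
                0.3, 0.6,
                0.1, 0.4),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A1", "A2", "A3"), c("B1", "B2")))

# Greedy pairing (an assumption, not the package's hierarchical grouping):
# repeatedly take the highest remaining similarity, then drop its row and column.
pair_molecules <- function(sim) {
  paired <- numeric(0)
  while (nrow(sim) > 0 && ncol(sim) > 0) {
    idx <- which(sim == max(sim), arr.ind = TRUE)[1, ]
    paired <- c(paired, sim[idx[1], idx[2]])
    sim <- sim[-idx[1], -idx[2], drop = FALSE]
  }
  paired
}

paired <- pair_molecules(sim)

# Molecules left without a partner each contribute a similarity of 0.
n_unpaired <- max(dim(sim)) - length(paired)

msim     <- sum(paired) / (length(paired) + n_unpaired)  # unpaired molecules included
msim_max <- mean(paired)                                 # unpaired molecules ignored
```

With this toy matrix one molecule of reaction A stays unpaired, so msim comes out lower than msim_max.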
For reversible reactions (reversible = TRUE), apart from comparing reactions in the forward direction, they are also compared by reversing one of the reactions. The greater of the two similarity values is reported.
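The rsim idea and the reversible comparison can be sketched together in base R. This is only an illustration under stated assumptions: fingerprints are toy logical vectors, the cumulative fingerprint of a side is taken to be the bitwise OR of its molecule fingerprints, and the names tanimoto, combine and rsim_sketch are hypothetical rather than part of RxnSim.

```r
# Tanimoto similarity of two logical (bit) vectors.
tanimoto <- function(a, b) {
  n_union <- sum(a | b)
  if (n_union == 0) return(0)
  sum(a & b) / n_union
}

# Cumulative fingerprint of several molecules: bitwise OR (an assumption).
combine <- function(fps) Reduce(`|`, fps)

rsim_sketch <- function(rxnA, rxnB, reversible = TRUE) {
  side_sim <- function(x, y) tanimoto(combine(x), combine(y))
  # Forward: reactants vs reactants, products vs products.
  forward <- mean(c(side_sim(rxnA$reactants, rxnB$reactants),
                    side_sim(rxnA$products,  rxnB$products)))
  if (!reversible) return(forward)
  # Reverse: one reaction flipped, so reactants are compared with products.
  reverse <- mean(c(side_sim(rxnA$reactants, rxnB$products),
                    side_sim(rxnA$products,  rxnB$reactants)))
  max(forward, reverse)
}

# Toy 8-bit fingerprints (made up). rxnB is rxnA written in the reverse direction.
m1 <- as.logical(c(1, 1, 0, 0, 0, 0, 0, 0))
m2 <- as.logical(c(0, 0, 1, 1, 0, 0, 0, 0))
m3 <- as.logical(c(0, 0, 0, 0, 1, 1, 1, 0))
rxnA <- list(reactants = list(m1, m2), products = list(m3))
rxnB <- list(reactants = list(m3), products = list(m1, m2))

sim_forward_only <- rsim_sketch(rxnA, rxnB, reversible = FALSE)
sim_reversible   <- rsim_sketch(rxnA, rxnB, reversible = TRUE)
```

Because rxnB is the mirror image of rxnA, the forward-only comparison finds no shared bits, while the reversible comparison reports the higher value from the flipped alignment.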
Fingerprint Caching
rs.compute and rs.compute.DB functions can use fingerprint caching. If fpCached is set as TRUE, the cache is queried first before generating fingerprints, and any new fingerprint generated is stored in the cache. Setting fpCached = FALSE makes no change to the cache. The cache can be cleared by calling rs.clearCache.
rs.compute.list and rs.compute.sim.matrix functions use caching internally. To ensure consistency of fingerprints, rs.clearCache is called internally. Use clearCache = FALSE to override this behaviour; the functions then use the current state of the cache and add new fingerprints to it.
The same cache is used by all functions.
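The caching behaviour described above can be mimicked with an R environment. The following is a minimal sketch, not RxnSim's implementation: fp_cache, make_fp, get_fp and clear_cache are hypothetical names, and make_fp is a placeholder that does not produce a chemical fingerprint.

```r
fp_cache <- new.env()   # hypothetical cache: molecule string -> fingerprint
n_generated <- 0        # counts how often a fingerprint is actually generated

# Placeholder generator (NOT a chemical fingerprint): parity of character codes.
make_fp <- function(mol) {
  n_generated <<- n_generated + 1
  utf8ToInt(mol) %% 2 == 1
}

get_fp <- function(mol, fpCached = FALSE) {
  # Without caching, the cache is neither queried nor modified.
  if (!fpCached) return(make_fp(mol))
  # With caching, query the cache first and store any newly generated fingerprint.
  if (!exists(mol, envir = fp_cache, inherits = FALSE)) {
    assign(mol, make_fp(mol), envir = fp_cache)
  }
  get(mol, envir = fp_cache, inherits = FALSE)
}

# Empties the cache, playing the role that rs.clearCache has in the text.
clear_cache <- function() {
  rm(list = ls(fp_cache, all.names = TRUE), envir = fp_cache)
}

fp1 <- get_fp("CCO", fpCached = TRUE)   # generated and stored
fp2 <- get_fp("CCO", fpCached = TRUE)   # served from the cache
n_after_cached <- n_generated

fp3 <- get_fp("CCO", fpCached = FALSE)  # generated again, cache untouched
n_after_uncached <- n_generated
cached_keys <- ls(fp_cache)

clear_cache()
```

Since a cached fingerprint is returned regardless of the parameters used to build it, a sketch like this also shows why the cache should be cleared whenever fingerprint parameters change.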
Similarity metrics included in RxnSim. These metrics (except jaccard-count and tanimoto-count) are derived from the fingerprint package.
ID | Name | Remarks |
simple | Sokal & Michener | bit |
jaccard | Jaccard | bit |
tanimoto | Tanimoto (bit) | bit and count |
jaccard-count | Jaccard (count) | count |
tanimoto-count | Tanimoto (count) | count ^ |
dice | Dice (bit) | bit and count |
russelrao | Russell and Rao | bit |
rodgerstanimoto | Rogers and Tanimoto | bit |
achiai | Ochiai | bit |
cosine | Cosine | bit |
kulczynski2 | Kulczynski 2 | bit |
mt | Modified Tanimoto | bit |
baroniurbanibuser | Baroni-Urbani/Buser | bit |
robust | Robust (bit) | bit and count |
tversky | Tversky* | bit |
hamann | Hamann | bit |
pearson | Pearson | bit |
yule | Yule | bit |
mcconnaughey | McConnaughey | bit |
simpson | Simpson | bit |
*Tversky coefficients can be specified by combining them into a vector, e.g., c('tversky', a, b).
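One detail worth knowing about the c('tversky', a, b) form is that R coerces the whole vector to character, so the coefficients travel as strings. The base R sketch below shows this and evaluates the standard Tversky index on toy bit vectors. How RxnSim parses the vector internally, and which coefficient weights which molecule, are assumptions here; tversky, spec, x and y are hypothetical names.

```r
# Mixing a string with numbers gives a character vector.
spec <- c('tversky', 1, 0.5)

# Recover the numeric coefficients from the character vector.
coefs <- as.numeric(spec[-1])
a <- coefs[1]
b <- coefs[2]

# Standard Tversky index on logical (bit) vectors:
# common / (common + a * only_in_x + b * only_in_y)
tversky <- function(x, y, a, b) {
  common <- sum(x & y)
  denom <- common + a * sum(x & !y) + b * sum(!x & y)
  if (denom == 0) return(0)
  common / denom
}

# Toy fingerprints (made up for illustration).
x <- as.logical(c(1, 1, 1, 0, 0, 0))
y <- as.logical(c(1, 1, 0, 1, 0, 0))

tv_weighted  <- tversky(x, y, a, b)   # a = 1, b = 0.5
tv_symmetric <- tversky(x, y, 1, 1)   # with a = b = 1 this equals Tanimoto
```

Unequal coefficients make the measure asymmetric, which is useful when one reaction is treated as the query and the other as the reference.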
tanimoto (bit), dice (bit) and robust (bit) compute similarity of feature vectors (count mode) by translating them to equivalent fingerprint vectors. The default similarity metric is tanimoto.
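The translation from count-mode feature vectors to bit vectors can be pictured as follows. This base R sketch assumes that "equivalent fingerprint vector" means presence or absence of each feature, which the text does not spell out. The feature counts and the names tanimoto_bit and dice_bit are made up for illustration.

```r
# Bit-based Tanimoto and Dice coefficients on logical vectors.
tanimoto_bit <- function(x, y) sum(x & y) / sum(x | y)
dice_bit     <- function(x, y) 2 * sum(x & y) / (sum(x) + sum(y))

# Hypothetical count-mode feature vectors for two molecules.
counts_a <- c(f1 = 2, f2 = 0, f3 = 1, f4 = 3)
counts_b <- c(f1 = 1, f2 = 1, f3 = 0, f4 = 5)

# Assumed translation: a feature's bit is set when its count is positive.
bits_a <- counts_a > 0
bits_b <- counts_b > 0

sim_tanimoto <- tanimoto_bit(bits_a, bits_b)
sim_dice     <- dice_bit(bits_a, bits_b)
```

Under this translation the magnitudes of the counts no longer matter, which is the practical difference from the count-specific metrics jaccard-count and tanimoto-count.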
List of fingerprints included in RxnSim. These are derived from the rCDK package.
ID | Name of the Fingerprint | Mode |
standard | Standard | bit |
extended | Extended | bit |
estate | EState | bit |
graph | Graphonly | bit |
hybridization | Hybridization | bit |
maccs | MACCS | bit |
pubchem | Pubchem | bit |
kr | Klekota-Roth | bit |
shortestpath | Shortestpath | bit |
signature | Signature | count |
circular | Circular | bit and count |
Value
rs.compute |
returns a similarity value. |
rs.compute.list |
returns a |
rs.compute.sim.matrix |
returns a |
rs.compute.DB |
returns a data frame. |
Note
While using fingerprint caching (by setting fpCached = TRUE in rs.compute and rs.compute.DB, or clearCache = FALSE in rs.compute.list and rs.compute.sim.matrix), ensure that the fingerprints are generated using the same parameter values (fp.type, fp.mode, fp.depth and fp.size). To reset the cache, call rs.clearCache.
rs.compute.DB uses the same parameter values for creating fingerprints as were used for (and stored with) the DB object (created using rs.makeDB) passed as argument.
Author(s)
Varun Giri varungiri@gmail.com
References
^ Carbonell, P., Planson, A-G., Fichera, D., & Faulon J-L. (2011) A retrosynthetic biology approach to metabolic pathway design for therapeutic production. BMC Systems Biology, 5:122.
See Also
rs.makeDB, rs.clearCache, ms.compute
Examples
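# Reaction similarity using msim algorithm (rct1 and rct2 are illustrative reaction SMILES, assumed here)
rct1 <- 'CCO>>CC=O'
rct2 <- 'CCCO>>CCC=O'
rs.compute(rct1, rct2, verbose = TRUE)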