cdk.lf, moe.lf, bci.lf {fingerprint} | R Documentation |
Functions to parse lines from fingerprint files
Description
These functions take a single line and parse it to produce
a vector of integers representing the positions of the 'on' bits in
a fingerprint. This allows the user to use read.fp with arbitrary
fingerprint files. A new file format can be handled by defining a new
line parser function.
Currently, cdk.lf, moe.lf, bci.lf and fps.lf process fingerprint files obtained from the
CDK (http://cdk.sourceforge.net), MOE (http://chemcomp.com), BCI
(http://www.digitalchemistry.co.uk/) and the FPS format
(http://code.google.com/p/chem-fingerprints/wiki/FPS), respectively. The ecfp.lf function
can be used for any fingerprint that generates hashed features (such as ECFPs or other
circular fingerprints). For these cases, it is assumed that features are unsigned
integers, so string features are not handled.
Note that when the fps.lf function is specified, items such as the number of bits
or the header flag do not need to be specified, as the format requires a header block
containing some of these items.
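As an illustration of defining a new line parser, the following is a minimal sketch for a hypothetical file format in which each line holds a name, a tab, and a space-separated list of 'on' bit positions. The function name tab.lf and the format itself are assumptions made for this example and are not part of the package. The sketch uses only base R and returns the three-component list described under Value.

```r
# Hypothetical format (an assumption for this sketch), one fingerprint per line:
#   <name><TAB><space-separated positions of the 'on' bits>
# e.g. "mol1\t3 17 42"
tab.lf <- function(line) {
  parts <- strsplit(line, "\t", fixed = TRUE)[[1]]
  name <- parts[1]
  # Convert the bit positions to integers; use an empty vector if none are given
  bits <- if (length(parts) >= 2 && nzchar(trimws(parts[2]))) {
    as.integer(strsplit(trimws(parts[2]), "\\s+")[[1]])
  } else {
    integer(0)
  }
  # Name, 'on' bit positions, and an empty list because this format has no extra items
  list(name, bits, list())
}

parsed <- tab.lf("mol1\t3 17 42")
```

A parser written this way would then be supplied to read.fp in place of one of the built-in line functions.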
Usage
cdk.lf(line)
moe.lf(line)
bci.lf(line)
ecfp.lf(line)
fps.lf(line)
jchem.binary.lf(line)
Arguments
line |
The line to parse |
Value
A list with three components. The first is the name associated with the fingerprint (if available). The second is either a vector of integers representing the bits set to 1 (for cdk.lf, moe.lf and bci.lf) or a vector of characters representing hashed features (characteristic of circular fingerprints) or, more generally, any string feature. The third is a (possibly empty) list containing the remaining components of a line, for formats that allow items other than a title and the fingerprint (such as the FPS format). The content of the third component depends on the line function being used.
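To make the shape of the return value concrete, here is a small sketch of a parser that keeps features as character strings and passes any remaining fields through in the third component. The function name hashed.lf and the tab-separated layout are hypothetical and chosen only for this example. The sketch assumes every line has at least a name field and a feature field.

```r
# Hypothetical layout (an assumption for this sketch):
#   <name><TAB><space-separated features><TAB><any further fields...>
hashed.lf <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  name <- fields[1]
  # Features are kept as character strings rather than converted to integers
  features <- strsplit(trimws(fields[2]), "\\s+")[[1]]
  # Everything after the first two fields goes into the third component;
  # this is an empty list when there are no further fields
  extra <- as.list(fields[-(1:2)])
  list(name, features, extra)
}

res <- hashed.lf("mol1\t1203 88571 42\tsource=demo")
```

Here res[[1]] holds the name, res[[2]] the character vector of features, and res[[3]] a one-element list with the leftover field.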
Author(s)
Rajarshi Guha rajarshi.guha@gmail.com