taxonomizrSwitch {taxonomizr} | R Documentation
Switch from data.table to SQLite
Description
In version 0.5.0, taxonomizr switched from data.table to SQLite for name and node lookups. See below for more details.
Details
Version 0.5.0 changed name and node lookups from using data.table to using SQLite. This was necessary to increase performance (10-100x speedup for getTaxonomy) and to create a simpler interface (a single SQLite database contains all necessary data). Unfortunately, this switch requires a couple of breaking changes:
- getTaxonomy changes from getTaxonomy(ids,namesDT,nodesDT) to getTaxonomy(ids,sqlFile)
- getId changes from getId(taxa,namesDT) to getId(taxa,sqlFile)
- read.names is deprecated, instead use read.names.sql. For example, instead of calling names<-read.names('names.dmp') in every session, simply call read.names.sql('names.dmp','accessionTaxa.sql') once (or use the convenient prepareDatabase).
- read.nodes is deprecated, instead use read.nodes.sql. For example, instead of calling nodes<-read.nodes('nodes.dmp') in every session, simply call read.nodes.sql('nodes.dmp','accessionTaxa.sql') once (or use the convenient prepareDatabase).
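As a rough sketch of the new workflow, assuming taxonomizr 0.5.0 or later is installed, the following builds a small database and queries it through the SQLite-based calls listed above. The file contents are a few hand-written lines in NCBI dump format standing in for the real names.dmp and nodes.dmp, and the temporary file paths are made up for illustration.

```r
library(taxonomizr)

# Toy stand-ins for NCBI's names.dmp and nodes.dmp (illustrative data only)
namesFile <- tempfile(fileext = ".dmp")
nodesFile <- tempfile(fileext = ".dmp")
writeLines(c(
  "1\t|\tall\t|\t\t|\tsynonym\t|",
  "1\t|\troot\t|\t\t|\tscientific name\t|",
  "2\t|\tBacteria\t|\tBacteria <prokaryotes>\t|\tscientific name\t|"
), namesFile)
writeLines(c(
  "1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|",
  "2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|"
), nodesFile)

# One-time setup: load both files into a single SQLite database
sqlFile <- tempfile(fileext = ".sql")
read.names.sql(namesFile, sqlFile)
read.nodes.sql(nodesFile, sqlFile)

# Later sessions only need the path to the database file
id <- getId("Bacteria", sqlFile)   # look up a taxonomic ID by name
tax <- getTaxonomy(2, sqlFile)     # look up the taxonomy for an ID
```

With real NCBI data, prepareDatabase is intended to cover the setup steps so that only the getId and getTaxonomy calls remain in day-to-day code.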
I've tried to ease any problems with this by overloading getTaxonomy and getId to still function (with a warning) if passed a data.table names and nodes argument and providing a simpler prepareDatabase function for completing all setup steps (hopefully avoiding direct calls to read.names and read.nodes for most users).
I plan to eventually remove the data.table functionality to avoid a split codebase, so please switch to the new SQLite format in all new code.
See Also
getTaxonomy, read.names.sql, read.nodes.sql, prepareDatabase, getId