isotree.to.sql {isotree}R Documentation

Generate SQL statements from Isolation Forest model

Description

Generate SQL statements - either separately per tree (the default), for a single tree if needed (if passing 'tree'), or for all trees concatenated together (if passing 'table_from'). Can also be made to output terminal node numbers (numeration starting at one).

Some important considerations:

Usage

isotree.to.sql(
  model,
  enclose = "doublequotes",
  output_tree_num = FALSE,
  tree = NULL,
  table_from = NULL,
  select_as = "outlier_score",
  column_names = NULL,
  column_names_categ = NULL,
  nthreads = model$nthreads
)

Arguments

model

An Isolation Forest object as returned by isolation.forest.

enclose

With which symbols to enclose the column names in the select statement so as to make them SQL compatible in case they include characters like dots. Options are:

  • '"doublequotes"', which will enclose them as '"column_name"' - this will work for e.g. PostgreSQL.

  • '"squarebraces"', which will enclose them as '[column_name]' - this will work for e.g. SQL Server.

  • '"none"', which will output the column names as-is (e.g. 'column_name')

output_tree_num

Whether to make the statements / outputs return the terminal node number instead of the isolation depth. The numeration will start at one.

tree

Tree for which to generate SQL statements or other outputs. If passed, will generate the statements only for that single tree. If passing 'NULL', will generate statements for all trees in the model.

table_from

If passing this, will generate a single select statement for the outlier score from all trees, selecting the data from the table name passed here. In this case, will always output the outlier score, regardless of what is passed under 'output_tree_num'.

select_as

Alias to give to the generated outlier score in the select statement. Ignored when not passing 'table_from'.

column_names

Column names to use for the numeric columns. If not passed and the model was fit to a 'data.frame', will use the column names from that 'data.frame', which can be found under 'model$metadata$cols_num'. If not passing it and the model was fit to data in a format other than 'data.frame', the columns will be named 'column_N' in the resulting SQL statement. Note that the names will be taken verbatim - this function will not do any checks for e.g. whether they constitute valid SQL or not when exporting to SQL, and will not escape characters such as double quotation marks when exporting to SQL.

column_names_categ

Column names to use for the categorical columns. If not passed, will use the column names from the 'data.frame' to which the model was fit. These can be found under 'model$metadata$cols_cat'.

nthreads

Number of parallel threads to use.

Value

Examples

library(isotree)
data(iris)
set.seed(1)
iso <- isolation.forest(iris, ntrees=2, sample_size=16, ndim=1, nthreads=1)
sql_forest <- isotree.to.sql(iso, table_from="my_iris_table")
cat(sql_forest)

[Package isotree version 0.6.1-1 Index]