booster2sql {xgb2sql}R Documentation

Transform XGBoost model object to SQL query.

Description

This function generates SQL query for in-database scoring of XGBoost models, providing a robust and efficient way of model deployment. It takes in the trained XGBoost model xgbModel, name of the input database table input_table_name, and name of a unique identifier within that table unique_id as input, writes the SQL query to a file specified by output_file_name. Note that the input database table should be generated from the raw table using the one-hot encoding query output by onehot2sql(), or to provide the one-hot encoding query as input input_onehot_query to this function, working as sub-query inside the final model scoring query.

Usage

booster2sql(xgbModel, print_progress = FALSE, unique_id = NULL,
  output_file_name = NULL, input_table_name = NULL,
  input_onehot_query = NULL)

Arguments

xgbModel

The trained model object of class xgb.Booster. Current supported booster is booster="gbtree", supported objective options are:

  • reg:linear: linear regression.

  • reg:logistic: logistic regression.

  • - binary:logistic: logistic regression for binary classification, output probability.

  • binary:logitraw: logistic regression for binary classification, output score before logistic transformation.

  • - binary:hinge: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.

  • - count:poisson: poisson regression for count data, output mean of poisson distribution.

  • - reg:gamma: gamma regression with log-link, output mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.

  • - reg:tweedie: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.

print_progress

Boolean indicator controls whether the SQL generating progress should be printed to console.

unique_id

A row unique identifier is crucial for in-database scoring of XGBoost model. If not given, SQL query will be generated with id name "ROW_KEY".

output_file_name

File name that the SQL syntax will write to. It must not be empty in order for this function to run.

input_table_name

Name of raw data table in the database, that the SQL query will select from. If not given, SQL query will be generated with table name "MODREADY_TABLE".

input_onehot_query

SQL query of one-hot encoding generated by onehot2sql. When input_table_name is empty while input_onehot_query is not, the final output query will include input_onehot_query as sub-query.

Value

The SQL query will write to the file specified by output_file_name.

Examples

library(xgboost)
# load data
df = data.frame(ggplot2::diamonds)
head(df)

# data processing
out <- onehot2sql(df)
x <- out$model.matrix[,colnames(out$model.matrix)!='price']
y <- out$model.matrix[,colnames(out$model.matrix)=='price']

# model training
bst <- xgboost(data = x,
               label = y,
               max.depth = 3,
               eta = .3,
               nround = 5,
               nthread = 1,
               objective = 'reg:linear')

# generate model scoring SQL script with ROW_KEY and MODREADY_TABLE
booster2sql(bst, output_file_name='xgb.txt')

[Package xgb2sql version 0.1.2 Index]