create_statement {sqlscore}R Documentation

Generate a CREATE TABLE statement from a model

Description

Generate a CREATE TABLE statement to score the passed model on a preexisting database table. The statement will generate predictions entirely in the database, with no need to fetch data into R. Models need not be GLMs, but their prediction steps must consist of applying a response function to a linear predictor.

Usage

create_statement(mod, dest_table, src_table, dest_schema = NULL,
  dest_catalog = NULL, src_schema = NULL, src_catalog = NULL,
  drop = FALSE, temporary = FALSE, pk = c("id"), response = NULL,
  con = dbplyr::simulate_dbi())

Arguments

mod

A supported model object.

dest_table

The unqualified DB name of the destination table.

src_table

The unqualified DB name of the source table.

dest_schema

The DB schema of the destination table.

dest_catalog

The DB catalog of the destination table.

src_schema

The DB schema of the source table.

src_catalog

The DB catalog of the source table.

drop

Whether to generate a DROP TABLE IF EXISTS before the CREATE TABLE.

temporary

Whether the destination table should be a temporary table.

pk

A vector of primary key column names.

response

The name of a custom response function to apply to the linear predictor.

con

A DBI connection to control the details of SQL generation; defaults to dbplyr::simulate_dbi() for the best guess at portable SQL.

Details

An open database connection can be passed as the 'con' argument, or the ‘dbplyr::simulate_*' functions can be used in applications which don’t have a DB connection when they need to generate SQL.

Value

A dbplyr SQL object representing the SELECT statement.

Supported packages

Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm.

Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to.

Warning

Note that if the model object transformed its training data before fitting (e.g., centering and scaling predictors), the generated SQL statement will not include those transformations. A future release may include that functionality, but centering and scaling in particular are difficult to do efficiently and portably in SQL.

Examples

# Basic create statements
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
           data=datasets::iris)
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
                 src_catalog="catalog_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
                 src_catalog="catalog_name", dest_table="target_tbl",
                 dest_schema="target_schema", dest_catalog="target_catalog",
                 pk=c("lab", "specimen_id"))

#With a custom response function
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
                 dest_table="target_tbl", response="probit")

# With a model-derived non-identity response function
mod <- glm(Sepal.Length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species,
           data=datasets::iris, family=binomial("logit"))
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")

#With formula operators
x <- matrix(rnorm(100*20),100,20)
colnames(x) <- sapply(1:20, function(x) paste0("X", as.character(x)))
x <- as.data.frame(x)
mod <- glm(X2 ~ X3 + X5 + X15*X8, data=x)
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", dest_table="target_tbl",
                 response="cauchit")


[Package sqlscore version 0.1.4 Index]