create_statement {sqlscore} | R Documentation |
Generate a CREATE TABLE statement from a model
Description
Generate a CREATE TABLE statement to score the passed model on a preexisting database table. The statement will generate predictions entirely in the database, with no need to fetch data into R. Models need not be GLMs, but their prediction steps must consist of applying a response function to a linear predictor.
Usage
create_statement(mod, dest_table, src_table, dest_schema = NULL,
dest_catalog = NULL, src_schema = NULL, src_catalog = NULL,
drop = FALSE, temporary = FALSE, pk = c("id"), response = NULL,
con = dbplyr::simulate_dbi())
Arguments
mod |
A supported model object. |
dest_table |
The unqualified DB name of the destination table. |
src_table |
The unqualified DB name of the source table. |
dest_schema |
The DB schema of the destination table. |
dest_catalog |
The DB catalog of the destination table. |
src_schema |
The DB schema of the source table. |
src_catalog |
The DB catalog of the source table. |
drop |
Whether to generate a DROP TABLE IF EXISTS before the CREATE TABLE. |
temporary |
Whether the destination table should be a temporary table. |
pk |
A vector of primary key column names. |
response |
The name of a custom response function to apply to the linear predictor. |
con |
A DBI connection to control the details of SQL generation; defaults to dbplyr::simulate_dbi() for the best guess at portable SQL. |
Details
An open database connection can be passed as the 'con' argument, or the ‘dbplyr::simulate_*' functions can be used in applications which don’t have a DB connection when they need to generate SQL.
Value
A dbplyr SQL object representing the SELECT statement.
Supported packages
Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm.
Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to.
Warning
Note that if the model object transformed its training data before fitting (e.g., centering and scaling predictors), the generated SQL statement will not include those transformations. A future release may include that functionality, but centering and scaling in particular are difficult to do efficiently and portably in SQL.
Examples
# Basic create statements
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris)
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
src_catalog="catalog_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
src_catalog="catalog_name", dest_table="target_tbl",
dest_schema="target_schema", dest_catalog="target_catalog",
pk=c("lab", "specimen_id"))
#With a custom response function
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
dest_table="target_tbl", response="probit")
# With a model-derived non-identity response function
mod <- glm(Sepal.Length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris, family=binomial("logit"))
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")
#With formula operators
x <- matrix(rnorm(100*20),100,20)
colnames(x) <- sapply(1:20, function(x) paste0("X", as.character(x)))
x <- as.data.frame(x)
mod <- glm(X2 ~ X3 + X5 + X15*X8, data=x)
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", dest_table="target_tbl",
response="cauchit")