civis_ml_sparse_linear_regressor {civis} | R Documentation |
CivisML Sparse Linear Regression
Description
CivisML Sparse Linear Regression
Usage
civis_ml_sparse_linear_regressor(
x,
dependent_variable,
primary_key = NULL,
excluded_columns = NULL,
fit_intercept = TRUE,
normalize = FALSE,
fit_params = NULL,
cross_validation_parameters = NULL,
oos_scores_table = NULL,
oos_scores_db = NULL,
oos_scores_if_exists = c("fail", "append", "drop", "truncate"),
model_name = NULL,
cpu_requested = NULL,
memory_requested = NULL,
disk_requested = NULL,
notifications = NULL,
polling_interval = NULL,
verbose = FALSE,
civisml_version = "prod"
)
Arguments
x |
See the Data Sources section below. |
dependent_variable |
The dependent variable of the training dataset. For a multi-target problem, this should be a vector of column names of dependent variables. Nulls in a single dependent variable will automatically be dropped. |
primary_key |
Optional, the unique ID (primary key) of the training
dataset. This will be used to index the out-of-sample scores. In
|
excluded_columns |
Optional, a vector of columns which will be considered ineligible to be independent variables. |
fit_intercept |
Should an intercept term be included in the model. If
|
normalize |
If |
fit_params |
Optional, a mapping from parameter names in the model's
|
cross_validation_parameters |
Optional, parameter grid for learner
parameters, e.g. |
oos_scores_table |
Optional, if provided, store out-of-sample predictions on training set data to this Redshift "schema.tablename". |
oos_scores_db |
Optional, the name of the database where the
|
oos_scores_if_exists |
Optional, action to take if
|
model_name |
Optional, the prefix of the Platform modeling jobs.
It will have |
cpu_requested |
Optional, the number of CPU shares requested in the Civis Platform for training jobs or prediction child jobs. 1024 shares = 1 CPU. |
memory_requested |
Optional, the memory requested from Civis Platform for training jobs or prediction child jobs, in MiB. |
disk_requested |
Optional, the disk space requested on Civis Platform for training jobs or prediction child jobs, in GB. |
notifications |
Optional, model status notifications. See
|
polling_interval |
The number of seconds to wait between checks for job completion. |
verbose |
Optional, If |
civisml_version |
Optional, a length-one character vector giving the CivisML version. The default is "prod", the latest version in production. |
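The units of the resource arguments above differ (CPU shares, MiB, GB), so a small conversion helper can make a call easier to read. The following is a minimal sketch, not part of the civis package: cpus_to_shares, gib_to_mib, and fit_with_resources are hypothetical helper names, and the requested sizes are arbitrary illustrations. The fit is wrapped in a function so that nothing is submitted to the Civis Platform until it is called.

```r
# Hypothetical helpers (not part of civis) for the units described above.
# cpu_requested is given in CPU shares, where 1024 shares = 1 CPU.
cpus_to_shares <- function(cpus) as.integer(round(cpus * 1024))

# memory_requested is given in MiB; 1 GiB = 1024 MiB.
gib_to_mib <- function(gib) as.integer(round(gib * 1024))

# Wrapped in a function so no job is started until it is called.
# Assumes `df` has a column named "weight" (as ChickWeight does).
fit_with_resources <- function(df) {
  civis::civis_ml_sparse_linear_regressor(
    x = df,
    dependent_variable = "weight",
    cpu_requested = cpus_to_shares(2),   # 2048 shares
    memory_requested = gib_to_mib(4),    # 4096 MiB
    disk_requested = 10                  # GB
  )
}
```

Calling fit_with_resources(ChickWeight) would then start a training job with those requests, assuming valid Civis Platform credentials are configured.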
Value
A civis_ml object, a list containing the following elements:
job |
job metadata from |
run |
run metadata from |
outputs |
CivisML metadata from |
metrics |
Parsed CivisML output from
|
model_info |
Parsed CivisML output from
|
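Because the return value is a list, its elements can be read with the usual $ accessor. The sketch below uses a stand-in list with the same element names rather than a real fitted model, so the contents are placeholders and say nothing about what CivisML actually returns; missing_elements is a hypothetical helper, not a civis function.

```r
# Element names taken from the Value section above.
expected_elements <- c("job", "run", "outputs", "metrics", "model_info")

# Hypothetical helper: report which documented elements an object lacks.
missing_elements <- function(m) setdiff(expected_elements, names(m))

# Stand-in for a fitted civis_ml object (placeholder contents only).
m <- list(job = list(), run = list(), outputs = list(),
          metrics = list(), model_info = list())

missing_elements(m)   # character(0) when every element is present
m$metrics             # access a single element by name
```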
Data Sources
For building models with civis_ml, the training data can reside in
four different places: a file in the Civis Platform, a CSV or feather-format file
on the local disk, a data.frame resident in the local R environment, or
a table in the Civis Platform. Use the following helpers to specify the
data source when calling civis_ml:
- data.frame
civis_ml(x = df, ...)
- local CSV file
civis_ml(x = "path/to/data.csv", ...)
- file in Civis Platform
civis_ml(x = civis_file(1234))
- table in Civis Platform
civis_ml(x = civis_table(table_name = "schema.table", database_name = "database"))
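The same data source helpers apply to civis_ml_sparse_linear_regressor, since x is passed in the same way. The following is a sketch under assumptions: fit_from_source is a hypothetical wrapper, the file ID, path, table, and database names are the placeholders used above, and every source is assumed to contain a "weight" column. Nothing is sent to the Civis Platform until the function is called.

```r
# Hypothetical wrapper (not part of civis) that picks one of the four
# data sources described above and fits the sparse linear regressor.
fit_from_source <- function(source = c("data.frame", "csv", "file", "table")) {
  source <- match.arg(source)
  x <- switch(source,
    data.frame = datasets::ChickWeight,        # data.frame in the local R session
    csv        = "path/to/data.csv",           # placeholder path to a local CSV file
    file       = civis::civis_file(1234),      # placeholder Civis file ID
    table      = civis::civis_table(table_name = "schema.table",
                                    database_name = "database")
  )
  # Assumes the chosen source has a column named "weight".
  civis::civis_ml_sparse_linear_regressor(x, dependent_variable = "weight")
}
```

For example, fit_from_source("table") would train on the Platform table, assuming that table exists and credentials are configured.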
Examples
## Not run:
data(ChickWeight)
m <- civis_ml_sparse_linear_regressor(ChickWeight, dependent_variable = "weight")
yhat <- fetch_oos_scores(m)
# make a prediction job, storing the scores in a Redshift table
pred_info <- predict(m, newdata = civis_table("schema.table", "my_database"),
                     output_table = "schema.scores_table")
## End(Not run)