civis_ml_sparse_ridge_regressor {civis} | R Documentation |
CivisML Sparse Ridge Regression
Description
CivisML Sparse Ridge Regression
Usage
civis_ml_sparse_ridge_regressor(
x,
dependent_variable,
primary_key = NULL,
excluded_columns = NULL,
alpha = 1,
fit_intercept = TRUE,
normalize = FALSE,
max_iter = NULL,
tol = 0.001,
solver = c("auto", "svd", "cholesky", "lsqr", "sparse_cg", "sag"),
random_state = 42,
fit_params = NULL,
cross_validation_parameters = NULL,
oos_scores_table = NULL,
oos_scores_db = NULL,
oos_scores_if_exists = c("fail", "append", "drop", "truncate"),
model_name = NULL,
cpu_requested = NULL,
memory_requested = NULL,
disk_requested = NULL,
notifications = NULL,
polling_interval = NULL,
verbose = FALSE,
civisml_version = "prod"
)
Arguments
x |
See the Data Sources section below. |
dependent_variable |
The dependent variable of the training dataset. For a multi-target problem, this should be a vector of column names of dependent variables. Nulls in a single dependent variable will automatically be dropped. |
primary_key |
Optional, the unique ID (primary key) of the training
dataset. This will be used to index the out-of-sample scores. In
|
excluded_columns |
Optional, a vector of columns which will be considered ineligible to be independent variables. |
alpha |
The regularization strength: either a single float or a vector of floats of length n_targets. Larger values specify stronger regularization. |
fit_intercept |
Should an intercept term be included in the model. If
|
normalize |
If |
max_iter |
Maximum number of iterations for conjugate gradient solver.
For |
tol |
Precision of the solution. |
solver |
Solver to use for the optimization problem.
|
random_state |
The seed of the pseudo random number generator to use
when shuffling the data. Used only when |
fit_params |
Optional, a mapping from parameter names in the model's
|
cross_validation_parameters |
Optional, parameter grid for learner
parameters, e.g. |
oos_scores_table |
Optional, if provided, store out-of-sample predictions on training set data to this Redshift "schema.tablename". |
oos_scores_db |
Optional, the name of the database where the
|
oos_scores_if_exists |
Optional, action to take if
|
model_name |
Optional, the prefix of the Platform modeling jobs.
It will have |
cpu_requested |
Optional, the number of CPU shares requested in the Civis Platform for training jobs or prediction child jobs. 1024 shares = 1 CPU. |
memory_requested |
Optional, the memory requested from Civis Platform for training jobs or prediction child jobs, in MiB. |
disk_requested |
Optional, the disk space requested on Civis Platform for training jobs or prediction child jobs, in GB. |
notifications |
Optional, model status notifications. See
|
polling_interval |
The number of seconds to wait between checks for job completion. |
verbose |
Optional, If |
civisml_version |
Optional, a one-length character vector of the CivisML version. The default is "prod", the latest version in production.
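The resource arguments above use three different units (CPU shares, MiB, and GB), which are easy to mix up. The following is a minimal sketch of a conversion helper; `civis_resources` and its parameter names are hypothetical and are not part of the civis package.

```r
# Hypothetical helper (NOT part of the civis package): converts
# human-friendly sizes into the units documented for cpu_requested,
# memory_requested, and disk_requested.
civis_resources <- function(cpus, memory_gib, disk_gb) {
  list(
    cpu_requested    = as.integer(round(cpus * 1024)),       # 1024 shares = 1 CPU
    memory_requested = as.integer(round(memory_gib * 1024)), # GiB -> MiB
    disk_requested   = disk_gb                               # already in GB
  )
}

res <- civis_resources(cpus = 2, memory_gib = 4, disk_gb = 10)
str(res)

# One possible way to pass the values on (requires Civis Platform credentials):
# m <- do.call(civis_ml_sparse_ridge_regressor,
#              c(list(x = df, dependent_variable = "y"), res))
```

Because the list names match the argument names in the Usage section, the result can be spliced into a call with `do.call`, as the commented line suggests.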
Value
A civis_ml object, a list containing the following elements:
job |
job metadata from |
run |
run metadata from |
outputs |
CivisML metadata from |
metrics |
Parsed CivisML output from
|
model_info |
Parsed CivisML output from
|
Data Sources
For building models with civis_ml, the training data can reside in
four different places: a file in the Civis Platform, a CSV or feather-format file
on the local disk, a data.frame resident in the local R environment, or
a table in the Civis Platform. Use the following helpers to specify the
data source when calling civis_ml:
- data.frame
civis_ml(x = df, ...)
- local CSV file
civis_ml(x = "path/to/data.csv", ...)
- file in Civis Platform
civis_ml(x = civis_file(1234))
- table in Civis Platform
civis_ml(x = civis_table(table_name = "schema.table", database_name = "database"))
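As an illustration of the local CSV route, the sketch below writes a data.frame to a temporary file and checks that it reads back intact before the path would be handed to the modeling function. Only base R is used; the final call is left commented out because it needs Civis Platform credentials, and the temporary path is purely illustrative.

```r
data(ChickWeight)

# ChickWeight carries extra grouping classes; keep a plain data.frame.
df <- as.data.frame(ChickWeight)

# Write the training data to a local CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Sanity check: the file has the same shape and column names as the data.frame.
round_trip <- read.csv(csv_path)
stopifnot(nrow(round_trip) == nrow(df),
          identical(names(round_trip), names(df)))

# The path can then be used as the data source (requires credentials):
# m <- civis_ml_sparse_ridge_regressor(x = csv_path,
#                                      dependent_variable = "weight")
```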
Examples
## Not run:
data(ChickWeight)
m <- civis_ml_sparse_ridge_regressor(ChickWeight, dependent_variable = "weight", alpha = 999)
yhat <- fetch_oos_scores(m)
# Grid search
cv_params <- list(alpha = c(.001, .01, .1, 1))
m <- civis_ml_sparse_ridge_regressor(ChickWeight,
dependent_variable = "weight",
cross_validation_parameters = cv_params)
# Make a prediction job, storing the scores in a Redshift table
pred_info <- predict(m, newdata = civis_table("schema.table", "my_database"),
output_table = "schema.scores_table")
## End(Not run)
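To get a feel for what the alpha values in the examples above trade off, the following sketch fits ridge regression locally with the textbook closed-form solution in base R. This is an illustration only: it is a dense solve on centered data, not the solver CivisML runs, and `ridge_coef` is a hypothetical helper name.

```r
data(ChickWeight)

# Design matrix: Time plus dummy columns for Diet, without the intercept column.
X <- model.matrix(~ Time + Diet, data = as.data.frame(ChickWeight))[, -1]

# Center X and y so the intercept drops out and is not penalized.
X <- scale(X, center = TRUE, scale = FALSE)
y <- ChickWeight$weight - mean(ChickWeight$weight)

# Hypothetical helper: closed-form ridge estimate
# beta = (X'X + alpha * I)^{-1} X'y
ridge_coef <- function(X, y, alpha) {
  p <- ncol(X)
  drop(solve(crossprod(X) + alpha * diag(p), crossprod(X, y)))
}

# The grid from the example, plus the large value used in the first call.
alphas <- c(0.001, 0.01, 0.1, 1, 999)
coefs <- sapply(alphas, function(a) ridge_coef(X, y, a))
colnames(coefs) <- alphas

# Euclidean norm of the coefficient vector for each alpha.
coef_norms <- sqrt(colSums(coefs^2))
print(round(coef_norms, 3))
```

The printed norms shrink as alpha grows, which is the sense in which larger values specify stronger regularization.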