civis_ml_sparse_logistic {civis} | R Documentation |
CivisML Sparse Logistic
Description
CivisML Sparse Logistic
Usage
civis_ml_sparse_logistic(
x,
dependent_variable,
primary_key = NULL,
excluded_columns = NULL,
penalty = c("l2", "l1"),
dual = FALSE,
tol = 1e-08,
C = 499999950,
fit_intercept = TRUE,
intercept_scaling = 1,
class_weight = NULL,
random_state = 42,
solver = c("liblinear", "newton-cg", "lbfgs", "sag"),
max_iter = 100,
multi_class = c("ovr", "multinomial"),
fit_params = NULL,
cross_validation_parameters = NULL,
calibration = NULL,
oos_scores_table = NULL,
oos_scores_db = NULL,
oos_scores_if_exists = c("fail", "append", "drop", "truncate"),
model_name = NULL,
cpu_requested = NULL,
memory_requested = NULL,
disk_requested = NULL,
notifications = NULL,
polling_interval = NULL,
verbose = FALSE,
civisml_version = "prod"
)
Arguments
x |
See the Data Sources section below. |
dependent_variable |
The dependent variable of the training dataset. For a multi-target problem, this should be a vector of column names of dependent variables. Nulls in a single dependent variable will automatically be dropped. |
primary_key |
Optional, the unique ID (primary key) of the training
dataset. This will be used to index the out-of-sample scores. In
|
excluded_columns |
Optional, a vector of columns which will be considered ineligible to be independent variables. |
penalty |
Used to specify the norm used in the penalization. The
|
dual |
Dual or primal formulation. Dual formulation is only implemented
for |
tol |
Tolerance for stopping criteria. |
C |
Inverse of regularization strength; must be a positive float. Smaller values specify stronger regularization. |
fit_intercept |
Whether a constant (intercept) term should be included in the model. |
intercept_scaling |
Useful only when the |
class_weight |
A
Note, the class weights are multiplied with |
random_state |
The seed of the random number generator to use when
shuffling the data. Used only in |
solver |
Algorithm to use in the optimization problem. For small data
Note that |
max_iter |
The maximum number of iterations taken for the solvers to
converge. Useful for the |
multi_class |
The scheme for multi-class problems. When |
fit_params |
Optional, a mapping from parameter names in the model's
|
cross_validation_parameters |
Optional, parameter grid for learner
parameters, e.g. |
calibration |
Optional, if not |
oos_scores_table |
Optional, if provided, store out-of-sample predictions on training set data to this Redshift "schema.tablename". |
oos_scores_db |
Optional, the name of the database where the
|
oos_scores_if_exists |
Optional, action to take if
|
model_name |
Optional, the prefix of the Platform modeling jobs.
It will have |
cpu_requested |
Optional, the number of CPU shares requested in the Civis Platform for training jobs or prediction child jobs. 1024 shares = 1 CPU. |
memory_requested |
Optional, the memory requested from Civis Platform for training jobs or prediction child jobs, in MiB. |
disk_requested |
Optional, the disk space requested on Civis Platform for training jobs or prediction child jobs, in GB. |
notifications |
Optional, model status notifications. See
|
polling_interval |
Number of seconds to wait between checks for job completion. |
verbose |
Optional, If |
civisml_version |
Optional, a one-length character vector of the CivisML version. The default is "prod", the latest version in production. |
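The three resource arguments use different units (CPU shares, MiB, GB), which is easy to get wrong. The following is a minimal sketch, assuming a hypothetical helper `resource_args()` that is not part of the civis package and illustrative sizes chosen only for the example. The modeling call is guarded because it needs the civis package and a `CIVIS_API_KEY` environment variable.

```r
# Hypothetical helper (NOT part of the civis package): convert human-friendly
# sizes into the units documented for cpu_requested, memory_requested,
# and disk_requested.
resource_args <- function(cpus, memory_gib, disk_gb) {
  list(
    cpu_requested    = cpus * 1024,        # 1024 shares = 1 CPU
    memory_requested = memory_gib * 1024,  # MiB; 1 GiB = 1024 MiB
    disk_requested   = disk_gb             # already in GB
  )
}

# Illustrative sizes only; appropriate values depend on the data.
res <- resource_args(cpus = 2, memory_gib = 4, disk_gb = 10)
str(res)

# Only attempt the training job when civis is installed and an API key is set.
if (requireNamespace("civis", quietly = TRUE) &&
    nzchar(Sys.getenv("CIVIS_API_KEY"))) {
  df <- iris
  names(df) <- gsub("\\.", "_", names(df))
  m <- do.call(
    civis::civis_ml_sparse_logistic,
    c(list(x = df, dependent_variable = "Species", penalty = "l1"), res)
  )
}
```

Building the arguments as a list and passing them with `do.call()` keeps the unit conversion in one place when several models share the same resource settings.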
Value
A civis_ml object, a list containing the following elements:
job |
job metadata from |
run |
run metadata from |
outputs |
CivisML metadata from |
metrics |
Parsed CivisML output from
|
model_info |
Parsed CivisML output from
|
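As a rough illustration of the returned object, the sketch below trains a model and inspects the five elements listed above. It assumes the civis package is installed and a `CIVIS_API_KEY` environment variable is set. The contents of each element are not shown here because they depend on the Platform response.

```r
# Element names listed in the Value section above.
expected_elements <- c("job", "run", "outputs", "metrics", "model_info")

# Only attempt the training job when civis is installed and an API key is set.
if (requireNamespace("civis", quietly = TRUE) &&
    nzchar(Sys.getenv("CIVIS_API_KEY"))) {
  df <- iris
  names(df) <- gsub("\\.", "_", names(df))
  m <- civis::civis_ml_sparse_logistic(df, "Species")

  # Report any documented element that is absent instead of assuming all exist.
  print(setdiff(expected_elements, names(m)))

  # Peek at the top level of each element without printing everything.
  for (el in intersect(expected_elements, names(m))) {
    cat("\n$", el, "\n", sep = "")
    str(m[[el]], max.level = 1)
  }
}
```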
Data Sources
For building models with civis_ml, the training data can reside in
four different places: a file in the Civis Platform, a CSV or feather-format file
on the local disk, a data.frame resident in the local R environment, and finally,
a table in the Civis Platform. Use the following helpers to specify the
data source when calling civis_ml:
- data.frame
civis_ml(x = df, ...)
- local CSV file
civis_ml(x = "path/to/data.csv", ...)
- file in Civis Platform
civis_ml(x = civis_file(1234))
- table in Civis Platform
civis_ml(x = civis_table(table_name = "schema.table", database_name = "database"))
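The two local data sources can be tried side by side. The sketch below writes the same data to a temporary CSV file and then passes either the data.frame or the file path as `x`. It assumes the civis package is installed and a `CIVIS_API_KEY` environment variable is set. The Platform-side sources (`civis_file()`, `civis_table()`) are left out because they need a real file ID or table.

```r
# Prepare a data.frame with column names that contain no dots.
df <- iris
names(df) <- gsub("\\.", "_", names(df))

# Write the same data to a temporary CSV file on the local disk.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# Only attempt the training jobs when civis is installed and an API key is set.
if (requireNamespace("civis", quietly = TRUE) &&
    nzchar(Sys.getenv("CIVIS_API_KEY"))) {
  # data.frame in the local R environment
  m_df <- civis::civis_ml_sparse_logistic(x = df,
                                          dependent_variable = "Species")

  # CSV file on the local disk, given as a path
  m_csv <- civis::civis_ml_sparse_logistic(x = csv_path,
                                           dependent_variable = "Species")
}
```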
Examples
## Not run:
df <- iris
names(df) <- gsub("\\.", "_", names(df))
m <- civis_ml_sparse_logistic(df, "Species")
yhat <- fetch_oos_scores(m)
# Grid search
cv_params <- list(C = c(0.01, 1, 10, 100, 1000))
m <- civis_ml_sparse_logistic(df, "Species",
                              cross_validation_parameters = cv_params)
# Make a prediction job, storing the scores in a Redshift table
pred_info <- predict(m, newdata = civis_table("schema.table", "my_database"),
                     output_table = "schema.scores_table")
## End(Not run)