StartProject {datarobot}    R Documentation

Start a project, set the target, and run autopilot.

Description

This function is a convenient shorthand for creating a project and setting its target in a single call. See SetupProject and SetTarget.
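As a rough sketch of what the shorthand stands for, the two steps can be written out by hand. The wrapper name startProjectTwoStep is hypothetical, and the call sequence is an assumption based on the two functions referenced above, not the package's actual implementation. It also assumes that the datarobot package is installed and that a session has already been authenticated.

```r
# Hypothetical helper: spells out the two steps that StartProject combines.
# Assumes the datarobot package is installed and a session is authenticated.
startProjectTwoStep <- function(dataSource, target, projectName = NULL) {
  # Step 1: upload the data and create the project.
  project <- datarobot::SetupProject(dataSource = dataSource,
                                     projectName = projectName)
  # Step 2: set the target on the newly created project.
  datarobot::SetTarget(project = project, target = target)
  # Return the project so that later calls can refer to it.
  invisible(project)
}
```

Calling startProjectTwoStep(iris, target = "Species") would then play roughly the same role as the StartProject call shown in the Examples, minus the optional arguments.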

Usage

StartProject(
  dataSource,
  projectName = NULL,
  target,
  metric = NULL,
  weights = NULL,
  partition = NULL,
  mode = NULL,
  seed = NULL,
  targetType = NULL,
  positiveClass = NULL,
  blueprintThreshold = NULL,
  responseCap = NULL,
  featurelistId = NULL,
  smartDownsampled = NULL,
  majorityDownsamplingRate = NULL,
  accuracyOptimizedBlueprints = NULL,
  offset = NULL,
  exposure = NULL,
  eventsCount = NULL,
  monotonicIncreasingFeaturelistId = NULL,
  monotonicDecreasingFeaturelistId = NULL,
  onlyIncludeMonotonicBlueprints = FALSE,
  workerCount = NULL,
  wait = FALSE,
  checkInterval = 20,
  timeout = NULL,
  username = NULL,
  password = NULL,
  verbosity = 1,
  maxWait = 600
)

Arguments

dataSource

object. Either (a) the name of a CSV file, (b) a data frame, or (c) the URL of a publicly available file; in each case, this parameter identifies the source of the data from which all project models will be built. See Details.

projectName

character. Optional. String specifying a project name.

target

character. String giving the name of the response variable to be predicted by all project models.

metric

character. Optional. String specifying the model fitting metric to be optimized; a list of valid options for this parameter, which depends on both project and target, may be obtained with the function GetValidMetrics.

weights

character. Optional. String specifying the name of the column from the modeling dataset to be used as weights in model fitting.

partition

partition. Optional. S3 object of class 'partition' whose elements specify a valid partitioning scheme. See help for functions CreateGroupPartition, CreateRandomPartition, CreateStratifiedPartition, CreateUserPartition and CreateDatetimePartitionSpecification.

mode

character. Optional. Specifies the autopilot mode used to start the modeling project. See AutopilotMode for valid options; AutopilotMode$Quick is the default.

seed

integer. Optional. Seed for the random number generator used in creating random partitions for model fitting.

targetType

character. Optional. Specifies the target type to use for the project. Valid options are "Binary", "Multiclass", and "Regression". Set to "Multiclass" to enable multiclass modeling. Otherwise, it can help to disambiguate, e.g., by telling DataRobot how to handle a numeric target with a few unique values that could be used for either multiclass or regression. See TargetType for an easier way to keep track of the options.

positiveClass

character. Optional. Target variable value corresponding to a positive response in binary classification problems.

blueprintThreshold

integer. Optional. The maximum time (in hours) that any modeling blueprint is allowed to run before being excluded from subsequent autopilot stages.

responseCap

numeric. Optional. Floating point value, between 0.5 and 1.0, specifying a capping limit for the response variable. The default value NULL corresponds to an uncapped response, equivalent to responseCap = 1.0.

featurelistId

numeric. Specifies which feature list to use. If NULL (default), a default featurelist is used.

smartDownsampled

logical. Optional. Whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.

majorityDownsamplingRate

numeric. Optional. Floating point value, between 0.0 and 100.0. The percentage of the majority-class rows that should be kept. Specify only if using smart downsampling. The rate may not cause the majority class to become smaller than the minority class.

accuracyOptimizedBlueprints

logical. Optional. When enabled, accuracy optimized blueprints will run in autopilot for the project. These are longer-running model blueprints that provide increased accuracy over normal blueprints that run during autopilot.

offset

character. Optional. Vector of the names of the columns containing the offset of each row.

exposure

character. Optional. The name of a column containing the exposure of each row.

eventsCount

character. Optional. The name of a column specifying the events count.

monotonicIncreasingFeaturelistId

character. Optional. The id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If NULL (default), no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired. The featurelist itself can also be passed as this parameter.

monotonicDecreasingFeaturelistId

character. Optional. The id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If NULL (default), no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired. The featurelist itself can also be passed as this parameter.

onlyIncludeMonotonicBlueprints

logical. Optional. When TRUE, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.

workerCount

integer. The number of workers to run (default 2). Use "max" to set to the maximum number of workers available.

wait

logical. If TRUE, invokes WaitForAutopilot to block execution until the autopilot is complete.

checkInterval

numeric. Optional. Maximum wait (in seconds) between checks that Autopilot is finished. Defaults to 20.

timeout

numeric. Optional. Time (in seconds) after which to give up; the default is no timeout. An error is raised if Autopilot is not finished before the timeout.

username

character. The username to use for authentication to the database.

password

character. The password to use for authentication to the database.

verbosity

numeric. Optional. 0 is silent, 1 or more displays information about progress. Default is 1.

maxWait

integer. Specifies how many seconds to wait for the server to finish analyzing the target and begin the modeling process. If the process takes longer than this parameter specifies, execution will stop (but the server will continue to process the request).
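Several of the arguments above have documented ranges or dependencies that can be checked locally before anything is sent to the server. The following is a minimal sketch of such a check. The helper name checkStartProjectArgs is hypothetical and not part of the package, and treating the range endpoints as inclusive is an assumption.

```r
# Hypothetical pre-flight check for the ranges documented above.
# Returns a character vector of problems (empty if none were found).
# Assumption: the documented range endpoints are inclusive.
checkStartProjectArgs <- function(responseCap = NULL,
                                  smartDownsampled = NULL,
                                  majorityDownsamplingRate = NULL) {
  problems <- character(0)
  # responseCap is documented as lying between 0.5 and 1.0.
  if (!is.null(responseCap) && (responseCap < 0.5 || responseCap > 1.0)) {
    problems <- c(problems, "responseCap must be between 0.5 and 1.0")
  }
  if (!is.null(majorityDownsamplingRate)) {
    # majorityDownsamplingRate is documented as a percentage.
    if (majorityDownsamplingRate < 0 || majorityDownsamplingRate > 100) {
      problems <- c(problems,
                    "majorityDownsamplingRate must be between 0.0 and 100.0")
    }
    # The rate is only meaningful when smart downsampling is enabled.
    if (!isTRUE(smartDownsampled)) {
      problems <- c(problems,
                    "majorityDownsamplingRate requires smartDownsampled = TRUE")
    }
  }
  problems
}

# No problems: the cap lies inside the documented range.
checkStartProjectArgs(responseCap = 0.9)
# Two problems: the cap is too low and smart downsampling is not enabled.
checkStartProjectArgs(responseCap = 0.2, majorityDownsamplingRate = 50)
```

A check like this only covers what is stated in the argument descriptions; the server may apply further validation.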

Examples

## Not run: 
  StartProject(iris,
               projectName = "iris",
               target = "Species",
               targetType = TargetType$Multiclass)

## End(Not run)
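A further sketch shows how the partition, wait, checkInterval, and timeout arguments might be combined. The wrapper name startIrisAndWait, the project name "iris-cv", and the partition values are illustrative assumptions. The code assumes an authenticated session, so it is wrapped in a function that is defined here but not called.

```r
# Hypothetical wrapper; assumes the datarobot package is installed and a
# session is authenticated. Nothing is sent until the function is called.
startIrisAndWait <- function(timeoutSeconds = 3600) {
  # 5-fold cross-validation with a 20% holdout (illustrative values).
  partition <- datarobot::CreateRandomPartition(validationType = "CV",
                                                holdoutPct = 20,
                                                reps = 5)
  tryCatch(
    datarobot::StartProject(iris,
                            projectName = "iris-cv",
                            target = "Species",
                            targetType = datarobot::TargetType$Multiclass,
                            partition = partition,
                            wait = TRUE,          # block until autopilot is done
                            checkInterval = 20,   # seconds between checks
                            timeout = timeoutSeconds),
    error = function(e) {
      # Reached, for example, if autopilot is not finished before the timeout.
      message("StartProject did not complete: ", conditionMessage(e))
      NULL
    }
  )
}
```

Returning NULL on error is a design choice for this sketch only; a caller may prefer to let the error propagate.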

[Package datarobot version 2.18.6 Index]