partition {mlr3}

R Documentation

Manually Partition into Training and Test Set

Description

Creates a split of the row ids of a Task into a training set and a test set while optionally stratifying on the target column.

For more complex partitions, such as a split into more than two sets, see the last example.
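The examples on this page access the result through `split$train` and `split$test`. The following minimal sketch inspects those two elements; it assumes the mlr3 package is installed, and the `iris` task and the ratio of 0.8 are chosen only for illustration.

```r
library(mlr3)

set.seed(1)  # the split is drawn at random, so fix the seed for reproducibility
task = tsk("iris")
split = partition(task, ratio = 0.8)

# both elements are vectors of row ids of the task
str(split[c("train", "test")])

# row ids that appear in both sets
shared = intersect(split$train, split$test)
length(shared)

# do training and test set together cover all rows of the task?
covers_all = setequal(c(split$train, split$test), task$row_ids)
covers_all
```

Because the elements hold row ids rather than data, they can be passed to any method that accepts row ids, such as `task$truth()`.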

Usage

partition(task, ratio = 0.67, stratify = TRUE, ...)

## S3 method for class 'TaskRegr'
partition(task, ratio = 0.67, stratify = TRUE, bins = 3L, ...)

## S3 method for class 'TaskClassif'
partition(task, ratio = 0.67, stratify = TRUE, ...)

Arguments

`task`	(Task) Task to operate on.
`ratio`	(`numeric(1)`) Ratio of observations to put into the training set. The remaining observations form the test set.
`stratify`	(`logical(1)`) If `TRUE`, stratify on the target variable. For regression tasks, the target variable is first cut into `bins` bins. See `Task$add_strata()`.
`...`	(any) Additional arguments, currently not used.
`bins`	(`integer(1)`) Number of bins to cut the target variable into for stratification.
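To illustrate what stratifying on a binned numeric target means, here is a rough sketch in base R. It is not mlr3's implementation: the helper `stratified_split()` is hypothetical, and the use of equal-width bins via `cut()` and per-bin rounding are assumptions made for this illustration.

```r
# Hypothetical helper (base R only, not part of mlr3).
stratified_split = function(y, ratio = 0.67, bins = 3L) {
  # cut the numeric target into `bins` equal-width intervals (assumption)
  strata = cut(y, breaks = bins, labels = FALSE)
  ids = seq_along(y)
  # within each stratum, draw round(ratio * n) ids for the training set
  train = unlist(lapply(split(ids, strata), function(s) {
    s[sample.int(length(s), round(ratio * length(s)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(ids, train))
}

set.seed(1)
y = rexp(300)  # skewed toy target
sp = stratified_split(y, ratio = 0.5, bins = 3L)
lengths(sp)

# bin counts in the training set and in the test set
binned = cut(y, breaks = 3L, labels = FALSE)
table(binned[sp$train])
table(binned[sp$test])
```

Sampling within each bin keeps sparsely populated regions of the target represented in both sets, which a purely random split does not guarantee.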

Examples

# regression task
task = tsk("boston_housing")

# roughly equal size split while stratifying on the binned response
split = partition(task, ratio = 0.5)
data = data.frame(
  y = c(task$truth(split$train), task$truth(split$test)),
  split = rep(c("train", "test"), lengths(split[c("train", "test")]))
)
boxplot(y ~ split, data = data)


# classification task
task = tsk("pima")
split = partition(task)

# roughly same distribution of the target label
prop.table(table(task$truth()))
prop.table(table(task$truth(split$train)))
prop.table(table(task$truth(split$test)))


# splitting into 3 disjoint sets, using ResamplingCV and stratification
task = tsk("iris")
task$set_col_roles(task$target_names, add_to = "stratum")
r = rsmp("cv", folds = 3)$instantiate(task)

# the test sets of a cross-validation are disjoint; the training sets overlap
sets = lapply(1:3, r$test_set)
lengths(sets)
prop.table(table(task$truth(sets[[1]])))
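A split is typically consumed by passing its row ids to a learner. The sketch below shows one way this might look, assuming the mlr3 package is installed. The featureless learner and the accuracy measure are picked only to keep the example free of further dependencies.

```r
library(mlr3)

set.seed(42)  # fix the seed so the random split is reproducible
task = tsk("pima")
split = partition(task, ratio = 0.67)

# fit on the training rows only
learner = lrn("classif.featureless")
learner$train(task, row_ids = split$train)

# predict on the held-out test rows and score the result
prediction = learner$predict(task, row_ids = split$test)
prediction$score(msr("classif.acc"))
```

Restricting training and prediction through `row_ids` leaves the task itself unchanged, so the same task object can be reused with a different split.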

[Package mlr3 version 0.20.2 Index]