create_holdout_partition {utiml} | R Documentation |
Create a holdout partition based on the specified algorithm
Description
This method creates multi-label datasets for training, testing, validation, or
other purposes, using the partition method defined in method. The number of
partitions is defined by the partitions parameter. Each instance is used
in only one partition of the division.
Usage
create_holdout_partition(
mdata,
partitions = c(train = 0.7, test = 0.3),
method = c("random", "iterative", "stratified")
)
Arguments
mdata |
An mldr dataset. |
partitions |
A list of percentages or a single value. The sum of all
values must not be greater than 1. If a single value is given, its
complement is used to generate the second partition. If two or
more values are given and their sum is lower than 1, the partitions
are generated with the given proportions. If partitions has names,
they are used to name the returned list. (Default: c(train = 0.7, test = 0.3)) |
method |
The method used to split the data. The default methods are "random", "iterative", and "stratified".
You can also create your own partition method. See the Note and Examples sections for more details. (Default: "random") |
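The rules for partitions described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: resolve_partitions is a hypothetical helper, not a utiml function, and the package may resolve the proportions differently internally.

```r
# Hypothetical helper (not part of utiml): resolves the documented rules
# for `partitions` into the final vector of proportions.
resolve_partitions <- function(partitions) {
  partitions <- unlist(partitions)
  # Small tolerance so that sums such as 0.1 + 0.2 + 0.3 + 0.4 are accepted
  if (sum(partitions) > 1 + 1e-8) {
    stop("The sum of all partitions must not be greater than 1")
  }
  if (length(partitions) == 1) {
    # A single value: its complement becomes the second partition
    partitions <- c(partitions, 1 - unname(partitions))
  }
  partitions
}

single <- resolve_partitions(0.8)                       # two proportions
partial <- resolve_partitions(c(train = 0.6, test = 0.2)) # sum below 1, kept as given
```

In the second call the proportions sum to 0.8, so under the documented behaviour the remaining instances would simply not be assigned to any partition.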
Value
A list with at least two datasets, sampled as specified in the partitions parameter.
Note
To create your own split method, you need to build a function that receives an mldr object and a list with the proportions of examples in each fold, and returns another list with the indices of the elements for each fold.
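The contract in this note can be illustrated with a shuffled variant of a sequential split. This is a minimal sketch, not a utiml function: shuffled_split and fake_mdata are hypothetical names, the argument names mdata and r follow the example on this page, and the stand-in object assumes that a real mldr object exposes measures$num.instances.

```r
# Hypothetical custom split method: shuffles the instance indices and
# cuts them into consecutive blocks sized by the proportions in `r`.
shuffled_split <- function(mdata, r) {
  n <- mdata$measures$num.instances
  shuffled <- sample.int(n)
  # Last position of each block; if the proportions sum to less than 1,
  # the trailing instances are left unused
  ends <- floor(cumsum(unlist(r)) * n + 1e-8)
  starts <- c(0, ends[-length(ends)]) + 1
  lapply(seq_along(ends), function(i) {
    shuffled[seq_len(ends[i] - starts[i] + 1) + starts[i] - 1]
  })
}

# Stand-in object exposing only the field used above (assumption)
fake_mdata <- list(measures = list(num.instances = 100))
set.seed(1)
folds <- shuffled_split(fake_mdata, c(0.7, 0.3))
lengths(folds)
```

By analogy with the example on this page, such a function would presumably be passed by name, for instance method = "shuffled_split".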
References
Sechidis, K., Tsoumakas, G., & Vlahavas, I. (2011). On the stratification of multi-label data. In Proceedings of the Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD (pp. 145-158).
See Also
Other sampling: create_kfold_partition(), create_random_subset(), create_subset()
Examples
dataset <- create_holdout_partition(toyml)
names(dataset)
## [1] "train" "test"
#dataset$train
#dataset$test
dataset <- create_holdout_partition(toyml, c(a=0.1, b=0.2, c=0.3, d=0.4))
names(dataset)
## [1] "a" "b" "c" "d"
sequencial_split <- function (mdata, r) {
  # Number of instances assigned to each partition
  amount <- trunc(r * mdata$measures$num.instances)
  # Boundaries of the consecutive index ranges
  indexes <- c(0, cumsum(amount))
  # Make the last partition run up to the final instance
  indexes[length(r)+1] <- mdata$measures$num.instances
  lapply(seq_along(r), function (i) {
    seq(indexes[i]+1, indexes[i+1])
  })
}
dataset <- create_holdout_partition(toyml, method="sequencial_split")