split_data {AutoScore}    R Documentation

AutoScore function: Automatically split a dataset into training, validation, and test sets, optionally stratified by label

Description

Splits a dataset into training, validation, and test sets according to the given ratio, optionally stratifying the split on the outcome label.

Usage

split_data(data, ratio, cross_validation = FALSE, strat_by_label = FALSE)

Arguments

data

The dataset to be split

ratio

The ratio for dividing the dataset into training, validation, and test sets, for example c(0.7, 0.1, 0.2). The usage above declares no default for this argument, so supply it explicitly.

cross_validation

If TRUE, cross-validation is used to generate the parsimony plot, which suits small datasets. Defaults to FALSE.

strat_by_label

If TRUE, the split is stratified on the outcome variable. Defaults to FALSE.

Value

A list containing the training, validation, and test sets.
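To make the ratio and strat_by_label arguments more concrete, the following is a minimal base-R sketch of a ratio-based split with optional stratification. It is not the AutoScore implementation: the helper name split_by_ratio_sketch, the rounding rule, and the element names of the returned list are assumptions made for illustration, and the outcome column is assumed to be named label, as in the Examples section.

```r
# Hypothetical helper for illustration only; NOT the AutoScore source code.
split_by_ratio_sketch <- function(data, ratio, strat_by_label = FALSE) {
  stopifnot(length(ratio) == 3, all(ratio >= 0), sum(ratio) > 0)
  ratio <- ratio / sum(ratio)  # normalise so the three parts sum to 1

  # With stratification, rows are split separately within each label value;
  # without it, all rows form a single group.
  groups <- if (strat_by_label) {
    split(seq_len(nrow(data)), data$label)  # assumes a column named "label"
  } else {
    list(seq_len(nrow(data)))
  }

  membership <- character(nrow(data))
  for (idx in groups) {
    shuffled <- idx[sample.int(length(idx))]  # random order within the group
    n <- length(shuffled)
    n_train <- min(n, round(ratio[1] * n))
    n_valid <- min(n - n_train, round(ratio[2] * n))
    # Everything not assigned to training or validation goes to the test set.
    membership[shuffled] <- "test"
    membership[shuffled[seq_len(n_train)]] <- "train"
    membership[shuffled[n_train + seq_len(n_valid)]] <- "validation"
  }

  list(
    train      = data[membership == "train", , drop = FALSE],
    validation = data[membership == "validation", , drop = FALSE],
    test       = data[membership == "test", , drop = FALSE]
  )
}

# Toy data with an 80:20 label imbalance (made up for this sketch)
set.seed(4)
toy <- data.frame(x = rnorm(100), label = rep(c(0, 1), times = c(80, 20)))
parts <- split_by_ratio_sketch(toy, c(0.7, 0.1, 0.2), strat_by_label = TRUE)
sapply(parts, nrow)                      # rows per set
sapply(parts, function(d) mean(d$label)) # share of label == 1 per set
```

In this sketch, stratification keeps the share of each label value roughly equal across the three sets, which matters most when one outcome class is rare. Setting the validation share to 0, as in the small-sample example, simply leaves the validation part empty here; how split_data itself handles that case together with cross_validation = TRUE is not shown.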

Examples

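# Load the bundled sample data and rename the outcome column to "label"
data("sample_data")
names(sample_data)[names(sample_data) == "Mortality_inpatient"] <- "label"
set.seed(4)  # make the random split reproducible
# Large sample size: separate training, validation, and test sets
out_split <- split_data(data = sample_data, ratio = c(0.7, 0.1, 0.2))
# Small sample size: validation share set to 0, with cross-validation enabled
out_split <- split_data(data = sample_data, ratio = c(0.7, 0, 0.3),
                        cross_validation = TRUE)
# Large sample size, stratified on the label
out_split <- split_data(data = sample_data, ratio = c(0.7, 0.1, 0.2),
                        strat_by_label = TRUE)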

[Package AutoScore version 1.0.0 Index]