splitDataset {gecko}    R Documentation

Split a dataset for model training

Description

Split a dataset into training and test sets while preserving the proportion of each class.

Usage

splitDataset(data, proportion)

Arguments

data

dataframe. Contains classification data. The last column must contain the labels.

proportion

numeric. A value between 0 and 1 determining how the dataset is divided between training and testing.

Value

list. The first element is the training data; the second element is the test data.

Examples

# Binary label case
my_data <- data.frame(X = runif(20), Y = runif(20), Z = runif(20),
                      Label = c(rep("presence", 10), rep("outlier", 10)))
splitDataset(my_data, 0.8)

# Multi label case
my_data <- data.frame(X = runif(60), Y = runif(60), Z = runif(60),
                      Label = c(rep("A", 20), rep("B", 30), rep("C", 10)))
splitDataset(my_data, 0.8)
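
# Illustration of a class-preserving split
The sketch below may help show what preserving class proportions means in practice. It is a minimal base R illustration, not the gecko implementation: the helper stratified_split_sketch is hypothetical, and it assumes that proportion is the fraction of each class assigned to the training set and that the label sits in the last column. Like the Value described above, it returns a list with the training data first and the test data second.

```r
# Hypothetical helper, NOT gecko's implementation: a base-R stratified split.
# Assumption: `proportion` is the fraction of each class sent to training.
stratified_split_sketch <- function(data, proportion) {
  stopifnot(is.data.frame(data), proportion > 0, proportion < 1)

  # Assumption: the label is stored in the last column
  labels <- data[[ncol(data)]]

  # Row indices grouped by class label
  rows_by_class <- split(seq_len(nrow(data)), labels)

  # Sample the training rows separately within each class
  train_rows <- unlist(lapply(rows_by_class, function(rows) {
    n_train <- round(length(rows) * proportion)
    rows[sample.int(length(rows), n_train)]
  }), use.names = FALSE)

  # Everything not sampled for training goes to the test set
  test_rows <- setdiff(seq_len(nrow(data)), train_rows)

  list(data[train_rows, , drop = FALSE], data[test_rows, , drop = FALSE])
}

set.seed(1)
my_data <- data.frame(X = runif(60), Y = runif(60), Z = runif(60),
                      Label = c(rep("A", 20), rep("B", 30), rep("C", 10)))

parts <- stratified_split_sketch(my_data, 0.8)
train <- parts[[1]]  # first element: training data
test  <- parts[[2]]  # second element: test data

table(train$Label)  # A: 16, B: 24, C: 8
table(test$Label)   # A: 4,  B: 6,  C: 2
```

Sampling within each class keeps the 20:30:10 label ratio in both parts, whereas a single random draw over all rows could leave a small class such as "C" under-represented in one of them.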

[Package gecko version 1.0.0 Index]