R: Splits Dataset into Train and Test Datasets

split-methods {rebmix}

R Documentation

Splits Dataset into Train and Test Datasets

Description

Returns (invisibly) an object containing the train and test observations \bm{y}_{1}, \ldots, \bm{y}_{n} as well as the true class membership \bm{\Omega}_{g} for the test dataset.

Usage

## S4 method for signature 'numeric'
split(p = 0.75, Dataset = data.frame(), class = numeric(), ...)
## S4 method for signature 'list'
split(p = list(), Dataset = data.frame(), class = numeric(), ...)
## ... and for other signatures

Arguments

`p`	see Methods section below.
`Dataset`	a data frame containing the dataset `Y` of length `n`, for which the corresponding class membership `\bm{\Omega}_{g}` is known. The default value is `data.frame()`.
`class`	a column number in `Dataset` containing the class membership information. The default value is `numeric()`.
`...`	further arguments to `sample`.
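
Because `class` is given as a column number rather than a column name, it can help to look the position up by name first. The following is a minimal base-R sketch using the built-in iris data; the variable name `class.col` is only illustrative and is not part of rebmix.

```r
# Look up the position of the class column by name (base R only).
data(iris)

class.col <- which(colnames(iris) == "Species")

class.col  # column number that could be passed as the `class` argument

# The column at that position holds the known class membership.
levels(iris[[class.col]])
```

The resulting number is what the examples on this page pass as `class = 5` for iris.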

Value

Returns an object of class RCLS.chunk.

Methods

signature(p = "numeric"): a number specifying the fraction of observations used for training, where 0.0 \leq p \leq 1.0. The default value is 0.75.
signature(p = "list"): a list composed of the column number p$type in Dataset containing the type membership information, followed by the values p$train and p$test that mark the train and test observations in that column. The default value is list().
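
The two signatures can be pictured with a small base-R sketch. This is not the rebmix implementation and does not build an RCLS.chunk object; it only illustrates a fractional split drawn at random and a split driven by a type column. The helper names `split_fraction` and `split_by_type` are hypothetical, and the sketch assumes sampling over the whole dataset, which may differ from how rebmix draws observations.

```r
# Illustrative only: hypothetical helpers, not part of rebmix.

# Fractional split: mark round(p * n) randomly chosen rows as training rows.
split_fraction <- function(p, Dataset) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  n <- nrow(Dataset)
  # A logical mask avoids the empty-index pitfall of Dataset[-idx, ].
  is.train <- seq_len(n) %in% sample.int(n, size = round(p * n))
  list(train = Dataset[is.train, , drop = FALSE],
       test = Dataset[!is.train, , drop = FALSE])
}

# Type-column split: rows are assigned by the labels stored in column p$type.
split_by_type <- function(p, Dataset) {
  type <- Dataset[[p$type]]
  list(train = Dataset[type == p$train, , drop = FALSE],
       test = Dataset[type == p$test, , drop = FALSE])
}

set.seed(5)
by.fraction <- split_fraction(0.6, iris)

toy <- data.frame(y = 1:6,
  class = c("A", "A", "B", "B", "A", "B"),
  type = c("train", "test", "train", "train", "test", "train"))

by.type <- split_by_type(list(type = 3, train = "train", test = "test"), toy)

sapply(by.fraction, nrow)
sapply(by.type, nrow)
```

With the numeric form the assignment of rows depends on the random draw, so a seed is needed for reproducibility. With the list form the assignment is fully determined by the type column.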

Author(s)

Marko Nagode

Examples

## Not run: 
data(iris)

# Split dataset into train (75%) and test (25%) subsets.

set.seed(5)

Iris <- split(p = 0.75, Dataset = iris, class = 5)

Iris

# Generate simulated dataset.

N <- 1000

class <- c(rep("A", 0.4 * N), rep("B", 0.2 * N),
  rep("C", 0.1 * N), rep("D", 0.05 * N), rep("E", 0.25 * N))

type <- c(rep("train", 0.75 * N), rep("test", 0.25 * N))

n <- 300

Dataset <- data.frame(1:n, sample(class, n))

colnames(Dataset) <- c("y", "class")

# Split dataset into train (60%) and test (40%) subsets.

simulated <- split(p = 0.6, Dataset = Dataset, class = 2)

simulated

# Generate simulated dataset.

Dataset <- data.frame(1:n, sample(class, n), sample(type, n))

colnames(Dataset) <- c("y", "class", "type")

# Split dataset into train and test subsets.

simulated <- split(p = list(type = 3, train = "train",
  test = "test"), Dataset = Dataset, class = 2)

simulated

## End(Not run)
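
The simulated examples above draw `n` labels from a pool of `N` labels without replacement, so the class shares in the resulting dataset only approximate the 0.4, 0.2, 0.1, 0.05 and 0.25 mix of the pool. A quick base-R check, as a sketch only (the names `labels` and `drawn` are illustrative and are used here to avoid masking the base function `class`):

```r
# Rebuild the label pool used in the simulated examples.
set.seed(5)

N <- 1000

labels <- c(rep("A", 0.4 * N), rep("B", 0.2 * N),
  rep("C", 0.1 * N), rep("D", 0.05 * N), rep("E", 0.25 * N))

n <- 300

# Draw n labels without replacement, as sample() does by default.
drawn <- sample(labels, n)

# Observed class shares in the sample of n labels.
round(prop.table(table(drawn)), 3)
```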

[Package rebmix version 2.16.0 Index]