R: trainsplit

trainsplit {trainsplit}

R Documentation

trainsplit

Description

Splits a dataframe, tibble, or data.table into a test set and training set. Specify either the number or percentage of observations to be put into training set.

Usage

trainsplit(
  data,
  ntrain = NULL,
  trainpct = NULL,
  round_ntrain = "round",
  seed = NULL,
  return = "parentenv"
)

Arguments

`data`	The dataset you want to split
`ntrain`	The number of observations to go into the training set. Must be >= 0 and <= nrow(data).
`trainpct`	Fraction of observations to go into training set. Must be >= 0 and =< 1. If set to 0 or 1, the empty test or training set will still inherit the same column names and types as the original dataset.
`round_ntrain`	What to do when nrow(data) * trainpct is not a whole number. Default behavior is to round the size of the training set. Use 'ceiling' or 'floor' to instead set the size of training set to next highest or lowest whole number.
`seed`	Sets the random seed; use this argument if you want to always get the same result. Note: sets seed only locally within the function.
`return`	Three return modes available: "parentenv" assigns the training and test sets into the environment that called the function with names based on the name of the original dataset; this is intended largely for an educational context. "list" will return a list with the training and test sets. "index" will return only the numerical index of the rows to be placed into the training set, which can then be manually subset by the user.

Value

Depends on "return" argument; either a list, an index, or NULL if return = "parentenv" was selected.

Examples

# Splits the training and test sets and assigns them into memory.
trainsplit(mtcars, trainpct = 0.75)
# Specify size of training set by number of rows, not percent:
trainsplit(mtcars, ntrain = 10)
# Size of training set rounds to one:
trainsplit(mtcars, trainpct = 0.01, round_ntrain = 'ceiling')
# Also works with data.table:
trainsplit(data.table::as.data.table(mtcars), trainpct = 0.75)
# Return a list containing the training/test sets instead:
trainsplit(mtcars, trainpct = 0.75, return = 'list')

[Package trainsplit version 1.1 Index]