R: Dataset splitting

train_test_split {less}

R Documentation

Dataset splitting

Description

Split a data frame or matrix into random train and test subsets. The column at position `y_index` of `data` is taken as the response variable (y), and the remaining columns are taken as the independent variables (X).

Usage

train_test_split(
  data,
  test_size = 0.3,
  random_state = NULL,
  y_index = ncol(data)
)

Arguments

`data`	Data frame or matrix to split.
`test_size`	Proportion of the dataset to include in the test split, as a number between 0.0 and 1.0 (defaults to 0.3).
`random_state`	Controls the shuffling applied to the data before the split. Pass an integer for reproducible output across multiple function calls (defaults to NULL).
`y_index`	Column index of the response variable y (defaults to the last column of data).
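The reproducibility described for `random_state` can be illustrated in base R alone. The snippet below is only a sketch: it assumes that `random_state` seeds R's random number generator in roughly the way `set.seed()` does, which this page does not confirm, and the variable `seed` is a hypothetical stand-in for the argument.

```r
# Assumption: random_state behaves like a seed passed to set.seed().
seed <- 123  # hypothetical value standing in for random_state

# Shuffle the row indices 1..10 twice, reseeding before each shuffle.
set.seed(seed)
first_shuffle <- sample.int(10)

set.seed(seed)
second_shuffle <- sample.int(10)

# With the same seed, both shuffles produce the same permutation.
same_order <- identical(first_shuffle, second_shuffle)
print(same_order)
```

Without reseeding, two consecutive calls to `sample.int()` would generally return different permutations, which matches the non-reproducible behaviour one would expect from the default `random_state = NULL`.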

Value

A list of length 4 with elements:

`X_train`	Training input variables

`X_test`	Test input variables

`y_train`	Training response values

`y_test`	Test response values
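To make the relationship between the arguments and the four returned pieces concrete, here is a minimal base-R sketch of a split of this kind. It is not the package's source code: the function name `sketch_train_test_split`, the rounding of the test set size with `floor()`, and the names given to the list elements are all assumptions made for illustration. The built-in `mtcars` dataset is used so that the block runs on its own.

```r
# Hypothetical re-implementation for illustration only; not the code in 'less'.
sketch_train_test_split <- function(data, test_size = 0.3,
                                    random_state = NULL,
                                    y_index = ncol(data)) {
  stopifnot(test_size > 0, test_size < 1)

  # Seed the generator only when a seed is given (assumed behaviour).
  if (!is.null(random_state)) set.seed(random_state)

  n <- nrow(data)
  # Assumption: the number of test rows is rounded down.
  test_rows <- sample.int(n, size = floor(test_size * n))
  train_rows <- setdiff(seq_len(n), test_rows)

  # Separate the response column from the independent variables.
  X <- data[, -y_index, drop = FALSE]
  y <- data[, y_index]

  # Same order as documented: X_train, X_test, y_train, y_test.
  list(
    X_train = X[train_rows, , drop = FALSE],
    X_test  = X[test_rows, , drop = FALSE],
    y_train = y[train_rows],
    y_test  = y[test_rows]
  )
}

# mtcars has 32 rows; column 1 (mpg) is used as the response here.
parts <- sketch_train_test_split(mtcars, test_size = 0.25,
                                 random_state = 1, y_index = 1)
print(dim(parts$X_train))
print(dim(parts$X_test))
```

Every row ends up in exactly one of the two subsets, and the rows of each X piece line up with the entries of the matching y piece.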

Examples

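library(less)

# Load the abalone dataset
data(abalone)

# Hold out 30% of the rows as the test set
split_list <- train_test_split(abalone, test_size = 0.3)

# Elements are returned in the order X_train, X_test, y_train, y_test
X_train <- split_list[[1]]
X_test <- split_list[[2]]
y_train <- split_list[[3]]
y_test <- split_list[[4]]

# Inspect the first entries of each subset
print(head(X_train))
print(head(X_test))
print(head(y_train))
print(head(y_test))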

[Package less version 0.1.0 Index]