train_test_split {creditmodel}    R Documentation

Train-Test-Split

Description

train_test_split partitions a data set into train and test sets.

Usage

train_test_split(
  dat,
  prop = 0.7,
  split_type = "Random",
  occur_time = NULL,
  cut_date = NULL,
  start_date = NULL,
  save_data = FALSE,
  dir_path = tempdir(),
  file_name = NULL,
  note = FALSE,
  seed = 43
)

Arguments

dat

A data.frame with independent variables and target variable.

prop

The proportion of samples assigned to the train set after the partition, given as a fraction between 0 and 1 (e.g. 0.7 for 70%).

split_type

Method used for the partition.

  • "Random" splits the train and test sets randomly.

  • "OOT" splits by occurrence time, for an out-of-time (observation over time) test.

  • "byRow" splits by row numbers.
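
The "Random" and "byRow" methods can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame, its column names, and the reading of "byRow" as "the first rows go to the train set" are assumptions made for illustration only.

```r
# Toy data; the column names are hypothetical.
dat <- data.frame(id = 1:10, y = rep(c(0, 1), 5))
prop <- 0.7
n_train <- round(prop * nrow(dat))

# "Random"-style split: sample row indices without replacement.
set.seed(43)
train_idx <- sample(seq_len(nrow(dat)), size = n_train)
random_split <- list(train = dat[train_idx, ], test = dat[-train_idx, ])

# "byRow"-style split (assumed meaning): the first n_train rows
# form the train set and the remaining rows form the test set.
head_idx <- seq_len(n_train)
by_row_split <- list(train = dat[head_idx, ], test = dat[-head_idx, ])
```

In both cases the two partitions are disjoint and together cover every row of dat.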

occur_time

The name of the variable that represents the time at which each observation takes place. It is used for "OOT" split.

cut_date

Time point at which to split the data sets, e.g. to separate the actual and expected data sets.

start_date

The earliest occurrence time of observations.
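
As a rough illustration of how occur_time, cut_date and start_date could interact in a time-based split, here is a base R sketch. It is not the package's code: the toy data, the dates, and the rules "drop rows before start_date" and "rows strictly before cut_date go to the train set" are all assumptions.

```r
# Toy data with a hypothetical occurrence-time column, one row every 30 days.
dat <- data.frame(
  issue_d = as.Date("2023-01-01") + seq(0, 270, by = 30),
  y = rep(c(0, 1), 5)
)

start_date <- as.Date("2023-01-15")  # assumed lower bound on occurrence time
cut_date <- as.Date("2023-07-01")    # assumed boundary between train and test

# Keep only observations that occur on or after start_date.
dat_kept <- dat[dat$issue_d >= start_date, ]

# Assumed rule: earlier observations train the model, later ones test it.
oot_split <- list(
  train = dat_kept[dat_kept$issue_d < cut_date, ],
  test = dat_kept[dat_kept$issue_d >= cut_date, ]
)
```

Under these assumptions every test observation occurs later than every train observation, which is the point of a time-based holdout.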

save_data

Logical. Whether to save the results to the local folder given by dir_path. Default is FALSE.

dir_path

The path of the folder in which the data file is saved. Default is tempdir().

file_name

The name of the saved data file. Default is NULL.

note

Logical. Whether to output information messages. Default is FALSE.

seed

Random number seed. Default is 43.
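
The effect of a fixed seed on a random partition can be shown with base R alone. This sketch does not call train_test_split; it only assumes that a random split draws row indices after the seed is set.

```r
n <- 100       # hypothetical number of rows
n_train <- 70  # 70% of the rows go to the train set

# Same seed: the sampled train indices are identical.
set.seed(43)
idx_a <- sample(n, n_train)
set.seed(43)
idx_b <- sample(n, n_train)
same_partition <- identical(idx_a, idx_b)

# A different seed will typically give a different partition.
set.seed(12)
idx_c <- sample(n, n_train)
```

Fixing the seed makes a random split repeatable across runs, which helps when comparing models on the same train and test sets.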

Value

A list with two elements, train and test, holding the train and test partitions.

Examples

train_test = train_test_split(lendingclub,
  split_type = "OOT", prop = 0.7,
  occur_time = "issue_d", seed = 12, save_data = FALSE)
dat_train = train_test$train
dat_test = train_test$test

[Package creditmodel version 1.3.1 Index]