splitData {icardaFIGSr}    R Documentation

Splitting Data

Description

Splits a data frame into training and test sets and returns a list containing the two resulting data frames.

Usage

splitData(data, seed = NULL, y, p, ...)

Arguments

data

object of class "data.frame" with target variable and predictor variables.

seed

integer. Seed for the random number generator, used to make the split reproducible. Default: NULL.

y

character. Target variable.

p

numeric. Proportion of data to be used for training.

...

additional arguments passed to the createDataPartition function in the caret package to control how the data are split.

Details

splitData relies on the createDataPartition function from the caret package to perform the data split.

If y is a factor, observations are sampled within each level of y, so that the class distribution of y is approximately preserved in both the training and the test set.
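The factor case can be illustrated with a minimal sketch that calls createDataPartition directly. The helper split_sketch and the use of the built-in iris data are assumptions made for illustration only; this is not the package's actual implementation of splitData.

```r
library(caret)

# Hypothetical helper for illustration; NOT the icardaFIGSr implementation.
split_sketch <- function(data, y, p, seed = NULL, ...) {
  # Set the seed only when one is supplied, so the split is reproducible
  if (!is.null(seed)) set.seed(seed)
  # Row indices of the training set, sampled within the levels of y
  idx <- as.vector(createDataPartition(data[[y]], p = p, list = FALSE, ...))
  list(trainset = data[idx, , drop = FALSE],
       testset  = data[-idx, , drop = FALSE])
}

parts <- split_sketch(iris, y = "Species", p = 0.7, seed = 1234)

# Class proportions in each set; they should be close to those in iris
prop.table(table(parts$trainset$Species))
prop.table(table(parts$testset$Species))
```

Comparing the two proportion tables is a quick way to check that the stratified sampling kept the classes balanced across the sets.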

If y is numeric, the data are split into groups based on percentiles of y, and sampling is done within these groups. See createDataPartition for details on the additional arguments that can be passed.
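For a numeric target, the following sketch calls createDataPartition directly with its groups argument, which sets the number of percentile-based groups. The built-in mtcars data and the choice of groups = 4 are assumptions for illustration; per the description of the ... argument, such options would presumably be supplied to splitData in the same way.

```r
library(caret)

set.seed(1234)

# For numeric y, createDataPartition bins y by quantiles and samples
# within each bin; groups controls the number of bins (illustrative value)
idx <- as.vector(createDataPartition(mtcars$mpg, p = 0.7,
                                     list = FALSE, groups = 4))

train_num <- mtcars[idx, ]
test_num  <- mtcars[-idx, ]

# The distribution of the target should look similar in both sets
summary(train_num$mpg)
summary(test_num$mpg)
```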

Value

A list with two data frames: the first, trainset, is the training set, and the second, testset, is the test set.

Author(s)

Zakaria Kehel, Bancy Ngatia

See Also

createDataPartition

Examples

if(interactive()){
 # Split the data into 70/30 train and test sets for factor y
 data(septoriaDurumWC)
 split.data <- splitData(septoriaDurumWC, seed = 1234,
                         y = 'ST_S', p = 0.7)

 # Get training and test sets from list object returned
 trainset <- split.data$trainset
 testset <- split.data$testset
}

[Package icardaFIGSr version 1.0.2 Index]