splitData {icardaFIGSr} | R Documentation |
Splitting Data
Description
Splits the data into training and test sets and returns a list containing two data frames, one for each set.
Usage
splitData(data, seed = NULL, y, p, ...)
Arguments
data |
object of class "data.frame" with target variable and predictor variables. |
seed |
integer. Seed for the random number generator, used to make the split reproducible. Default: NULL. |
y |
character. Name of the target variable in data. |
p |
numeric. Proportion of data to be used for training. |
... |
additional arguments to be passed to |
Details
splitData relies on the createDataPartition function from the caret package to perform the data split.
If y is a factor, observations for each set are sampled within the levels of y so that the class distributions are roughly balanced across the two sets.
If y is numeric, the data are split into groups based on percentiles and the sampling is done within these groups. See createDataPartition for details on the additional arguments that can be passed.
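The stratified sampling described above can be illustrated with a minimal base R sketch. This is not the code used by splitData or by createDataPartition; the helper name stratified_index, its groups argument, and the use of the built-in iris data are assumptions made for illustration only.

```r
# Hypothetical helper: illustrates the idea of stratified sampling.
# It is NOT the implementation used by caret or icardaFIGSr.
stratified_index <- function(y, p, groups = 5) {
  # For numeric y, bin the values into quantile-based groups first
  if (is.numeric(y)) {
    breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
    y <- cut(y, breaks = breaks, include.lowest = TRUE)
  }
  # Collect the row indices belonging to each level (or quantile group)
  idx_by_group <- split(seq_along(y), y, drop = TRUE)
  # Sample a proportion p of the indices within each group
  picked <- lapply(idx_by_group, function(i) {
    i[sample.int(length(i), size = round(p * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1234)

# Factor target: sample within each level of Species
train_idx <- stratified_index(iris$Species, p = 0.7)
trainset <- iris[train_idx, ]
testset  <- iris[-train_idx, ]

# Class proportions should be similar in both sets
prop.table(table(trainset$Species))
prop.table(table(testset$Species))

# Numeric target: sample within quantile-based groups of Sepal.Length
num_idx <- stratified_index(iris$Sepal.Length, p = 0.7)
```

Sampling within each level or group, rather than over all rows at once, is what keeps the distribution of y similar in the training and test sets.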
Value
A list with two data frames: the training set first (trainset) and the test set second (testset).
Author(s)
Zakaria Kehel, Bancy Ngatia
See Also
Examples
if (interactive()) {
  # Split the data into 70/30 train and test sets for factor y
  data(septoriaDurumWC)
  split.data <- splitData(septoriaDurumWC, seed = 1234,
                          y = 'ST_S', p = 0.7)

  # Get training and test sets from the returned list object
  trainset <- split.data$trainset
  testset <- split.data$testset
}