split_data {GeneSelectR} | R Documentation |
Split Data into Training and Test Sets
Description
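Splits the predictors X and outcomes y into training and test sets, holding out the proportion given by test_size for testing, and converts each resulting piece to Python format.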
Usage
split_data(X, y, test_size, modules)
Arguments
X: A data frame or matrix of predictors.
y: A vector of outcomes.
test_size: Proportion of the data to be used as the test set.
modules: A list containing the definitions for the Python modules and submodules.
Value
A list containing the split datasets:
X_train: Training set for predictors, converted to Python format.
X_test: Test set for predictors, converted to Python format.
y_train: Training set for outcomes, converted to Python format.
y_test: Test set for outcomes, converted to Python format.
The function ensures that the data is appropriately partitioned and formatted for use in Python-based analysis.
Examples
# 'data' is assumed to hold the predictors (data frame or matrix) and
# 'outcome' the target variable (vector); neither is defined here.
# Build the list of Python module definitions
# (assumes 'define_sklearn_modules' is available).
sklearn_modules <- define_sklearn_modules()
# Hold out 20% of the samples as the test set
split_results <- split_data(data, outcome, test_size = 0.2, modules = sklearn_modules)
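To make the meaning of test_size concrete, the following base-R sketch partitions a small toy dataset in the same proportions as the call above. It is an illustration only and not the GeneSelectR implementation: it does not call split_data, does not convert anything to Python format, and the toy data, the rounding rule, and the name split_sketch are all assumptions made for this example.

```r
# Illustrative base-R sketch; NOT the GeneSelectR implementation.
set.seed(42)  # make the random partition reproducible

# Hypothetical toy data: 50 samples, 4 predictors, binary outcome
X <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
y <- rep(c("case", "control"), each = 25)

test_size <- 0.2
n <- nrow(X)
n_test <- round(test_size * n)  # assumed rounding rule for this sketch

# Draw the test rows at random; the remaining rows form the training set
test_idx <- sample(seq_len(n), size = n_test)

# Same element names as the list described under Value, but plain R objects
split_sketch <- list(
  X_train = X[-test_idx, , drop = FALSE],
  X_test  = X[test_idx, , drop = FALSE],
  y_train = y[-test_idx],
  y_test  = y[test_idx]
)

# Row counts of the predictor pieces
sapply(split_sketch[c("X_train", "X_test")], nrow)
```

With 50 samples and test_size = 0.2, the sketch places 10 rows in the test pieces and 40 in the training pieces. Which rows are selected depends on the random seed.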
[Package GeneSelectR version 1.0.1 Index]