R: Split Data into Training and Test Sets

split_data {GeneSelectR}

R Documentation

Split Data into Training and Test Sets

Description

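Splits the predictors `X` and the outcomes `y` into training and test sets, using `test_size` as the proportion held out for testing, and converts the resulting pieces to Python format.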

Usage

split_data(X, y, test_size, modules)

Arguments

`X`	A dataframe or matrix of predictors.
`y`	A vector of outcomes.
`test_size`	Proportion of the data to be used as the test set.
`modules`	A list containing the definitions for the Python modules and submodules.

Value

A list containing the split datasets:

`X_train`	Training set for predictors, converted to Python format.
`X_test`	Test set for predictors, converted to Python format.
`y_train`	Training set for outcomes, converted to Python format.
`y_test`	Test set for outcomes, converted to Python format.

The function ensures that the data is appropriately partitioned and formatted for use in Python-based analysis.
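
As a rough illustration of the shape of the returned list and of what `test_size` means, the following is a minimal base-R sketch of a random row-wise split. It is not the package's implementation: `split_data` works through the Python modules supplied in `modules` and returns Python-format objects, whereas this sketch stays entirely in R. The helper name `sketch_split` and the toy data are hypothetical and exist only for this example.

```r
# Hypothetical helper for illustration only; not part of GeneSelectR.
# Assumes X is a matrix or data frame with one row per sample.
sketch_split <- function(X, y, test_size = 0.2) {
  stopifnot(nrow(X) == length(y), test_size > 0, test_size < 1)
  n <- nrow(X)
  n_test <- ceiling(n * test_size)     # number of rows held out for testing
  test_idx <- sample.int(n, n_test)    # random row indices for the test set
  list(
    X_train = X[-test_idx, , drop = FALSE],
    X_test  = X[test_idx, , drop = FALSE],
    y_train = y[-test_idx],
    y_test  = y[test_idx]
  )
}

set.seed(1)
X <- matrix(rnorm(50), nrow = 10)      # toy data: 10 samples, 5 predictors
y <- rep(c("a", "b"), 5)               # toy outcome vector
parts <- sketch_split(X, y, test_size = 0.2)
```

With 10 samples and `test_size = 0.2`, two rows go to the test set and eight to the training set; the four elements are then accessed by name, for example `parts$X_train`.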

Examples


# Assuming 'data' is your dataset with predictors and 'outcome' is the target variable
# Define sklearn modules (assuming 'define_sklearn_modules' is defined)
sklearn_modules <- define_sklearn_modules()

# Split the data into training and test sets
split_results <- split_data(data, outcome, test_size = 0.2, modules = sklearn_modules)

[Package GeneSelectR version 1.0.1 Index]