SplitData {DevTreatRules}

R Documentation

Partition a dataset into independent subsets

Description

To obtain a trustworthy estimate of how a developed treatment rule will perform in independent samples drawn from the same population, rule development must be performed independently of rule evaluation. Moreover, model selection is commonly used to settle on the form of the treatment rule; in that case, the rule ultimately chosen must also be evaluated on data that did not inform any stage of model building. The SplitData() function partitions a dataset so that rule development, validation, and evaluation (or development and evaluation only, if there is no model selection) can be performed on independent subsets. This function is appropriate only in the simple setting where the rows of the dataset are independent of one another (e.g., no individual is represented by multiple rows).
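
The idea of a row-level partition can be sketched in base R as follows. This is a minimal illustration under the independence assumption stated above, not the package's implementation: the helper name `partition_rows`, the named-vector interface, and the toy data frame are all hypothetical.

```r
# Hypothetical helper (not part of DevTreatRules): label every row with
# exactly one partition, in the requested proportions.
partition_rows <- function(data, proportions) {
  # The proportions must sum to 1 (compared with a numeric tolerance)
  stopifnot(isTRUE(all.equal(sum(proportions), 1)))
  n <- nrow(data)
  # Cumulative row-count boundaries, so the partition sizes add up to n
  bounds <- round(cumsum(c(0, proportions)) * n)
  sizes <- diff(bounds)
  labels <- rep(names(proportions), times = sizes)
  # Shuffle the labels so that assignment to partitions is random
  data$partition <- factor(sample(labels), levels = names(proportions))
  data
}

set.seed(1)
toy <- data.frame(id = 1:100, x = rnorm(100))  # made-up data
toy.split <- partition_rows(
  toy, c(development = 0.5, validation = 0.25, evaluation = 0.25)
)
table(toy.split$partition)
```

Because each row receives a single label, no observation can inform both rule development and rule evaluation, which is the property the description relies on.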

Usage

SplitData(data, n.sets = c(3, 2), split.proportions = NULL)

Arguments

`data`	A data frame representing the development dataset used for building a treatment rule.
`n.sets`	A numeric/integer equal to either 3 (if a development/validation/evaluation partition is desired) or 2 (if there is no model selection and only a development/evaluation partition is desired).
`split.proportions`	A numeric vector with length equal to `n.sets`, providing the proportion of observations in `data` that should be assigned to the development/evaluation partitions (if `n.sets=2`) or to the development/validation/evaluation partitions (if `n.sets=3`). The entries must sum to 1.
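
The two constraints on `split.proportions` (length equal to `n.sets`, entries summing to 1) can be checked before calling the function. The sketch below is an assumption-laden illustration: `check_split_proportions` is a hypothetical helper, not part of DevTreatRules, and the tolerance-based comparison is a design choice made here because an exact `==` test on a floating-point sum can fail through rounding.

```r
# Hypothetical pre-check (not part of DevTreatRules) mirroring the
# documented constraints on split.proportions.
check_split_proportions <- function(split.proportions, n.sets) {
  stopifnot(n.sets %in% c(2, 3))
  is.numeric(split.proportions) &&
    length(split.proportions) == n.sets &&
    # all.equal() compares within a small numeric tolerance
    isTRUE(all.equal(sum(split.proportions), 1))
}

ok.three <- check_split_proportions(c(0.5, 0.25, 0.25), n.sets = 3)
ok.two <- check_split_proportions(c(0.7, 0.3), n.sets = 2)
bad.length <- check_split_proportions(c(0.7, 0.3), n.sets = 3)
bad.sum <- check_split_proportions(c(0.7, 0.2), n.sets = 2)
```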

Value

A data frame equal to `data` with an additional column named ‘partition’, a factor variable with levels ‘development’ and ‘evaluation’ (if `n.sets=2`) or ‘development’, ‘validation’, and ‘evaluation’ (if `n.sets=3`).
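
Since the partition is returned as a column rather than as separate objects, the subsets have to be extracted by the caller. A minimal base R sketch is shown here, using a small made-up data frame (`toy.split`, hypothetical values) that stands in for the returned object with `n.sets=3`:

```r
# Toy stand-in for the returned data frame (hypothetical values)
toy.split <- data.frame(
  id = 1:8,
  partition = factor(
    c("development", "development", "development", "development",
      "validation", "validation", "evaluation", "evaluation"),
    levels = c("development", "validation", "evaluation")
  )
)

# Option 1: a named list with one data frame per partition level
subsets <- split(toy.split, toy.split$partition)
development.data <- subsets$development

# Option 2: logical indexing for a single partition
evaluation.data <- toy.split[toy.split$partition == "evaluation", ]

nrow(development.data)
nrow(evaluation.data)
```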

Examples

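library(DevTreatRules)  # provides SplitData() and obsStudyGeneExpressions
set.seed(123)  # fix the seed so the partition is reproducible
example.split <- SplitData(data=obsStudyGeneExpressions,
                           n.sets=3, split.proportions=c(0.5, 0.25, 0.25))
# count the rows assigned to each partition
table(example.split$partition)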

[Package DevTreatRules version 1.1.0 Index]