SplitData {DevTreatRules}    R Documentation
Partition a dataset into independent subsets
Description
To get a trustworthy estimate of how a developed treatment rule will perform in independent samples drawn from the same population, it is critical that rule development be performed independently of rule evaluation. Further, model selection is commonly used to settle on the form of the developed treatment rule; in that case, it is essential that the ultimately chosen treatment rule also be evaluated on data that did not inform any stage of model building. The SplitData() function partitions a dataset so that rule development, validation, and evaluation (or just development and evaluation, if there is no model selection) can quickly be performed on independent subsets. This function is only appropriate in the simple setting where the rows of a dataset are independent of one another (e.g., no individual is represented by multiple rows).
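As a rough illustration of what such a partition involves, the base-R sketch below assigns each row at random to one named subset, with subset sizes fixed by the requested proportions. It is only a sketch under stated assumptions: the helper name partition_rows() and its arguments are hypothetical, and this is not necessarily how SplitData() itself assigns rows.

```r
# Hypothetical helper (NOT part of DevTreatRules): randomly assign each row of
# `data` to one named partition. Assumes rows are independent (one row per
# individual), as the Description above requires.
partition_rows <- function(data,
                           proportions = c(development = 0.5,
                                           validation = 0.25,
                                           evaluation = 0.25)) {
  stopifnot(is.data.frame(data),
            !is.null(names(proportions)),
            all(proportions >= 0),
            isTRUE(all.equal(sum(proportions), 1)))
  n <- nrow(data)
  # Whole-number subset sizes; rows lost to rounding go to the first subset
  counts <- floor(proportions * n)
  counts[1] <- counts[1] + (n - sum(counts))
  # Build the label vector, then shuffle it so row order carries no information
  labels <- rep(names(proportions), times = counts)
  data$partition <- factor(sample(labels), levels = names(proportions))
  data
}

set.seed(123)
toy <- data.frame(x = rnorm(100))
toy.split <- partition_rows(toy)
table(toy.split$partition)  # 50 development, 25 validation, 25 evaluation
```

Fixing the subset sizes up front and then shuffling the labels guarantees the requested split exactly (up to rounding); drawing each row's label independently with sample(..., prob = proportions) would instead give only approximately those proportions.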
Usage
SplitData(data, n.sets = c(3, 2), split.proportions = NULL)
Arguments
data
A data frame representing the *development* dataset used for building a treatment rule.
n.sets
A numeric or integer value equal to either 3 (if a development/validation/evaluation partition is desired) or 2 (if there is no model selection and only a development/evaluation partition is desired).
split.proportions
A numeric vector with length equal to
Value
A data.frame equal to data with an additional column named ‘partition’, which is a factor variable with levels ‘development’ and ‘evaluation’ (if n.sets=2) or ‘development’, ‘validation’, and ‘evaluation’ (if n.sets=3).
Examples
library(DevTreatRules)
set.seed(123)
example.split <- SplitData(data=obsStudyGeneExpressions,
                           n.sets=3, split.proportions=c(0.5, 0.25, 0.25))
table(example.split$partition)
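Because the returned object is an ordinary data frame with a factor column named ‘partition’, base R's split() can separate it into the per-partition datasets. To keep this sketch runnable without the package, it builds a small stand-in data frame (toy.split, with a made-up covariate x) that has the same ‘partition’ column described under Value; with DevTreatRules loaded, the SplitData() output could be passed in its place. The object names dev.data, valid.data, and eval.data are illustrative only.

```r
# Stand-in for SplitData() output (assumption: only the 'partition' column
# described under Value matters here; 'x' is a made-up covariate)
toy.split <- data.frame(
  x = 1:8,
  partition = factor(rep(c("development", "validation", "evaluation"),
                         times = c(4, 2, 2)),
                     levels = c("development", "validation", "evaluation"))
)

# split() returns a named list holding one data frame per factor level
subsets <- split(toy.split, toy.split$partition)
dev.data   <- subsets$development  # used to build candidate rules
valid.data <- subsets$validation   # used for model selection
eval.data  <- subsets$evaluation   # held out for the final evaluation

sapply(subsets, nrow)  # development = 4, validation = 2, evaluation = 2
```

With n.sets=2 the same code applies; the list simply has no ‘validation’ element.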