rSplit {Qindex} | R Documentation |
Random Split Sampling with Stratification
Description
Random split sampling, stratified based on the type of the response.
Usage
rSplit(y, nsplit, stratified = TRUE, trainFrac = 0.8, ...)
Arguments
y |
a double vector,
a logical vector,
a factor,
or a Surv object,
response |
nsplit |
positive integer scalar, number of replicates of random splits to be performed |
stratified |
logical scalar,
whether to stratify based on the response |
trainFrac |
double scalar between 0 and 1,
fraction of the training set, default 0.8 |
... |
additional parameters, currently not in use |
Details
Function rSplit() performs random split sampling, with or without stratification. Specifically,
If stratified = FALSE, or if we have a double response y, then split the sample into a training and a test set with training set fraction trainFrac, without stratification.
Otherwise, split a Surv response y, stratified by its censoring status. Specifically, split the subjects with an observed event into a training and a test set with training set fraction trainFrac, and split the censored subjects into a training and a test set with training set fraction trainFrac. Then combine the training sets from the subjects with observed events and the censored subjects, and combine the test sets from the subjects with observed events and the censored subjects.
Otherwise, split a logical response y, stratified by itself. Specifically, split the subjects with TRUE response into a training and a test set with training set fraction trainFrac, and split the subjects with FALSE response into a training and a test set with training set fraction trainFrac. Then combine the training sets, and the test sets, in a similar fashion as described above.
Otherwise, split a factor response y, stratified by its levels. Specifically, split the subjects in each level of y into a training and a test set with training set fraction trainFrac. Then combine the training sets, and the test sets, from all levels of y.
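The stratified case described above can be illustrated with a minimal base-R sketch. The helper stratified_split_sketch() is hypothetical and is not part of the package; in particular, rounding each stratum's training size to the nearest integer is an assumption, and rSplit() may round differently. For a Surv response, the strata would be the censoring status rather than the response itself.

```r
# Hypothetical helper, not the package's implementation: draw one stratified split.
# Returns a logical vector in which TRUE marks training subjects.
stratified_split_sketch <- function(strata, trainFrac = 0.8) {
  strata <- as.factor(strata)
  is_train <- logical(length(strata))  # all FALSE (test) to start
  for (idx in split(seq_along(strata), strata)) {
    # Assumption: per-stratum training size is rounded to the nearest integer.
    n_train <- round(length(idx) * trainFrac)
    # Sample positions within idx, so a stratum of size 1 is handled safely.
    is_train[idx[sample.int(length(idx), size = n_train)]] <- TRUE
  }
  is_train
}

set.seed(1)
y <- rep(c(TRUE, FALSE), times = c(20, 30))
is_train <- stratified_split_sketch(y)
table(y, is_train)  # cross-tabulate response against training membership
```

Because sampling is done within each stratum and the results are combined, the training set keeps roughly the same proportion of each response category as the full sample.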
Value
Function rSplit()
returns a length-nsplit
list of
logical vectors.
In each logical vector,
the TRUE
elements indicate training subjects and
the FALSE
elements indicate test subjects.
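As a rough illustration of how such a list of logical vectors might be consumed, the sketch below subsets a data frame into training and test sets for each replicate. The objects dat, splits, and parts are made up for illustration; splits is built by hand as a stand-in for the returned value so that the code runs without the package.

```r
# Stand-in for the value of rSplit(): a list of logical vectors,
# constructed by hand here (an assumption, not actual rSplit() output).
set.seed(42)
dat <- data.frame(x = rnorm(10), y = rep(c(TRUE, FALSE), times = c(4, 6)))
splits <- replicate(3, sample(rep(c(TRUE, FALSE), times = c(8, 2))),
                    simplify = FALSE)

# TRUE elements select the training rows, FALSE elements the test rows.
parts <- lapply(splits, function(is_train) {
  list(train = dat[is_train, , drop = FALSE],
       test  = dat[!is_train, , drop = FALSE])
})

# Training set size in each replicate
vapply(parts, function(p) nrow(p$train), integer(1))
```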
Note
caTools::sample.split()
is not what we need.
See Also
Examples
rSplit(y = rep(c(TRUE, FALSE), times = c(20, 30)), nsplit = 3L)