inTrainingSample {nlcv}    R Documentation
Function to define a learning sample based on balanced sampling
Description
This function takes a factor with the class labels of the total dataset,
draws a sample that is balanced with respect to the different levels of the
factor, and returns a logical vector indicating whether each observation is
in the learning sample (TRUE) or not (FALSE).
Usage
inTrainingSample(y, propTraining = 2/3, classdist = c("balanced",
"unbalanced"))
Arguments
y
    factor with the class labels for the total data set.
propTraining
    proportion of the data that should be in the training set; the default value is 2/3.
classdist
    distribution of the classes; indicates whether the class distribution is 'balanced' or 'unbalanced'. The sampling strategy for each run is adapted accordingly: with 'unbalanced', the same number of observations is drawn from each class (see Examples).
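The sampling logic described above can be sketched in base R as follows. This is a minimal, hypothetical re-implementation (sketchTrainingSample is not part of nlcv) under two assumptions: with "unbalanced", floor(propTraining * size of the smallest class) observations are drawn from every class, which reproduces the counts in the Examples; with "balanced", the proportion propTraining is drawn from each class separately. The package's own code may differ.

```r
# Hypothetical sketch, NOT the nlcv source: returns a logical vector that is
# TRUE for observations drawn into the learning sample.
sketchTrainingSample <- function(y, propTraining = 2/3,
                                 classdist = c("balanced", "unbalanced")) {
  classdist <- match.arg(classdist)
  y <- as.factor(y)
  classSizes <- as.vector(table(y))  # counts per level, in levels(y) order
  nPerClass <- if (classdist == "unbalanced") {
    # assumption: equal draw per class, driven by the smallest class
    rep(floor(min(classSizes) * propTraining), length(classSizes))
  } else {
    # assumption: the same proportion is drawn within every class
    floor(classSizes * propTraining)
  }
  inTrain <- logical(length(y))  # start with every observation FALSE
  for (i in seq_along(levels(y))) {
    idx <- which(y == levels(y)[i])
    # index via sample.int to avoid sample()'s surprise when length(idx) == 1
    chosen <- idx[sample.int(length(idx), nPerClass[i])]
    inTrain[chosen] <- TRUE
  }
  inTrain
}
```

For the data in the Examples, table(y[sketchTrainingSample(y, 2/3, "unbalanced")]) gives 14 and 14 for A and B; with "balanced" it gives 14 and 53 under the assumption stated above.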
Value
logical vector indicating, for each observation in y, whether the
observation is in the learning sample (TRUE) or not (FALSE).
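Because the returned vector is aligned with y, it can index the rows of the data set directly to form the training and test sets. In the sketch below, the data frame df and the hand-written vector inTrain are hypothetical; inTrain stands in for a value such as nlcv:::inTrainingSample(df$label, 2/3, "unbalanced").

```r
# Hypothetical data: six observations with a class label and one feature
df <- data.frame(label = factor(c("A", "A", "A", "B", "B", "B")),
                 x = c(0.1, 0.4, 0.3, 2.2, 1.9, 2.5))
# Stand-in for the logical vector returned by inTrainingSample()
inTrain <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
trainSet <- df[inTrain, , drop = FALSE]   # learning sample (TRUE)
testSet  <- df[!inTrain, , drop = FALSE]  # remaining observations (FALSE)
```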
Author(s)
Willem Talloen and Tobias Verbeke
Examples
### this example demonstrates the logic of sampling in case of an unbalanced distribution of classes
y <- factor(c(rep("A", 21), rep("B", 80)))
set.seed(1)  # make the random draw reproducible
inTrain <- nlcv:::inTrainingSample(y, 2/3, "unbalanced")
inTrain
table(y[inTrain])   # should be 14, 14 (for A, B resp.)
table(y[!inTrain])  # should be 7, 66 (for A, B resp.)
[Package nlcv version 0.3.5 Index]