Over-/Undersample {ModTools} | R Documentation
Oversample and Undersample
Description
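For classification purposes it is often desirable to have a balanced dataset. If the prevalence of the response variable is not 50%, records of the less frequent response A can be resampled until there are as many A cases as cases of the more frequent response B. This is called oversampling. Undersampling goes the other way: from the records of response B, only as many are sampled as there are cases of response A, so the result is reduced to the (lower) number of A cases per class.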
Usage
OverSample(x, vname)
UnderSample(x, vname)
Arguments
x | a data frame containing predictors and response
vname | the name of the response variable to be used to over/undersample
Value
a data frame with a balanced response variable
Author(s)
Andri Signorell <andri@signorell.net>
See Also
Examples
OverSample(d.pima2, "diabetes")
UnderSample(d.pima2, "diabetes")
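
The resampling idea from the Description can be sketched in base R. This is only an illustration under stated assumptions, not the ModTools implementation: the helpers `over_sample_sketch()` and `under_sample_sketch()` and the toy data frame `d` are hypothetical, and the sketch assumes a response with exactly two classes.

```r
# Hypothetical helpers, NOT part of ModTools: a base-R sketch of
# balancing a two-class response by resampling row indices.

over_sample_sketch <- function(x, vname) {
  counts   <- table(x[[vname]])
  minority <- names(counts)[which.min(counts)]
  idx_min  <- which(x[[vname]] == minority)
  idx_maj  <- which(x[[vname]] != minority)
  # draw minority rows with replacement until both classes have equal size
  pick <- sample.int(length(idx_min), length(idx_maj), replace = TRUE)
  x[c(idx_maj, idx_min[pick]), , drop = FALSE]
}

under_sample_sketch <- function(x, vname) {
  counts   <- table(x[[vname]])
  minority <- names(counts)[which.min(counts)]
  idx_min  <- which(x[[vname]] == minority)
  idx_maj  <- which(x[[vname]] != minority)
  # keep all minority rows and draw the same number of majority rows
  pick <- sample.int(length(idx_maj), length(idx_min))
  x[c(idx_min, idx_maj[pick]), , drop = FALSE]
}

# Toy data (an assumption for illustration): 7 "neg" and 3 "pos" cases
set.seed(1)
d <- data.frame(x1 = rnorm(10),
                y  = factor(rep(c("neg", "pos"), times = c(7, 3))))

table(over_sample_sketch(d, "y")$y)   # 7 cases in each class
table(under_sample_sketch(d, "y")$y)  # 3 cases in each class
```

Comparing `table()` of the response before and after the call is a quick way to confirm that the returned data frame is balanced. The same check can be applied to the results of `OverSample()` and `UnderSample()`.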
[Package ModTools version 0.9.6 Index]