ovun.sample {ROSE} | R Documentation |
Over-sampling, under-sampling, combination of over- and under-sampling.
Description
Creates possibly balanced samples by randomly over-sampling minority examples, under-sampling majority examples, or combining over- and under-sampling.
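The idea behind random over- and under-sampling can be sketched in base R. This is only an illustration on made-up data, not the package's implementation: the data frame `toy` and the index vectors are hypothetical, and the sketch balances the classes exactly, whereas `ovun.sample` produces possibly balanced samples controlled by `p`.

```r
# Toy imbalanced data: 18 majority ("0") and 2 minority ("1") rows (made-up values)
set.seed(1)
toy <- data.frame(x = rnorm(20),
                  cls = factor(rep(c("0", "1"), times = c(18, 2))))

maj <- which(toy$cls == "0")  # row indices of the majority class
mnr <- which(toy$cls == "1")  # row indices of the minority class

# Random over-sampling: keep every majority row,
# draw minority rows with replacement until the classes match
over_idx <- c(maj, sample(mnr, size = length(maj), replace = TRUE))
toy_over <- toy[over_idx, ]

# Random under-sampling: keep every minority row,
# draw the same number of majority rows without replacement
under_idx <- c(sample(maj, size = length(mnr), replace = FALSE), mnr)
toy_under <- toy[under_idx, ]

table(toy_over$cls)
table(toy_under$cls)
```

Over-sampling duplicates minority rows and grows the data set, while under-sampling discards majority rows and shrinks it.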
Usage
ovun.sample(formula, data, method="both", N, p=0.5,
subset=options("subset")$subset,
na.action=options("na.action")$na.action, seed)
Arguments
formula |
An object of class |
data |
An optional data frame, list or environment (or object
coercible to a data frame by |
method |
One among |
N |
The desired sample size of the resulting data set.
If missing and |
p |
The probability of resampling from the rare class.
If missing and |
subset |
An optional vector specifying a subset of observations to be used in the sampling process.
The default is set by the |
na.action |
A function which indicates what should happen when the data contain 'NA's.
The default is set by the |
seed |
A single value, interpreted as an integer. Specifying a seed is recommended so that the sample can be traced and reproduced. |
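A quick way to see what `seed` is for is to call the function twice with the same seed and compare the results. This is a minimal sketch that assumes the ROSE package is installed and uses the `hacide` data from the Examples; the object names `a` and `b` are illustrative.

```r
library(ROSE)
data(hacide)

# Two calls with the same seed and otherwise identical arguments
a <- ovun.sample(cls ~ ., data = hacide.train,
                 p = 0.5, seed = 1, method = "over")$data
b <- ovun.sample(cls ~ ., data = hacide.train,
                 p = 0.5, seed = 1, method = "over")$data

# Compare the two resampled data sets
identical(a, b)
```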
Value
An object of class ovun.sample
with the following components:
Call |
The matched call. |
method |
The method used to balance the sample. Possible choices are |
data |
The resulting new data set. |
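The components listed above can be inspected directly on the returned object. The following sketch assumes the ROSE package is installed and reuses the `hacide` data from the Examples; the name `res` is illustrative.

```r
library(ROSE)
data(hacide)

# Keep the whole returned object instead of extracting $data right away
res <- ovun.sample(cls ~ ., data = hacide.train,
                   N = nrow(hacide.train), p = 0.5,
                   seed = 1, method = "both")

class(res)            # class of the returned object
names(res)            # names of its components
res$method            # the balancing method that was used
dim(res$data)         # dimensions of the resampled data set
table(res$data$cls)   # class distribution after resampling
```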
See Also
ROSE.
Examples
# 2-dimensional example
# loading data
data(hacide)
# imbalance on training set
table(hacide.train$cls)
# balanced data set with both over and under sampling
data.balanced.ou <- ovun.sample(cls~., data=hacide.train,
                                N=nrow(hacide.train), p=0.5,
                                seed=1, method="both")$data
table(data.balanced.ou$cls)
# balanced data set with over-sampling
data.balanced.over <- ovun.sample(cls~., data=hacide.train,
                                  p=0.5, seed=1,
                                  method="over")$data
table(data.balanced.over$cls)
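For completeness, under-sampling can be requested in the same way. This sketch assumes the ROSE package is installed and that `method="under"` selects under-sampling; the object name `data.balanced.under` is chosen here by analogy with the examples above.

```r
library(ROSE)
data(hacide)

# balanced data set with under-sampling
data.balanced.under <- ovun.sample(cls~., data=hacide.train,
                                   p=0.5, seed=1,
                                   method="under")$data
table(data.balanced.under$cls)

# under-sampling discards majority examples, so the result is smaller
nrow(data.balanced.under) < nrow(hacide.train)
```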