Over-/Undersample {ModTools}    R Documentation

Oversample and Undersample

Description

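For classification purposes we might want to have a balanced dataset. If the prevalence of the response variable is not 50%, we can resample the records so that there are as many cases of response A (the less frequent class) as of response B (the more frequent class). Oversampling does this by drawing additional records from the cases of A until their number matches that of B, which necessarily repeats some records. Undersampling does the opposite: it draws from the records of B only as many cases as there are of A, so the result is balanced but smaller than the original data.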

Usage

OverSample(x, vname)
UnderSample(x, vname)

Arguments

x

a data frame containing predictors and response

vname

the name of the response variable to be used to over/undersample

Value

a data frame with balanced response variable

Author(s)

Andri Signorell <andri@signorell.net>

See Also

BestCut

Examples

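# oversample: bring the less frequent "diabetes" class up to the size of the other
OverSample(d.pima2, "diabetes")

# undersample: reduce the more frequent "diabetes" class to the size of the other
UnderSample(d.pima2, "diabetes")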
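
The idea described above can also be sketched in base R on a small made-up data frame. The helper `balance_sketch()`, its `method` argument and the toy data `d` are hypothetical names chosen for this illustration. This is not the ModTools implementation, and it assumes a response with exactly two classes. Keeping all original minority records when oversampling and only adding resampled extras is one possible design choice, not necessarily the one the package makes.

```r
# Illustrative sketch only; balance_sketch() is a hypothetical helper,
# not part of ModTools.
balance_sketch <- function(x, vname, method = c("over", "under")) {
  method <- match.arg(method)

  # row indices grouped by response class (rows with NA response are dropped)
  idx <- split(seq_len(nrow(x)), x[[vname]], drop = TRUE)
  stopifnot(length(idx) == 2)          # assumption: binary response

  sizes <- lengths(idx)
  if (sizes[1] == sizes[2]) return(x)  # already balanced

  minority <- idx[[which.min(sizes)]]
  majority <- idx[[which.max(sizes)]]

  if (method == "over") {
    # keep every record and add minority rows drawn with replacement
    n_extra <- length(majority) - length(minority)
    extra <- minority[sample.int(length(minority), n_extra, replace = TRUE)]
    keep <- c(majority, minority, extra)
  } else {
    # keep all minority rows and a random subset of the majority rows
    sub <- majority[sample.int(length(majority), length(minority))]
    keep <- c(minority, sub)
  }
  x[keep, , drop = FALSE]
}

# toy data: 7 "neg" and 3 "pos" cases
set.seed(1)
d <- data.frame(x1 = rnorm(10),
                y  = factor(rep(c("neg", "pos"), times = c(7, 3))))

over  <- balance_sketch(d, "y", "over")
under <- balance_sketch(d, "y", "under")

table(over$y)    # 7 cases in each class
table(under$y)   # 3 cases in each class
```

Oversampling keeps all information but duplicates minority records, while undersampling discards majority records. Which of the two is preferable depends on how much data is available.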

[Package ModTools version 0.9.6 Index]