binarize {bsnsing} R Documentation

Create Binary Variables by the Classification Target

Description

Create a set of binary-valued variables (columns) for each column in the input data. A column that already takes only the values 0 and 1 is retained as is, and no new column is created for it. For a numeric column, the function binarize.numeric is called. For a factor column, the function binarize.factor is called.
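
The per-column handling described above can be pictured with a small base R sketch. The helper name column_kind is hypothetical and not part of bsnsing; it only mirrors the three cases listed here and makes no claim about the package's internal checks.

```r
# Hypothetical helper (not part of bsnsing): label each column by the
# case the description says applies to it.
# Assumption: a numeric column counts as "binary" when every value is 0 or 1.
column_kind <- function(col) {
  if (is.numeric(col) && all(col %in% c(0, 1))) {
    "binary"    # retained as is, no new column
  } else if (is.numeric(col)) {
    "numeric"   # would go to binarize.numeric
  } else if (is.factor(col)) {
    "factor"    # would go to binarize.factor
  } else {
    "other"     # not covered by the description
  }
}

# Toy data frame, made up for illustration
toy <- data.frame(
  flag = c(0, 1, 1, 0),
  size = c(2.5, 3.1, 4.8, 1.2),
  kind = factor(c("a", "b", "a", "c"))
)

# One label per column, returned as a named character vector
kinds <- vapply(toy, column_kind, character(1))
print(kinds)
```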

Usage

binarize(
  x,
  y,
  target = stop("'target' (0 or 1) must be provided"),
  control = bscontrol()
)

Arguments

x

a data frame or matrix to be binarized.

y

a vector with two unique values (0 and 1). It is the response variable that guides the optimal discretization of variables in x.

target

the level of y (0 or 1) that serves as the target of the Boolean rule.

control

a list or a bscontrol() object. The list should contain the following three elements: nseg.numeric, a positive integer giving the maximum number of segments used in discretizing a numeric variable; nseg.factor, a positive integer giving the maximum number of levels allowed for a factor variable; and bin.size, a positive integer giving the minimum number of observations that must fall in a segment.
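
As a minimal sketch of the list form of control, the three elements named above could be assembled as follows. The numeric values are illustrative assumptions rather than package defaults, and the helper is_pos_int is hypothetical, written only to check the "positive integer" requirement.

```r
# Illustrative values only; they are assumptions, not package defaults.
ctrl <- list(
  nseg.numeric = 10L,  # max segments when discretizing a numeric variable
  nseg.factor  = 20L,  # max levels allowed for a factor variable
  bin.size     = 5L    # min observations that must fall in a segment
)

# Hypothetical sanity check (not part of bsnsing): TRUE only for a
# single positive whole number.
is_pos_int <- function(v) {
  is.numeric(v) && length(v) == 1 && !is.na(v) && v > 0 && v == round(v)
}

# TRUE when every element of the list passes the check
ctrl_ok <- all(vapply(ctrl, is_pos_int, logical(1)))
print(ctrl_ok)
```

Such a list would then be supplied through the control argument, for example binarize(x, y, target = 1, control = ctrl). Whether a plain list needs further elements beyond these three is not stated here, so bscontrol() may be the safer choice when in doubt.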

Value

a data frame containing binary variables, or a character string describing the rule that perfectly splits the target.
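
Because the return value can be either of two types, calling code may want to branch on it. The sketch below assumes nothing beyond the two types named above; describe_result is a hypothetical helper, and the stand-in values (including the rule text) are made up for illustration rather than taken from real binarize() output.

```r
# Hypothetical helper (not part of bsnsing): summarize either return type.
describe_result <- function(res) {
  if (is.character(res)) {
    # A single rule was found that perfectly splits the target
    paste("perfect rule:", res)
  } else if (is.data.frame(res)) {
    # The usual case: a data frame of binary columns
    paste("binary data frame with", ncol(res), "columns")
  } else {
    stop("unexpected return type")
  }
}

# Stand-in values, made up for illustration; real ones come from binarize()
fake_rule <- "mpg < 20"
fake_bx <- data.frame(a = c(0, 1), b = c(1, 1))

print(describe_result(fake_rule))
print(describe_result(fake_bx))
```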

Examples

# Load the package, then prepare the data
library(bsnsing)
x <- auto[, c('mpg', 'cylinders', 'displacement')]
x$cylinders <- as.factor(x$cylinders)
y <- ifelse(auto$origin == 'USA', 1L, 0L)
# binarize x by y = 1
bx1 <- binarize(x, y, target = 1)
head(bx1)
# binarize x by y = 0
bx0 <- binarize(x, y, target = 0)
head(bx0)
# when selecting a single column from a data frame, use drop = FALSE so the result stays a data frame
binarize(auto[, 'mpg', drop = FALSE], y, target = 1)


[Package bsnsing version 1.0.1 Index]