
binBinary {predictMe}

R Documentation

Categorization of predicted probabilities and the corresponding mean number of events for each category.

Description

Predicted probabilities are categorized into bins, according to the selected 'binWidth', and the corresponding mean outcome per bin is computed.

Usage

binBinary(x = NULL, measColumn = NULL, binWidth = 20)

Arguments

`x`	A data.frame with exactly two columns: one column must contain the measured outcome, the other the predicted outcome values, as returned by some algorithm.
`measColumn`	A single integer that denotes which of the two columns of function argument 'x' contains the measured outcome.
`binWidth`	A single integer value greater than 0 and less than 100 that divides 100 into equal bins, e.g., 20 (100/20 = 5 equal bins).
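The relation between 'binWidth' and the number of bins can be sketched in base R. This is only an illustration under the assumption that bin borders run from 0 to 100 in steps of 'binWidth'; the helper name binBorders is hypothetical and not part of predictMe.

```r
# Hypothetical helper (not part of predictMe): bin borders implied by a bin width.
binBorders <- function(binWidth) {
  # Mirror the documented constraint: one value, greater than 0, less than 100.
  stopifnot(length(binWidth) == 1, binWidth > 0, binWidth < 100)
  seq(0, 100, by = binWidth)
}

borders <- binBorders(20)
borders               # borders 0, 20, 40, 60, 80, 100
length(borders) - 1   # 5 bins

# A width that divides 100 without remainder yields equally wide bins.
100 %% 20 == 0        # TRUE
100 %% 30 == 0        # FALSE: the borders 0, 30, 60, 90 would not reach 100
```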

Details

Predicted values (probability in percent) less than 0 or greater than 100 are replaced by 0 and 100, respectively.
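The replacement rule above can be illustrated with a few made-up values. This is a minimal base R sketch, not the package's internal code; the vector fittedPerc and its values are assumptions chosen for illustration. Fitted probabilities from a logistic regression always lie between 0 and 1, so out-of-range values would typically come from other algorithms, such as a linear model.

```r
# Made-up predicted probabilities in percent, two of them out of range.
fittedPerc <- c(-3.2, 0, 47.5, 100, 104.1)

# Flag the values that lie outside the interval [0, 100].
idxExceed <- fittedPerc < 0 | fittedPerc > 100

# Replace values below 0 by 0 and values above 100 by 100.
clamped <- pmin(pmax(fittedPerc, 0), 100)

idxExceed  # TRUE FALSE FALSE FALSE TRUE
clamped    # 0 0 47.5 100 100
```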

Beware: Since binning continuous values always introduces noise, some of the differences in column 7 (bin differences) require explicit attention. When the outcome is binary, binning the predicted probabilities (fitted values) also introduces noise in column 5, since the mean number of measured events depends on the width and on the exact borders of the bins (see the package vignette, section 'Bin noise').
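One way to see this dependence on the bin borders is to bin the same simulated data with two different widths. The sketch below uses base R only; the simulated data and the helper meanPerBin are assumptions made for illustration and do not reproduce the package's own computation or the vignette's example.

```r
set.seed(1)
p <- runif(500)                        # simulated predicted probabilities
y <- rbinom(500, size = 1, prob = p)   # simulated binary outcome

# Hypothetical helper: mean observed outcome (in percent) per probability bin.
meanPerBin <- function(p, y, binWidth) {
  bins <- cut(100 * p, breaks = seq(0, 100, by = binWidth),
              include.lowest = TRUE)
  100 * tapply(y, bins, mean)
}

# Same data, different bin borders: the per-bin means differ.
meanPerBin(p, y, binWidth = 20)
meanPerBin(p, y, binWidth = 25)
```

The per-bin means change with the chosen borders even though the underlying predictions and outcomes are identical, which is why differences close to a bin border deserve a closer look.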

Value

A list with two data.frames and one vector. Each data.frame has 7 columns:

`xTrans`	Data set in which columns 1 and 2 are categorized according to the user's selected bin width. Column 3 displays the observed frequencies per bin and column 4 displays the predicted probabilities (fitted values) per bin, each in percent. Column 5 shows the difference between the values in column 3 and column 4. Column 6 shows the unique individual identifiers. Column 7 shows the differences in terms of bins. See Details.
`xTrans2`	Same as xTrans, except that original or transformed values less than 0 or greater than 100 have not been replaced with 0 or 100, respectively.
`idxExceed`	Logical vector. TRUE marks the rows of xTrans or xTrans2 in which values were either less than 0 or greater than 100.

Author(s)

Marcel Miché

Examples

# Load the package that provides quickSim and binBinary
library(predictMe)
# Simulate data set with binary outcome
dfBinary <- quickSim(type = "binary")
# Logistic regression, used as algorithm to predict the response variable
# (estimated probability of outcome being present).
glmRes <- glm(y ~ x1 + x2, data = dfBinary, family = "binomial")
# Extract measured outcome and the predicted probability (fitted values)
# from the logistic regression output, put both in a data.frame.
glmDf <- data.frame(measOutcome = dfBinary$y,
                    fitted = glmRes$fitted.values)
# Apply function binBinary, generate 5 equal bins (probabilities in
# percent, bin width 20, yields 5 bins).
x100b <- binBinary(x = glmDf, measColumn = 1, binWidth = 20)

[Package predictMe version 0.1 Index]