binContinuous {predictMe}R Documentation

Categorization of measured and predicted outcome values.

Description

Measured and predicted continuous outcome values (transformed to range between 0 and 100) are categorized as bins, depending on the selected 'binWidth'.

Usage

binContinuous(
  x = NULL,
  measColumn = NULL,
  binWidth = 20,
  computeRange = TRUE,
  range_x = c(0, 0)
)

Arguments

x

A data.frame with exactly two columns, one of the columns must be the measured outcome, the other column must be the predicted outcome values, as returned by some algorithm.

measColumn

A single integer number that denotes which of the two columns of function argument 'x' contains the measured outcome.

binWidth

A single integer value greater than 0 and less than 100, which separates 100 into equal bins, e.g., 20 (100/20 = 5 equal bins).

computeRange

Logical value, defaults to TRUE, meaning that the range of the column with the measured outcome values will be computed. Else set this argument to FALSE (see Details).

range_x

A vector with the minimum and maximum possible value of the continuous outcome scale (see Details).

Details

Regarding function arguments 'computeRange' and 'range_x': If either the minimum or maximum possible value of the outcome scale has not occurred, e.g., none of the participants selected the maximum possible answer option, then the user must pass the possible range of outcome values to this function, using the function argument 'range_x', e.g., range_x = c(1, 5), if the original outcome scale ranged from 1 to 5.

Regarding function output 'xTrans' (see Value): Predicted values less than 0 or greater than 100 are replaced by 0 and 100, respectively.

Beware: The differences in column 5 are as accurate (no information loss) as if the original measured and predicted outcome values were subtracted from one another. However, since binning continuous values always introduces noise, some of the differences in column 7 (bin differences) require explicit attention (see package vignette, headline Bin noise).

Value

a list with two data.frames and one vector. Each data.frame has 7 columns:

  1. xTrans Data set, with columns 1 and 2 being categorized, according to the user's selected bin width. Column 3 displays the observed outcome values, whereas column 4 displays the predicted outocme values (fitted values), both transformed to range between 0 and 100. Column 5 shows the difference between values in column 3 and column 4. Column 6 shows the unique individual identifiers. Column 7 shows the differences in terms of bins. See Details.

  2. xTrans2 Same as xTrans, only that original or transformed values less than 0 or greater than 100 have not been replaced with 0 or 100, respectively.

  3. idxExceed logical vector. TRUE shows the row of xTrans or xTrans2 where values were either less than 0 or greater than 100.

Author(s)

Marcel Miché

Examples

# Simulate data set with continuous outcome (use all default values)
dfContinuous <- quickSim()
# Use multiple linear regression as algorithm to predict the outcome.
lmRes <- lm(y~x1+x2,data=dfContinuous)
# Extract measured outcome and the predicted outcome (fitted values)
# from the regression output, put both in a data.frame.
lmDf <- data.frame(measOutcome=dfContinuous$y,
                   fitted=lmRes$fitted.values)
# Apply function binContinuous, generate 5 equal bins (transformed
# outcome 0-100, bin width = 20, yields 5 bins).
x100c <- binContinuous(x=lmDf, measColumn = 1, binWidth = 20)

[Package predictMe version 0.1 Index]