bayesProbability {mmb}    R Documentation

Full Bayesian inferencing for determining the probability or relative likelihood of a given value.

Description

Uses the full, extended form of Bayes' theorem, taking all selected features into account. Bayes' theorem is expanded to accommodate all dependent features. Each conditional probability (or relative likelihood) is then calculated, and a single result is returned: the probability or relative likelihood of the target feature assuming its given value, given that all the other dependent features assume their given values. The target feature (designated by 'targetCol') may be discrete or continuous. If at least one of the dependent features or the target feature is continuous and the PDF is used ('doEcdf' = FALSE), the result of this function is a relative likelihood of the target feature's value. If all of the features are discrete, or if the empirical CDF is used instead of the PDF, the result of this function is a probability.
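For purely discrete features, the expansion can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the mmb implementation: the helpers condProb() and fullBayesDiscrete() are hypothetical, conditions are passed as a named list, every probability is a plain relative frequency, and no shiftAmount is applied. The numerator is P(T = t) times the product of P(X_i | X_(i+1), ..., X_n, T = t); the denominator is the product of P(X_i | X_(i+1), ..., X_n).

```r
# Hypothetical helper: relative frequency of col == value within a segment
condProb <- function(seg, col, value) {
  if (nrow(seg) == 0) return(0)
  mean(seg[[col]] == value)
}

# Hypothetical helper: P(target = targetValue | all conditions) for discrete
# data, using the chain-rule expansion of Bayes' theorem.
fullBayesDiscrete <- function(df, conditions, targetCol, targetValue) {
  condNames <- names(conditions)
  n <- length(condNames)
  segT <- df[df[[targetCol]] == targetValue, , drop = FALSE]
  num <- condProb(df, targetCol, targetValue)  # prior P(T = t)
  den <- 1
  for (i in seq_len(n)) {
    segNum <- segT  # numerator factors additionally condition on T = t
    segDen <- df
    # Condition on every feature that comes after feature i
    for (r in condNames[seq_len(n) > i]) {
      segNum <- segNum[segNum[[r]] == conditions[[r]], , drop = FALSE]
      segDen <- segDen[segDen[[r]] == conditions[[r]], , drop = FALSE]
    }
    num <- num * condProb(segNum, condNames[i], conditions[[i]])
    den <- den * condProb(segDen, condNames[i], conditions[[i]])
  }
  if (den == 0) return(NA_real_)  # no rows match all conditions
  num / den
}

# Toy data, made up for illustration
toy <- data.frame(
  A = c("x", "x", "y", "y", "x"),
  B = c("p", "q", "p", "p", "p"),
  Label = c("1", "0", "1", "0", "1"),
  stringsAsFactors = FALSE)
fullBayesDiscrete(toy, list(A = "x", B = "p"), "Label", "1")
```

With relative frequencies and no shift, the product telescopes to the share of rows matching all conditions that also carry the target value, which makes a handy sanity check. In this sketch, a zero denominator (no matching rows) is returned as NA.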

Usage

bayesProbability(
  df,
  features,
  targetCol,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  retainMinValues = 1,
  doEcdf = FALSE,
  useParallel = NULL
)

Arguments

df

data.frame that contains the data of all features

features

data.frame of Bayes features (for example, created with createFeatureForBayes() and combined with rbind(), as in the Examples). One of the features must be the label column.

targetCol

string with the name of the feature that represents the label.

selectedFeatureNames

vector, default c(). Vector of strings with the names of the features on which the label to be predicted depends. If an empty vector is given, all of the features except the label are used, in the order in which they appear in 'features'.
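The defaulting rule for an empty selection can be sketched as follows. The helper resolveSelected() is hypothetical, and the features table is reduced to a 'name' column purely for illustration; the real Bayes-feature data.frame carries more columns.

```r
# Hypothetical, reduced features table (only the 'name' column is assumed)
features <- data.frame(
  name = c("Petal.Length", "Petal.Width", "Species"),
  stringsAsFactors = FALSE)
targetCol <- "Species"

resolveSelected <- function(features, targetCol, selectedFeatureNames = c()) {
  if (length(selectedFeatureNames) == 0) {
    # Empty vector: every feature except the label, in table order
    return(features$name[features$name != targetCol])
  }
  selectedFeatureNames
}

resolveSelected(features, targetCol)
resolveSelected(features, targetCol, c("Petal.Width"))
```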

shiftAmount

numeric. An offset used to increase each probability (factor) in the fully built equation. In scenarios with many dependencies, it becomes more likely that a single conditional probability is zero, which would make the entire probability zero. Since such a result is often useless, 'shiftAmount' can be added to each factor, yielding a non-zero result that can at least be used to order samples by likelihood. Note that with a positive 'shiftAmount', the result of this function can no longer be called a probability; it is rather a comparable likelihood (a 'probability score').
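The effect of the shift on a product of factors can be illustrated with made-up numbers. The helper shiftedScore() is hypothetical, and adding the shift to the denominator factors as well as the numerator factors is an assumption of this sketch, not a statement about how mmb arranges the equation.

```r
# Hypothetical helper: score of an expanded Bayes equation whose factors are
# given as plain vectors; shiftAmount is added to every factor.
shiftedScore <- function(numFactors, denFactors, shiftAmount = 0) {
  prod(numFactors + shiftAmount) / prod(denFactors + shiftAmount)
}

# Made-up factors for two samples; both numerators contain a zero factor
den <- c(0.4, 0.25, 0.9)
scoreA0 <- shiftedScore(c(0.5, 0, 0.8), den)       # 0
scoreB0 <- shiftedScore(c(0.5, 0, 0.2), den)       # 0
scoreA <- shiftedScore(c(0.5, 0, 0.8), den, 0.1)
scoreB <- shiftedScore(c(0.5, 0, 0.2), den, 0.1)
scoreA > scoreB  # TRUE: the shift makes the two samples rankable
```

Without the shift, both samples collapse to 0 and cannot be told apart; with it, the sample whose remaining factors are larger receives the higher score.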

retainMinValues

integer. The minimum number of data points that must be retained when segmenting the data feature by feature.
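One plausible reading of feature-by-feature segmentation with a minimum row count is sketched below. The helper segmentWithMin() is hypothetical, handles only discrete equality conditions, and assumes that segmentation stops (keeping the last sufficiently large segment) once a further condition would leave fewer than retainMinValues rows.

```r
# Hypothetical helper: narrow df down condition by condition, but stop before
# a condition would leave fewer than retainMinValues rows.
segmentWithMin <- function(df, conditions, retainMinValues = 1) {
  seg <- df
  for (col in names(conditions)) {
    nextSeg <- seg[seg[[col]] == conditions[[col]], , drop = FALSE]
    if (nrow(nextSeg) < retainMinValues) break  # keep the previous segment
    seg <- nextSeg
  }
  seg
}

# Toy data, made up for illustration
toy <- data.frame(
  A = c("x", "x", "x", "y"),
  B = c("p", "q", "q", "p"),
  stringsAsFactors = FALSE)
nrow(segmentWithMin(toy, list(A = "x", B = "p"), retainMinValues = 1))
nrow(segmentWithMin(toy, list(A = "x", B = "p"), retainMinValues = 2))
```

With retainMinValues = 1 both conditions are applied (one row remains); with retainMinValues = 2 the second condition is skipped and the three rows with A = "x" are kept.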

doEcdf

default FALSE. A boolean indicating whether to use the empirical CDF to return a probability when inferring a continuous feature. If FALSE, the empirical PDF is used to return the relative likelihood. This parameter has no effect if all of the variables are discrete or when doing a regression. Otherwise, for each continuous variable, the probability of finding a value less than or equal to the given value (given the conditions) is returned. Note that the interpretation of probability using the ECDF differs considerably and must be used with care, especially since it affects each continuous factor in Bayes' equation. This is especially true when shiftAmount > 0.
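The difference between the two kinds of factor can be seen with base R on a single continuous variable. This sketch uses stats::ecdf() for the empirical CDF and a kernel density estimate from stats::density() as a stand-in for the PDF; which estimator mmb uses internally is not assumed here.

```r
x <- iris$Petal.Length
v <- mean(x)

# Probability P(X <= v) from the empirical CDF; always within [0, 1]
pEcdf <- stats::ecdf(x)(v)

# Relative likelihood f(v) from a kernel density estimate of the PDF,
# linearly interpolated at v
dens <- stats::density(x)
likPdf <- stats::approx(dens$x, dens$y, xout = v)$y

c(ecdf = pEcdf, pdf = likPdf)
```

A density value is not bounded by 1: rescaling the same data (for example, x / 100) inflates the density accordingly, which is why a PDF-based result is only a relative likelihood.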

useParallel

default NULL. A boolean indicating whether to use a previously registered parallel backend. If no explicit value is given, foreach::getDoParRegistered() is called to check for a parallel backend. When using parallelism, this function calculates each factor in the numerator and denominator of the final equation in parallel.
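The NULL default and the per-factor parallelism can be sketched with the foreach package, which must be installed. The factors computed here (one relative frequency per Species level) are hypothetical stand-ins for the factors of the expanded equation; the point is only that each iteration is independent, so %dopar% can distribute them when a backend is registered and %do% runs them sequentially otherwise.

```r
# Resolve a NULL useParallel by asking foreach whether a backend is registered
useParallel <- NULL
if (is.null(useParallel)) {
  useParallel <- foreach::getDoParRegistered()
}

# Choose the foreach operator: parallel if a backend exists, sequential otherwise
`%op%` <- if (useParallel) foreach::`%dopar%` else foreach::`%do%`

# Each iteration computes one factor independently of the others
factors <- foreach::foreach(lvl = levels(iris$Species), .combine = c) %op% {
  mean(iris$Species == lvl)
}
factors
```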

Value

numeric. A probability (when inferring discrete labels), a relative likelihood (regression, i.e., inferring the likelihood of a continuous value), or the most likely value given the conditional features. If a positive shiftAmount is used, the result is a 'probability score'.

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

References

Bayes T (1763). “LII. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S.” Philosophical Transactions of the Royal Society of London, 53, 370–418.

See Also

test-case "a zero denominator can happen"

Examples

feat1 <- mmb::createFeatureForBayes(
  name = "Petal.Length", value = mean(iris$Petal.Length))
feat2 <- mmb::createFeatureForBayes(
  name = "Petal.Width", value = mean(iris$Petal.Width))
featT <- mmb::createFeatureForBayes(
  name = "Species", value = iris[1, ]$Species, isLabel = TRUE)

# Check the probability of Species=setosa, given the other 2 features:
mmb::bayesProbability(
  df = iris, features = rbind(feat1, feat2, featT), targetCol = "Species")

# Now check the probability of Species=versicolor:
featT$valueChar <- "versicolor"
mmb::bayesProbability(
  df = iris, features = rbind(feat1, feat2, featT), targetCol = "Species")

[Package mmb version 0.13.3 Index]