bayesInferSimple {mmb}    R Documentation

Perform simple (network) Bayesian inference and regression.

Description

Uses simple Bayesian inference to determine the probability or relative likelihood of a given value. The function can also regress to the most likely value instead. "Simple" means that the data is segmented in the same way a Bayesian network segments it.

For a finite set of labels, call this function once per label to obtain the probability of each label. It is enough to call it for n-1 labels, or to stop as soon as a label with a probability greater than 0.5 is found.

For a continuous value, the function is useful for choosing among a finite set of candidate values. With doEcdf = TRUE, the empirical CDF is used to obtain an actual probability for the given value; otherwise, the empirical PDF is estimated and a relative likelihood is returned. For regression, set doRegress = TRUE to obtain the most likely value of the target feature instead of its relative likelihood.
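
The three kinds of results for a continuous value can be sketched in base R. This is only a conceptual illustration and not the package's implementation: it assumes that segmenting means keeping the rows that match a discrete conditional feature, and it uses stats::ecdf and stats::density as stand-ins for the empirical CDF and PDF estimators.

```r
# Segment the data: keep only the rows that match the conditional feature
# (an assumption about what segmenting means, made for illustration).
segment <- iris[iris$Species == "setosa", ]
x <- segment$Petal.Length
value <- 1.5

# Probability from the empirical CDF: share of observations <= value.
probability <- stats::ecdf(x)(value)

# Relative likelihood: height of the estimated PDF at the given value.
dens <- stats::density(x)
likelihood <- stats::approx(dens$x, dens$y, xout = value)$y

# Regression: the value at which the estimated PDF peaks (its argmax).
mostLikely <- dens$x[which.max(dens$y)]

print(c(probability = probability,
        likelihood = likelihood,
        mostLikely = mostLikely))
```

A relative likelihood is a density height, so it is only meaningful when compared against the likelihoods of other candidate values; the ECDF result is a proper probability between 0 and 1.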

Usage

bayesInferSimple(
  df,
  features,
  targetCol,
  selectedFeatureNames = c(),
  retainMinValues = 1,
  doRegress = FALSE,
  doEcdf = FALSE,
  regressor = NULL
)

Arguments

df

data.frame

features

data.frame of Bayes features, for example created with createFeatureForBayes. One of the features must be the label column.

targetCol

string with the name of the feature that represents the label.

selectedFeatureNames

vector, default c(). Names of the features that the label to predict depends on. If an empty vector is given, all features except the label are used, in the order in which they appear in features.

retainMinValues

integer, default 1. The minimum number of data points to retain when segmenting the data feature by feature.

doRegress

boolean, default FALSE. Whether to perform a regression instead of returning the relative likelihood of a continuous feature. A warning is issued if regression is requested and the target feature is discrete.

doEcdf

boolean, default FALSE. Whether to use the empirical CDF to return a probability when inferring a continuous feature. If FALSE, the empirical PDF is used and a relative likelihood is returned. This parameter has no effect when inferring discrete values or when doing a regression.

regressor

function, default NULL. It is given the values collected for regression and selects the most likely value from them. By default, the built-in estimator of the empirical PDF is used and its argmax is returned. Any other function can be used instead, such as min, max, median or mean. You may also use this function to obtain the raw values for further processing. The regressor is ignored when not doing regression.
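
As a sketch of what a regressor may look like: any function that accepts the collected numeric values and returns a result fits the description above. The helper names below are hypothetical. pdfArgmax only mimics the documented default with stats::density and is not the package's built-in estimator, and rawValues is one plausible way to get at the raw values.

```r
# Hypothetical regressor that mimics the documented default:
# estimate the PDF and return the value at its peak.
pdfArgmax <- function(values) {
  dens <- stats::density(values)
  dens$x[which.max(dens$y)]
}

# Hypothetical regressor that returns its input unchanged,
# exposing the raw values for further processing.
rawValues <- function(values) values

# Stand-in for the values that would be collected for a regression.
collected <- iris$Petal.Length[iris$Species == "versicolor"]

print(pdfArgmax(collected))          # peak of the estimated density
print(stats::median(collected))      # a built-in alternative
print(length(rawValues(collected)))  # number of raw values passed through
```

Such a function would be passed as regressor = pdfArgmax together with doRegress = TRUE.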

Value

numeric. The probability (when inferring discrete labels, or a continuous value using the empirical CDF), the relative likelihood (when inferring a continuous value using the empirical PDF), or the most likely value given the conditional features (when doing regression).

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

References

Scutari M (2010). “Learning Bayesian Networks with the bnlearn R Package.” Journal of Statistical Software, 35(3), 1–22. doi: 10.18637/jss.v035.i03.

Examples

feat1 <- mmb::createFeatureForBayes(
  name = "Petal.Length", value = mean(iris$Petal.Length))
feat2 <- mmb::createFeatureForBayes(
  name = "Petal.Width", value = mean(iris$Petal.Width))
featT <- mmb::createFeatureForBayes(
  name = "Species", value = iris[1,]$Species, isLabel = TRUE)

# Infer likelihood of featT's label:
feats <- rbind(feat1, feat2, featT)
mmb::bayesInferSimple(df = iris, features = feats, targetCol = featT$name)

# Infer likelihood of feat1's value:
featT$isLabel <- FALSE
feat1$isLabel <- TRUE
# We do not bind featT this time:
feats <- rbind(feat1, feat2)
mmb::bayesInferSimple(df = iris, features = feats, targetCol = feat1$name)
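
The per-label procedure from the Description can be sketched as a loop over all labels. This is a minimal sketch that uses only the calls shown above. It assumes that each call returns a single numeric probability, and it is guarded so that it runs only when the mmb package is installed.

```r
if (requireNamespace("mmb", quietly = TRUE)) {
  feat1 <- mmb::createFeatureForBayes(
    name = "Petal.Length", value = mean(iris$Petal.Length))
  feat2 <- mmb::createFeatureForBayes(
    name = "Petal.Width", value = mean(iris$Petal.Width))

  # One call per label; the label feature carries the candidate value.
  labels <- unique(iris$Species)
  probs <- sapply(seq_along(labels), function(i) {
    featT <- mmb::createFeatureForBayes(
      name = "Species", value = labels[i], isLabel = TRUE)
    mmb::bayesInferSimple(
      df = iris, features = rbind(feat1, feat2, featT),
      targetCol = featT$name)
  })
  names(probs) <- as.character(labels)
  print(probs)
}
```

The label with the largest entry in probs is the most probable one under the given feature values.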

[Package mmb version 0.13.3 Index]