bayesRegressAssign {mmb}    R Documentation

Regression for one or more samples, given some training data.

Description

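This method uses Bayesian inferencing (full-dependency by default, simple=FALSE) to do a regression for the target feature for all of the samples given in dfValid. Assigns a regression value using either mmb::bayesRegress() (full) or mmb::bayesRegressSimple() (simple), depending on the simple argument.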

Usage

bayesRegressAssign(
  dfTrain,
  dfValid,
  targetCol,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  retainMinValues = 2,
  doEcdf = FALSE,
  online = 0,
  simple = FALSE,
  useParallel = NULL,
  numBuckets = ceiling(log2(nrow(df))),
  sampleFromAllBuckets = TRUE,
  regressor = NULL
)

Arguments

dfTrain

data.frame that holds the training data.

dfValid

data.frame that holds the validation samples, for each of which a probability is sought. The convention is that if you attempt to assign a probability to a numeric value, that value ought to be found in the target column of this data frame; otherwise, the target column is not required in it.

targetCol

character the name of the targeted feature, i.e., the feature to assign a probability to.

selectedFeatureNames

character defaults to an empty vector, in which case all available features are used. Use this to select subsets of features and to order features.

shiftAmount

numeric an offset value used to increase any one probability (factor) in the full built equation.

retainMinValues

integer to require a minimum number of data points when segmenting the data feature by feature.

doEcdf

default FALSE a boolean to indicate whether to use the empirical CDF to return a probability when inferencing a continuous feature.

online

default 0 integer to indicate how many rows should be used to do inferencing. If zero, then only the initially given data.frame dfTrain is used. If > 0, then each inferenced sample will be attached to it and the resulting data.frame is truncated to this number. Use an integer large enough (i.e., the sum of training and validation rows) to keep all samples during inferencing. A smaller number, e.g., the number of rows in dfTrain, keeps the amount of data restricted by discarding older rows. A number larger than, e.g., the number of rows in dfTrain is also fine; dfTrain will grow to it and then discard rows.
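
The truncation described for online can be pictured as a sliding window over the training rows. The following is a minimal base-R sketch of that idea only: appendAndTruncate is a hypothetical helper, not part of mmb, and it assumes that "older" rows are the ones nearest the top of the data frame.

```r
# Hypothetical helper (not part of mmb): append newly inferenced rows and
# keep only the most recent `online` rows. Assumes older rows sit at the top.
appendAndTruncate <- function(dfTrain, newRows, online) {
  if (online <= 0) {
    return(dfTrain)  # online = 0: the training data is left untouched
  }
  combined <- rbind(dfTrain, newRows)
  utils::tail(combined, online)  # drop the oldest rows beyond the window
}

train <- data.frame(x = 1:5)
window <- appendAndTruncate(train, data.frame(x = 6:7), online = 5)
window$x
```

With a window as large as the training data, the two new rows push out the two oldest ones; with a larger window, the data frame simply grows until the limit is reached.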

simple

default FALSE boolean to indicate whether or not to use simple Bayesian inferencing instead of full. This is faster, but the results are usually less accurate. If true, uses mmb::bayesRegressSimple(). Otherwise, uses mmb::bayesRegress().

useParallel

boolean DEFAULT NULL this is forwarded to the underlying function mmb::bayesRegress() (only in simple=FALSE mode).

numBuckets

integer the number of buckets to use for discretization. Buckets are built in an equidistant manner, not as quantiles (i.e., one bucket likely holds a different number of values than another).
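
To make the difference between equidistant and quantile-based buckets concrete, here is a small base-R illustration. It does not call mmb and makes no claim about how the package builds its buckets internally; the values vector is made up for the example.

```r
# Illustration only: equidistant vs. quantile-based buckets in base R.
values <- c(1, 2, 3, 4, 5, 6, 7, 100)
numBuckets <- ceiling(log2(length(values)))  # same rule as the default

# Equidistant: every bucket spans the same width of the value range,
# so an outlier can leave some buckets empty and others crowded.
equi <- cut(values, breaks = numBuckets, include.lowest = TRUE)
table(equi)

# Quantile-based: every bucket holds roughly the same number of values.
qBreaks <- stats::quantile(values, probs = seq(0, 1, length.out = numBuckets + 1))
quant <- cut(values, breaks = qBreaks, include.lowest = TRUE)
table(quant)
```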

sampleFromAllBuckets

default TRUE boolean to indicate how to obtain values for regression from the buckets. If true, then takes values from those buckets with a non-zero probability, and according to their probability. If false, selects all values from the bucket with the highest probability.
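
The two strategies can be sketched as follows. This is one plausible reading of the description, not mmb's actual implementation: collectValues, the bucket contents, and the bucket probabilities are all hypothetical and chosen only for illustration.

```r
# Hypothetical illustration (not mmb's internals): three buckets of target
# values together with made-up probabilities for each bucket.
buckets <- list(c(4.9, 5.0, 5.1), c(5.8, 6.0, 6.1, 6.3), c(7.2, 7.7))
bucketProbs <- c(0.2, 0.8, 0.0)

collectValues <- function(buckets, probs, sampleFromAllBuckets = TRUE) {
  if (!sampleFromAllBuckets) {
    return(buckets[[which.max(probs)]])  # all values of the likeliest bucket
  }
  # Assumption: draw from each non-zero bucket a share of its values that is
  # proportional to the bucket's probability (at least one value).
  unlist(lapply(which(probs > 0), function(i) {
    vals <- buckets[[i]]
    n <- max(1, round(probs[i] * length(vals)))
    vals[sample.int(length(vals), n)]
  }))
}

set.seed(1)
fromBest <- collectValues(buckets, bucketProbs, sampleFromAllBuckets = FALSE)
fromAll <- collectValues(buckets, bucketProbs, sampleFromAllBuckets = TRUE)
```

In this sketch the zero-probability bucket never contributes, and the likelier bucket contributes more values than the less likely one.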

regressor

Function that is given the collected values for regression and thus finally used to select a most likely value. Defaults to the built-in estimator for the empirical PDF and returns its argmax. However, any other function can be used, too, such as min, max, median, average etc. You may also use this function to obtain the raw values for further processing.
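
As a rough picture of what a regressor does, the sketch below defines a stand-in for an "argmax of the empirical PDF" estimator using stats::density. The name pdfArgmax and the sample values are assumptions for this example; the package's built-in estimator may differ in bandwidth and other details.

```r
# Hypothetical stand-in for an argmax-of-PDF regressor: estimate the density
# of the collected values and return the location of its peak.
pdfArgmax <- function(values) {
  d <- stats::density(values)
  d$x[which.max(d$y)]
}

collected <- c(5.8, 5.9, 6.0, 6.0, 6.1, 6.2, 7.5)
peak <- pdfArgmax(collected)   # lies near the cluster around 6
mid <- median(collected)       # an alternative regressor
raw <- identity(collected)     # returns the raw values unchanged
```

Any function of this shape, taking a numeric vector of collected values, could be supplied through the regressor argument; passing identity is one way to get at the raw values.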

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

See Also

mmb::bayesRegress() (full) or mmb::bayesRegressSimple() (if simple=TRUE). This function mostly forwards the given arguments to these functions, and you will find good documentation there.

Examples


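df <- iris[, ]
set.seed(84735)
# Shuffle the rows so that training and validation sets mix all species.
rn <- base::sample(rownames(df), 150)
dfTrain <- df[rn[1:120], ]
dfValid <- df[rn[121:150], ]
# Regress Sepal.Length for the validation rows; the target column is removed
# from dfValid because it is not required there.
res <- mmb::bayesRegressAssign(
  dfTrain, dfValid[, !(colnames(dfValid) %in% "Sepal.Length")],
  "Sepal.Length", sampleFromAllBuckets = TRUE, doEcdf = TRUE)
# Squared correlation between predicted and actual values.
cor(res, dfValid$Sepal.Length)^2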


[Package mmb version 0.13.3 Index]