conditionalDataMin {mmb}

R Documentation

Segment data according to one or more random variables.

Description

Takes a data.frame and segments it according to the selected variables. Only rows that satisfy all conditions are kept. Discrete and continuous variables are supported. NA, NaN and NULL are supported by using is.na, is.nan and is.null as comparators.

Usage

conditionalDataMin(
  df,
  features,
  selectedFeatureNames = c(),
  retainMinValues = 1
)

Arguments

`df`	data.frame with the data to segment. If it has no more rows than `retainMinValues`, it is returned unchanged.
`features`	data.frame of bayes-features used for segmenting. Each feature's value is used to segment the data, and the features are applied in the order given by `selectedFeatureNames`. If those are not given, then the order of this data.frame is used.
`selectedFeatureNames`	default `c()`. Character vector with the names of the variables to segment by. Segmenting is done variable by variable, in the order of this vector. If this vector is empty, then the originally given data.frame is returned.
`retainMinValues`	default 1. The minimum number of rows to retain. Filtering the data by the selected features may quickly reduce the number of remaining rows, so this value serves as an early stopping criterion. Filtering is done variable by variable, and the number of remaining rows is checked after each segmenting step. If a step leaves fewer rows than this threshold, the result of the previous step is returned.
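
The stepwise filtering with early stopping described for `retainMinValues` can be illustrated with a minimal base-R sketch. It is not the mmb implementation: `segment_min`, its `conditions` argument (a named list of predicate functions) and `retain_min` are hypothetical names introduced here only for illustration, and the way mmb compares feature values may differ.

```r
# Hypothetical helper, NOT part of mmb: filter `df` one condition at a time
# and stop as soon as a step would leave fewer than `retain_min` rows.
segment_min <- function(df, conditions, retain_min = 1) {
  # Nothing to do if the data is already at or below the threshold.
  if (nrow(df) <= retain_min) {
    return(df)
  }
  for (name in names(conditions)) {
    keep <- conditions[[name]](df[[name]])
    keep[is.na(keep)] <- FALSE          # treat undecidable rows as non-matching
    candidate <- df[keep, , drop = FALSE]
    if (nrow(candidate) < retain_min) {
      break                             # threshold undercut: keep previous result
    }
    df <- candidate
  }
  df
}

# Assumed example conditions on the built-in iris data; the last one matches
# no rows, so the result of the second step is returned.
conds <- list(
  Species      = function(x) x == "setosa",
  Petal.Length = function(x) x <= 1.4,
  Petal.Width  = function(x) x > 10
)
res <- segment_min(iris, conds, retain_min = 5)
nrow(res)
```

Because the row count is checked after every step, the order of the conditions matters: a very selective condition placed early can trigger the stop before later conditions are applied.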

Value

data.frame that is segmented according to the selected variables, subject to the minimum number of rows to retain.

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

Examples

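# Create one bayes-feature per variable; each carries the value to segment by.
feat1 <- mmb::createFeatureForBayes(
  name = "Petal.Length", value = mean(iris$Petal.Length))
feat2 <- mmb::createFeatureForBayes(
  name = "Petal.Width", value = mean(iris$Petal.Width))
feats <- rbind(feat1, feat2)

# Segment iris by both features, in the order given by feats$name,
# retaining at least one row.
data <- mmb::conditionalDataMin(df = iris, features = feats,
  selectedFeatureNames = feats$name, retainMinValues = 1)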

[Package mmb version 0.13.3 Index]