neighborhood {mmb} | R Documentation |
Given Bayesian features, returns those samples from a dataset that exhibit a similarity (i.e., the neighborhood).
Description
The neighborhood N_i is defined as the set of samples that have a similarity greater than zero to the given sample s_i. Segmentation is done using equality (==) for discrete features and less than or equal (<=) for continuous features. Note that the feature values NA and NaN are also supported, using is.na() and is.nan().
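The segmentation rules described above can be sketched in base R for a single feature. This is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_segment and its arguments are hypothetical, and the way NA and NaN are told apart here (NaN is checked first, because is.na() is also TRUE for NaN) is an assumption.

```r
# Hypothetical helper, NOT part of mmb: selects rows of df by one feature value.
sketch_segment <- function(df, name, value, discrete = FALSE) {
  col <- df[[name]]
  if (is.numeric(value) && is.nan(value)) {
    # NaN feature value: keep rows whose value is NaN
    keep <- is.nan(col)
  } else if (is.na(value)) {
    # NA feature value: keep rows that are NA but not NaN (assumption)
    keep <- is.na(col) & !is.nan(col)
  } else if (discrete) {
    # discrete feature: equality
    keep <- !is.na(col) & col == value
  } else {
    # continuous feature: less than or equal
    keep <- !is.na(col) & col <= value
  }
  # drop = FALSE keeps the result a data.frame; row names are retained
  df[keep, , drop = FALSE]
}

# Continuous feature: rows with Sepal.Width at or below the mean
nbh_cont <- sketch_segment(iris, "Sepal.Width", mean(iris$Sepal.Width))

# Discrete feature: rows whose Species equals "setosa"
nbh_disc <- sketch_segment(iris, "Species", "setosa", discrete = TRUE)
```

With several features, one would presumably apply such a filter once per feature and keep only the rows that pass all of them; how neighborhood() combines features internally is not stated here.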
Usage
neighborhood(df, features, selectedFeatureNames = c(), retainMinValues = 0)
Arguments
df | data.frame to select the neighborhood from.
features | data.frame of Bayes-features, used to segment/select the rows that should make up the neighborhood.
selectedFeatureNames | vector of names of features to use to demarcate the neighborhood. If empty, all features' names are used.
retainMinValues | Default 0. The number of samples to retain during segmentation. To separate a neighborhood, this value should typically be 0, so that no samples outside the neighborhood are included. For very sparse data or a large number of variables, however, it may still make sense to retain samples.
Value
A data.frame containing the rows that were selected as the neighborhood. The row names of df are guaranteed to be preserved.
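Because row names are preserved, the rows of a neighborhood can be traced back to the original data, for example to obtain the samples outside it. The sketch below uses a plain base-R subset as a stand-in for the value returned by neighborhood(); that substitution is an assumption made so the snippet runs without the package.

```r
# Stand-in for a neighborhood (assumption): a base-R subset of iris.
# In practice, nbh would be the data.frame returned by mmb::neighborhood().
nbh <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), , drop = FALSE]

# The preserved row names identify which original rows were selected
in_nbh <- rownames(iris) %in% rownames(nbh)

# Complement: all samples that are not part of the neighborhood
outside <- iris[!in_nbh, , drop = FALSE]
```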
Author(s)
Sebastian Hönel sebastian.honel@lnu.se
Examples
nbh <- mmb::neighborhood(df = iris, features = mmb::createFeatureForBayes(
name = "Sepal.Width", value = mean(iris$Sepal.Width)))
print(nrow(nbh))