centralities {mmb} | R Documentation |
Given a neighborhood of data, computes the similarity of each sample in the neighborhood to the neighborhood.
Description
Takes a data.frame of samples and builds a PDF/PMF or ECDF for each of the selected features. Then, for each sample, computes the product of that sample's per-feature probabilities. The result is a vector that holds one probability per sample. That probability (or relative likelihood) represents the vicinity (or similarity) of the sample to the given neighborhood.
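The steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the implementation used by mmb: the helper name `sketch_centralities` and its `shift` argument are hypothetical, continuous features are handled with a Gaussian kernel density estimate from `stats::density`, and discrete features with relative frequencies.

```r
# Hypothetical helper (not part of mmb): for every row, multiply the
# relative likelihoods of its feature values, estimated from the
# neighborhood itself.
sketch_centralities <- function(df, features = colnames(df), shift = 0) {
  per_feature <- lapply(features, function(f) {
    x <- df[[f]]
    if (is.numeric(x)) {
      # Continuous feature: kernel density estimate, read off at each value.
      d <- stats::density(x)
      p <- stats::approx(d$x, d$y, xout = x)$y
    } else {
      # Discrete feature: relative frequency (empirical PMF) of each level.
      x <- as.character(x)
      p <- as.numeric(table(x)[x]) / length(x)
    }
    p + shift
  })
  res <- Reduce(`*`, per_feature)  # element-wise product across features
  names(res) <- rownames(df)
  res
}

cent <- sketch_centralities(iris, c("Sepal.Width", "Species"))
head(sort(cent, decreasing = TRUE))
```

Rows whose values lie where the neighborhood is dense get a larger product, which is the sense in which the result measures similarity to the neighborhood.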
Usage
centralities(
  dfNeighborhood,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  doEcdf = FALSE,
  ecdfMinusOne = FALSE
)
Arguments
dfNeighborhood | data.frame that holds all rows that make up the neighborhood. |
selectedFeatureNames | vector of names of the features to use. The centrality of each row in the neighborhood is calculated from the selected features. |
shiftAmount | numeric DEFAULT 0.1 optional amount by which to shift each feature's probability. This is useful when the centrality does not need to be an actual probability and too many features are selected. To obtain actual probabilities, this must be 0 and the ECDF must be used. |
doEcdf | boolean DEFAULT FALSE whether to use the ECDF instead of the EPDF to find the likelihood of continuous values. |
ecdfMinusOne | boolean DEFAULT FALSE only has an effect if the ECDF is used. If TRUE, uses 1 minus the ECDF to find the probability of a continuous value. Whether this is useful depends on how you interpret the resulting probability. |
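The ideas behind doEcdf, ecdfMinusOne and shiftAmount can be illustrated with base R alone. This is a sketch that assumes the ECDF is built as `stats::ecdf` builds it; how mmb evaluates it internally is not shown here, and the variable names and the count of 30 features are made up for illustration.

```r
x  <- iris$Sepal.Width
Fx <- stats::ecdf(x)   # empirical CDF of one continuous feature
v  <- mean(x)

p_le <- Fx(v)          # P(X <= v): the idea behind doEcdf = TRUE
p_gt <- 1 - Fx(v)      # P(X > v):  the idea behind ecdfMinusOne = TRUE

# A product of many values below 1 shrinks quickly. Shifting each factor
# keeps the product larger, but it is then no longer a probability.
p <- rep(p_le, 30)     # pretend 30 features share the same probability
c(unshifted = prod(p), shifted = prod(p + 0.1))
```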
Value
a named vector, where the names correspond to the rownames of the rows in the given neighborhood, and the value is the centrality of that row.
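Because the result is a named vector, it can be sorted and its names used to look up the original rows. The sketch below uses made-up centrality values rather than real output of centralities, and assumes the names are row names of iris.

```r
# Made-up centralities, named by row name as described above.
cent <- c("12" = 0.02, "57" = 0.31, "103" = 0.18)

ordered      <- sort(cent, decreasing = TRUE)  # most central first
most_central <- names(ordered)[1]

# Character indexing selects rows by row name, not by position.
iris[most_central, ]
```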
Author(s)
Sebastian Hönel sebastian.honel@lnu.se
Examples
# Create a neighborhood:
nbh <- mmb::neighborhood(df = iris, features = mmb::createFeatureForBayes(
  name = "Sepal.Width", value = mean(iris$Sepal.Width)))
cent <- mmb::centralities(dfNeighborhood = nbh, shiftAmount = 0.1,
  doEcdf = TRUE, ecdfMinusOne = TRUE)
# Plot the ordered samples to get an idea of the centralities in the neighborhood:
plot(x = names(cent), y = cent)