distance {mmb}    R Documentation

Given a neighborhood of data and two samples from that neighborhood, calculates the distance between the samples.

Description

The distance between two samples x and y within a given neighborhood is defined as the absolute difference of their centralities with respect to that neighborhood, i.e., |c(x) - c(y)|, where c(·) denotes a sample's centrality in the neighborhood.
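The definition can be illustrated in plain base R. The helpers sketchLikelihood, sketchCentralities and sketchDistance below are hypothetical and are not part of mmb. To obtain a concrete c(·), the sketch assumes that a sample's centrality is the product of its shifted per-feature likelihoods, estimated with a kernel density (an EPDF), and it handles numeric features only. Whatever c(·) actually is, the distance |c(x) - c(y)| is symmetric and zero for a sample compared with itself.

```r
# Hypothetical helper (not part of mmb): likelihood of each value in `x`,
# estimated with a kernel density (EPDF) fitted to the same vector.
sketchLikelihood <- function(x) {
  d <- stats::density(x)
  # Linear interpolation of the density curve; rule = 2 clamps at the ends.
  stats::approxfun(d$x, d$y, rule = 2)(x)
}

# Hypothetical centrality (an assumption): product over the selected numeric
# features of (likelihood + shift), one value per row of `df`.
sketchCentralities <- function(df, features = colnames(df), shift = 0.1) {
  probs <- sapply(features, function(f) sketchLikelihood(df[[f]]) + shift)
  cs <- apply(as.matrix(probs), 1, prod)
  names(cs) <- rownames(df)  # allows indexing by row name or row number
  cs
}

# Distance as defined above: absolute difference of the two centralities.
sketchDistance <- function(df, i, j, features = colnames(df), shift = 0.1) {
  cs <- sketchCentralities(df, features, shift)
  unname(abs(cs[i] - cs[j]))
}

sketchDistance(iris[, 1:4], 10, 99)
```

Because only the difference of two centralities matters, any monotone rescaling applied to all rows alike would change the magnitude of distances but not which pairs are closer.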

Usage

distance(
  dfNeighborhood,
  rowNrOfSample1,
  rowNrOfSample2,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  doEcdf = FALSE,
  ecdfMinusOne = FALSE
)

Arguments

dfNeighborhood

data.frame that holds all rows that make up the neighborhood.

rowNrOfSample1

character the name of the row that constitutes the first sample from the given neighborhood. The examples below pass a row number instead.

rowNrOfSample2

character the name of the row that constitutes the second sample from the given neighborhood. The examples below pass a row number instead.

selectedFeatureNames

vector of names of features to use. The centrality of each row in the neighborhood is calculated from the selected features only. If left empty (the default), all features are used, as in the first example below.

shiftAmount

numeric DEFAULT 0.1 optional amount by which to shift each feature's probability. This is useful when the centrality does not need to be an actual probability and many features are selected. To obtain actual probabilities, this must be 0 and the ECDF must be used (doEcdf = TRUE).
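Why a shift helps with many features can be seen in a few lines of base R, under the assumption (ours, not confirmed by this page) that a sample's centrality combines per-feature probabilities multiplicatively. The probabilities used here are made-up values for illustration only.

```r
# Hypothetical per-feature probabilities of one sample: 20 features, each 0.05.
probs <- rep(0.05, 20)
shiftAmount <- 0.1

# Assumed multiplicative combination: the plain product vanishes quickly
# as the number of features grows.
unshifted <- prod(probs)

# Shifting each factor first keeps the product from collapsing as fast,
# at the cost of the result no longer being a true probability.
shifted <- prod(probs + shiftAmount)

# Each factor grows from 0.05 to 0.15, so the product grows by about 3^20.
ratio <- shifted / unshifted
```

With shiftAmount = 0 the factors stay untouched, which is why the argument must be 0 when actual probabilities are required.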

doEcdf

boolean DEFAULT FALSE whether to use the ECDF instead of the EPDF to find the likelihood of continuous values.

ecdfMinusOne

boolean DEFAULT FALSE only has an effect if the ECDF is used (doEcdf = TRUE). If true, uses 1 minus the ECDF, i.e., the probability of observing a larger value, instead of the ECDF itself to find the probability of a continuous value. Depending on what you are trying to measure, this may be useful.
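The three quantities that doEcdf and ecdfMinusOne choose between can be computed for a single continuous value with base R's stats package. This only illustrates the EPDF, the ECDF and 1 minus the ECDF; it makes no claim about how mmb computes them internally, and the value 5.8 is an arbitrary example.

```r
x <- iris$Sepal.Length
v <- 5.8  # arbitrary example value

# EPDF: kernel density estimate evaluated at v (a density, not a probability).
d <- stats::density(x)
epdf_v <- stats::approxfun(d$x, d$y)(v)

# ECDF: share of observations less than or equal to v, always within [0, 1].
Fn <- stats::ecdf(x)
ecdf_v <- Fn(v)

# 1 minus the ECDF: share of observations strictly greater than v.
ecdf_minus_one_v <- 1 - Fn(v)
```

Unlike the density value, both ECDF-based values are proper probabilities, which is consistent with the note under shiftAmount that actual probabilities require the ECDF.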

Value

numeric the distance as a non-negative number (zero if both samples have the same centrality).

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

Examples

# Show the distance between two samples using all their features:
mmb::distance(dfNeighborhood = iris, rowNrOfSample1 = 10, rowNrOfSample2 = 99)

# Let's use an actual neighborhood:
nbh <- mmb::neighborhood(df = iris, features = mmb::createFeatureForBayes(
  name = "Sepal.Length", value = mean(iris$Sepal.Length)))
mmb::distance(dfNeighborhood = nbh, rowNrOfSample1 = 1, rowNrOfSample2 = 30,
  selectedFeatureNames = colnames(iris)[1:3])

# Let's compare this to the distances as they are in iris (should be smaller):
mmb::distance(dfNeighborhood = iris, rowNrOfSample1 = 1, rowNrOfSample2 = 30,
  selectedFeatureNames = colnames(iris)[1:3])

[Package mmb version 0.13.3 Index]