vicinitiesForSample {mmb}    R Documentation

Segment a dataset by a single sample and compute vicinities for it and the remaining samples in the neighborhood.

Description

Given some data and one sample s_i from it, constructs the neighborhood N_i of that sample and assigns a centrality to each sample within that neighborhood; this centrality is the sample's vicinity. Samples that lie outside the neighborhood are assigned a vicinity of zero. Uses mmb::neighborhood() and mmb::centralities().

Usage

vicinitiesForSample(
  df,
  sampleFromDf,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  doEcdf = FALSE,
  ecdfMinusOne = FALSE,
  retainMinValues = 0
)

Arguments

df

data.frame that holds the data (and also the sample to use to define the neighborhood). Each sample in this data.frame is assigned a vicinity.

sampleFromDf

data.frame containing a single row from the given data.frame. This row is used to select a neighborhood from the given data.

selectedFeatureNames

vector of names of features to use to compute the vicinity/centrality. This is passed to mmb::neighborhood().

shiftAmount

numeric DEFAULT 0.1 optional amount by which to shift each feature's probability. This is useful when the centrality does not need to be an actual probability and too many features are selected. To obtain actual probabilities, this must be 0 and the ECDF must be used (see doEcdf).

doEcdf

boolean DEFAULT FALSE whether to use the ECDF instead of the EPDF to find the likelihood of continuous values.

ecdfMinusOne

boolean DEFAULT FALSE only has an effect if the ECDF is used (doEcdf = TRUE). If TRUE, uses 1 minus the ECDF to find the probability of a continuous value. Whether this is useful depends on how you intend to interpret the result.

retainMinValues

DEFAULT 0 the number of samples to retain during segmentation. When separating a neighborhood, this value should typically be 0, so that no samples outside the neighborhood are included. However, for very sparse data or a large number of variables, it may still make sense to retain samples.

Value

data.frame with a single column 'vicinity' and the same rownames as the given data.frame. Each row then holds the vicinity for the corresponding row.

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

Examples

vic <- mmb::vicinitiesForSample(
  df = iris, sampleFromDf = iris[1,], shiftAmount = 0.1)
vic$vicinity

# Plot the ordered samples to get an idea which ones have a vicinity > 0
plot(x=rownames(vic), y=vic$vicinity)
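
The following is a minimal sketch, not part of the original examples, of how the arguments described above might be combined: it restricts the computation to two features via selectedFeatureNames and sets shiftAmount = 0 with doEcdf = TRUE, which the argument descriptions name as the requirements for obtaining actual probabilities. The choice of features and the variable names vicProb and ranked are illustrative assumptions, and the call is guarded so that nothing runs if mmb is not installed.

```r
# Illustrative sketch; feature choice and variable names are assumptions.
if (requireNamespace("mmb", quietly = TRUE)) {
  # Use only two features to build the neighborhood and the vicinities.
  # shiftAmount = 0 together with doEcdf = TRUE follows the documented
  # requirement for obtaining actual probabilities.
  vicProb <- mmb::vicinitiesForSample(
    df = iris,
    sampleFromDf = iris[1, ],
    selectedFeatureNames = c("Sepal.Length", "Species"),
    shiftAmount = 0,
    doEcdf = TRUE)

  # Order samples by decreasing vicinity; drop = FALSE keeps the result a
  # data.frame, so the rownames still identify the rows of iris.
  ranked <- vicProb[order(vicProb$vicinity, decreasing = TRUE), , drop = FALSE]
  head(ranked)

  # Samples outside the neighborhood have a vicinity of zero, so counting
  # the positive entries gives the number of samples with non-zero vicinity.
  length(which(vicProb$vicinity > 0))
}
```

Because the returned data.frame keeps the rownames of df, the ranked rows can be mapped back to the original samples, for example with iris[rownames(ranked), ].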

[Package mmb version 0.13.3 Index]