vicinitiesForSample {mmb}

R Documentation

Segment a dataset by a single sample and compute vicinities for it and the remaining samples in the neighborhood.

Description

Given a dataset and one sample s_i from it, constructs the neighborhood N_i of that sample and assigns a centrality to each sample in that neighborhood (s_i itself and all remaining samples). This centrality is reported as the sample's vicinity. Samples that lie outside the neighborhood are assigned a vicinity of zero. Uses mmb::neighborhood() and mmb::centralities().

Usage

vicinitiesForSample(
  df,
  sampleFromDf,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  doEcdf = FALSE,
  ecdfMinusOne = FALSE,
  retainMinValues = 0
)

Arguments

`df`	data.frame that holds the data (and also the sample to use to define the neighborhood). Each sample in this data.frame is assigned a vicinity.
`sampleFromDf`	data.frame holding a single row from the given data.frame. This row is used to select a neighborhood from the given data.
`selectedFeatureNames`	vector of names of features to use to compute the vicinity/centrality. This is passed to `mmb::neighborhood()`.
`shiftAmount`	numeric DEFAULT 0.1 optional amount to shift each feature's probability by. This is useful when the centrality does not need to be an actual probability and too many features are selected. To obtain actual probabilities, this must be 0, and you must use the ECDF.
`doEcdf`	boolean DEFAULT FALSE whether to use the ECDF instead of the EPDF to find the likelihood of continuous values.
`ecdfMinusOne`	boolean DEFAULT FALSE only has an effect if the ECDF is used. If TRUE, uses 1 minus the ECDF to find the probability of a continuous value. Depending on how you intend to interpret the result, this may be useful.
`retainMinValues`	DEFAULT 0 the number of samples to retain during segmentation. When separating a neighborhood, this value should typically be 0, so that no samples outside the neighborhood are included. However, for very sparse data or a large number of variables, it may still make sense to retain samples.
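
The interplay of `shiftAmount` and `doEcdf` described above can be illustrated with a short sketch. It assumes the mmb package is installed and uses the built-in iris data. The two feature names and the variable name `vicProb` are purely illustrative choices, not recommendations.

```r
# Sketch: request vicinities that are actual probabilities. According to the
# description of 'shiftAmount', this requires shiftAmount = 0 and doEcdf = TRUE.
# Assumes the mmb package is installed; iris ships with R.
vicProb <- mmb::vicinitiesForSample(
  df = iris,
  sampleFromDf = iris[1, ],
  # Illustrative selection of features used for the neighborhood.
  selectedFeatureNames = c("Sepal.Length", "Species"),
  shiftAmount = 0,
  doEcdf = TRUE
)

# One vicinity per row of the input data.
head(vicProb$vicinity)
```

Leaving `shiftAmount` at its default of 0.1 instead yields centralities that should not be read as probabilities.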

Value

data.frame with a single column 'vicinity' and the same rownames as the given data.frame. Each row holds the vicinity of the corresponding row of the given data.frame.
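
As a brief illustration of working with the returned data.frame, the following sketch keeps only the samples with a non-zero vicinity and orders them by decreasing vicinity. It assumes the mmb package is installed; the variable names `vic` and `nonZero` are arbitrary.

```r
# Sketch: post-process the result using base R only.
# Assumes the mmb package is installed; iris ships with R.
vic <- mmb::vicinitiesForSample(df = iris, sampleFromDf = iris[1, ])

# Keep samples with a vicinity greater than zero. which() drops any NA
# comparisons instead of producing NA rows.
nonZero <- vic[which(vic$vicinity > 0), , drop = FALSE]

# Order the retained samples by decreasing vicinity.
nonZero <- nonZero[order(nonZero$vicinity, decreasing = TRUE), , drop = FALSE]

# The rownames match those of the input, so they can be used to look up
# the original rows of the samples with the highest vicinity.
iris[head(rownames(nonZero)), ]
```

Because the rownames are preserved, the result can be joined back to the input data without tracking row positions separately.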

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

Examples

vic <- mmb::vicinitiesForSample(
  df = iris, sampleFromDf = iris[1,], shiftAmount = 0.1)
vic$vicinity

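# Plot the vicinity of each sample in row order to get an idea which
# ones have a vicinity > 0. The row index is used for the x-axis so that
# the plot does not depend on the rownames being numeric.
plot(x = seq_len(nrow(vic)), y = vic$vicinity,
  xlab = "Sample (row index)", ylab = "Vicinity")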

[Package mmb version 0.13.3 Index]