R: Segment a dataset by each row once, then compute vicinities...

vicinities {mmb}

R Documentation

Segment a dataset by each row once, then compute vicinities of samples in the neighborhood.

Description

Given an entire dataset, uses each instance in it to demarcate a neighborhood using the selected features. Then, for each neighborhood, the vicinity of all samples to it is computed. The result of this is an N x N matrix, where the entry m_{i,j} corresponds to the vicinity of sample s_j in neighborhood N_i.

Usage

vicinities(
  df,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  doEcdf = FALSE,
  ecdfMinusOne = FALSE,
  retainMinValues = 0,
  useParallel = NULL
)

Arguments

`df`	data.frame to compute the matrix of vicinites for.
`selectedFeatureNames`	vector of names of features to use for computing the vicinity/centrality of each sample to each neighborhood.
`shiftAmount`	numeric DEFAULT 0.1 optional amount to shift each features probability by. This is useful for when the centrality not necessarily must be an actual probability and too many features are selected. To obtain actual probabilities, this needs to be 0, and you must use the ECDF.
`doEcdf`	boolean DEFAULT FALSE whether to use the ECDF instead of the EPDF to find the likelihood of continuous values.
`ecdfMinusOne`	boolean DEFAULT FALSE only has an effect if the ECDF is used. If true, uses 1 minus the ECDF to find the probability of a continuous value. Depending on the interpretation of what you try to do, this may be of use.
`retainMinValues`	DEFAULT 0 the amount of samples to retain during segmentation. For separating a neighborhood, this value typically should be 0, so that no samples are included that are not within it. However, for very sparse data or a great amount of variables, it might still make sense to retain samples.
`useParallel`	boolean DEFAULT NULL whether to use parallelism or not. Setting this to true requires also having previously registered a parallel backend. If parallel computing is enabled, then each neighborhood is computed separately.

Value

matrix of length N^2 (N being the length of the data.frame). Each row i demarcates the neighborhood as selected by sample i, and each column j then is the vicinity of sample s_j to that neighborhood. No value of the diagonal is zero, because each neighborhood always contains the sample it was demarcated by, and that sample has a similarity greater than zero to it.

Author(s)

Sebastian Hönel sebastian.honel@lnu.se

Examples

w <- mmb::getWarnings()
mmb::setWarnings(FALSE)
mmb::vicinities(df = iris[1:10,])

# Run the same, but use the ECDF and retain more values:
mmb::vicinities(df = iris[1:10,], doEcdf = TRUE, retainMinValues = 10)
mmb::setWarnings(w)

[Package mmb version 0.13.3 Index]