vicinitiesForSample {mmb} | R Documentation |
Segment a dataset by a single sample and compute vicinities for it and the remaining samples in the neighborhood.
Description
Given a data.frame and one sample s_i from it, constructs the neighborhood N_i of that sample and assigns a centrality to every other sample in that neighborhood. Samples that lie outside the neighborhood are assigned a vicinity of zero. Uses mmb::neighborhood() and mmb::centralities().
Usage
vicinitiesForSample(
df,
sampleFromDf,
selectedFeatureNames = c(),
shiftAmount = 0.1,
doEcdf = FALSE,
ecdfMinusOne = FALSE,
retainMinValues = 0
)
Arguments
df |
data.frame that holds the data (and also the sample to use to define the neighborhood). Each sample in this data.frame is assigned a vicinity. |
sampleFromDf |
data.frame containing a single row from the given data.frame. This row is used to select a neighborhood from the given data. |
selectedFeatureNames |
vector of names of features to use to compute the
vicinity/centrality. This is passed to |
shiftAmount |
numeric DEFAULT 0.1 optional amount by which to shift each feature's probability. This is useful when the centrality does not need to be an actual probability and many features are selected. To obtain actual probabilities, this must be 0 and the ECDF must be used. |
doEcdf |
boolean DEFAULT FALSE whether to use the ECDF instead of the EPDF to find the likelihood of continuous values. |
ecdfMinusOne |
boolean DEFAULT FALSE only has an effect if the ECDF is used. If TRUE, uses 1 minus the ECDF to find the probability of a continuous value. Whether this is useful depends on how the resulting probability is meant to be interpreted. |
retainMinValues |
DEFAULT 0 the number of samples to retain during segmentation. When separating a neighborhood, this should typically be 0 so that no samples outside it are included. For very sparse data or a large number of variables, however, it can still make sense to retain samples. |
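The difference between the ECDF and 1 minus the ECDF (the doEcdf and ecdfMinusOne arguments) can be illustrated with base R alone. The following is a minimal sketch using stats::ecdf(); it is not taken from the mmb sources, and the feature, the value looked up, and the variable names are chosen for illustration only.

```r
# Sepal.Length from the built-in iris data serves as a continuous feature.
x <- iris$Sepal.Length
value <- 5.1  # illustrative value to look up

# stats::ecdf() returns a step function; Fn(v) is the share of x that is <= v.
Fn <- stats::ecdf(x)

pLower <- Fn(value)      # ECDF: share of observations <= value
pUpper <- 1 - Fn(value)  # 1 minus ECDF: share of observations > value

print(c(ecdf = pLower, oneMinusEcdf = pUpper))
```

The two quantities always sum to 1, so the choice only decides which tail of the empirical distribution is read as the probability of the value.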
Value
data.frame with a single column 'vicinity' and the same rownames as the given data.frame. Each row then holds the vicinity for the corresponding row.
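Working with a result of this shape could look like the sketch below. To stay runnable without the package, it uses a hand-made stand-in data.frame with the documented layout (one 'vicinity' column, row names taken from the data); the numbers are invented for illustration and are not output of mmb, and the names inNeighborhood and ranked are hypothetical.

```r
# Stand-in with the documented layout; the values are made up, not mmb output.
vic <- data.frame(
  vicinity = c(0.00, 0.42, 0.00, 0.17),
  row.names = c("1", "2", "3", "4")
)

# Samples outside the neighborhood have a vicinity of zero, so a positive
# vicinity identifies samples that lie inside it.
inNeighborhood <- rownames(vic)[vic$vicinity > 0]

# Rank samples by decreasing vicinity; drop = FALSE keeps the data.frame
# (and its row names) even though there is only one column.
ranked <- vic[order(vic$vicinity, decreasing = TRUE), , drop = FALSE]

print(inNeighborhood)
print(ranked)
```

Because the row names match those of the input data.frame, they can be used to look up the corresponding original rows, for example df[inNeighborhood, ].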
Author(s)
Sebastian Hönel sebastian.honel@lnu.se
Examples
vic <- mmb::vicinitiesForSample(
df = iris, sampleFromDf = iris[1,], shiftAmount = 0.1)
vic$vicinity
# Plot the vicinities in row order to get an idea which samples have a vicinity > 0
plot(x=rownames(vic), y=vic$vicinity)