similarityweight {condvis2} | R Documentation |
Calculate the similarity weight for a set of observations
Description
Calculate the similarity weight for a set of observations, based on their distance from some arbitrary points in data space. Observations which are very similar to the point under consideration are given weight 1, while observations which are dissimilar to the point are given weight zero.
Usage
similarityweight(
x,
data,
threshold = 1,
distance = "euclidean",
lambda = NULL,
scale = TRUE
)
Arguments
x |
A dataframe describing arbitrary points in the space of the data
(i.e., with same |
data |
A dataframe representing observed data. |
threshold |
Threshold distance outside which observations will be assigned similarity weight zero. This is numeric and should be > 0. Defaults to 1. |
distance |
The type of distance measure to be used, currently just three
types of Minkowski distance: |
lambda |
A constant to multiply by the number of categorical
mismatches, before adding to the Minkowski distance, to give a general
dissimilarity measure. If left |
scale |
defaults to TRUE, in which case numeric variables are scaled to unit sd. |
Details
Similarity weight is assigned to observations based on their distance from a given point. The distance is calculated as Minkowski distance between the numeric elements for the observations whose categorical elements match, or else the Gower distance.
Value
A numeric vector or matrix, with values from 0 to 1. The similarity
weights for the observations in data
arranged in rows for each row
in x
.
References
O'Connell M, Hurley CB and Domijan K (2017). “Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R.”Journal of Statistical Software, 81(5), pp. 1-20. <URL:http://dx.doi.org/10.18637/jss.v081.i05>.
Examples
## Say we want to find observations similar to the first observation.
## The first observation is identical to itself, so it gets weight 1. The
## second observation is similar, so it gets some weight. The rest are more
## different, and so get zero weight.
data(mtcars)
similarityweight(x = mtcars[1, ], data = mtcars)
## By increasing the threshold, we can find observations which are more
## approximately similar to the first row. Note that the second observation
## now has weight 1, so we lose some ability to discern how similar
## observations are by increasing the threshold.
similarityweight(x = mtcars[1, ], data = mtcars, threshold = 5)
## Can provide a number of points to 'x'. Here we see that the Mazda RX4 Wag
## is more similar to the Merc 280 than the Mazda RX4 is.
similarityweight(mtcars[1:2, ], mtcars, threshold = 3)