KMD {KMD}    R Documentation
Kernel Measure of Multi-sample Dissimilarity
Description
Compute the kernel measure of multi-sample dissimilarity (KMD) using a directed K-nearest neighbor (K-NN) graph or a minimum spanning tree (MST).
Usage
KMD(X, Y, M = length(unique(Y)), Knn = 1, Kernel = "discrete")
Arguments
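X: the data matrix (n by dx) or the distance/similarity matrix (n by n)
Y: a vector of length n giving the labels (from 1 to M) of the observations
M: the number of possible labels
Knn: the number of nearest neighbors K used to build the K-NN graph, or "MST" to use the minimum spanning tree
Kernel: an M by M kernel matrix whose entry in row i and column j is the kernel value k(i, j) between labels i and j, or "discrete" (the default) for the discrete kernel, k(i, j) = 1 if i = j and 0 otherwise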
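To make the Kernel argument concrete, the following R sketch converts it into an explicit M by M matrix. The helper as_kernel_matrix is hypothetical and not part of the KMD package; it only mirrors the documented meaning of "discrete" as k(i, j) = 1 if i = j and 0 otherwise (the identity matrix, as in the Examples). The 3 by 3 matrix K3 is an illustrative choice, not a recommended kernel.

```r
# Hypothetical helper (not part of the KMD package): turn the Kernel
# argument into an explicit M by M matrix.
as_kernel_matrix <- function(Kernel, M) {
  if (identical(Kernel, "discrete")) {
    # Discrete kernel: k(i, j) = 1 if i == j, 0 otherwise
    return(diag(M))
  }
  Kernel <- as.matrix(Kernel)
  if (!all(dim(Kernel) == c(M, M))) {
    stop("Kernel must be an M by M matrix or \"discrete\"")
  }
  Kernel
}

# Illustrative kernel: labels 1 and 2 are treated as partially similar
K3 <- matrix(c(1,   0.5, 0,
               0.5, 1,   0,
               0,   0,   1), nrow = 3, byrow = TRUE)

as_kernel_matrix("discrete", 3)  # the 3 by 3 identity matrix
as_kernel_matrix(K3, 3)          # K3 unchanged
```

Writing the discrete kernel as the identity matrix shows why Kernel = "discrete" and Kernel = diag(c(1,1)) describe the same kernel when M = 2.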
Details
The kernel measure of multi-sample dissimilarity (KMD) quantifies how different multiple samples are, based on the observations from them. The sample KMD converges to a population quantity that depends on the kernel and lies between 0 and 1. A small value suggests that the samples come from the same distribution, and a large value suggests that the underlying distributions differ. The population quantity is 0 if and only if all distributions are the same, and 1 if and only if all distributions are mutually singular.
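The following base R sketch computes a sample KMD on an already-built neighbor graph. It assumes the estimator has the form (a - c) / (b - c), where a is the average over points i of the mean kernel value k(Y_i, Y_j) across the graph neighbors j of i, c is the average of k(Y_i, Y_j) over all ordered pairs i != j, and b is the average of k(Y_i, Y_i). This form is an assumption for illustration, not taken from the package source, and the function name kmd_from_graph is hypothetical. As a check on the arithmetic only: with the discrete kernel and two groups of 30, it gives (29/30)^2 = 841/900 = 0.9344444... when 58 of the 60 points have a same-label 1-nearest neighbor, which equals the first value shown in the Examples.

```r
# Hypothetical sketch of a graph-based sample KMD (assumed estimator form).
# nbrs: list of length n; nbrs[[i]] holds the indices of the graph neighbors of point i
# Y:    integer labels in 1..M
# Kmat: M by M kernel matrix
kmd_from_graph <- function(nbrs, Y, Kmat) {
  n <- length(Y)
  # a: average over i of the mean kernel value between Y_i and its neighbors' labels
  a <- mean(vapply(seq_len(n), function(i) {
    mean(Kmat[Y[i], Y[nbrs[[i]]]])
  }, numeric(1)))
  KY <- Kmat[Y, Y]                                 # n by n matrix of k(Y_i, Y_j)
  c_pairs <- (sum(KY) - sum(diag(KY))) / (n * (n - 1))  # c: all ordered pairs i != j
  b_self <- mean(diag(KY))                         # b: average of k(Y_i, Y_i)
  (a - c_pairs) / (b_self - c_pairs)
}

# Toy graph: two groups of 30; each point's single neighbor shares its label,
# except points 1 and 31, which point at each other.
Y <- rep(1:2, each = 30)
nbrs <- lapply(seq_along(Y), function(i) {
  if (i == 1) return(31L)
  if (i == 31) return(1L)
  base <- if (i <= 30) 0L else 30L
  base + ((i - base) %% 30L) + 1L  # next point in the same group (cyclically)
})
kmd_from_graph(nbrs, Y, diag(2))  # (29/30)^2 = 841/900
```

Because the graph enters only through nbrs, the same function applies to a K-NN graph (K neighbors per point) or an MST (the tree neighbors of each point).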
If X is an n by n matrix, it is interpreted as a distance/similarity matrix. In that case, the MST requires it to be symmetric (an undirected graph). The K-NN graph does not require symmetry: the nearest neighbors of point i are computed from the i-th row, with ties broken at random. The diagonal entries (self-distances) are ignored. If X is an n by dx data matrix, Euclidean distance is used to compute both the K-NN graph (ties broken at random) and the MST.
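The graph rules described above can be sketched in base R as follows. knn_from_dist and mst_from_dist are hypothetical helpers, not the KMD package's internals. The sketch assumes D holds distances (smaller means closer); for a similarity matrix the ordering would have to be reversed. knn_from_dist uses only row i, skips the diagonal, and breaks ties with a random permutation; mst_from_dist uses Prim's algorithm and first checks that D is symmetric. Both return a list whose i-th element holds the neighbor indices of point i.

```r
# Hypothetical helpers (not the KMD package's internals); D is an n by n distance matrix.

# Directed K-NN graph: neighbors of i come from row i only, so D may be asymmetric.
knn_from_dist <- function(D, K) {
  n <- nrow(D)
  stopifnot(K >= 1, K <= n - 1)
  lapply(seq_len(n), function(i) {
    d <- D[i, ]
    d[i] <- Inf                   # ignore the self-distance on the diagonal
    r <- sample.int(n)            # random permutation used only to break ties
    order(d, r)[seq_len(K)]       # sort by distance, then by the random key
  })
}

# MST via Prim's algorithm; requires a symmetric D (undirected graph).
mst_from_dist <- function(D) {
  n <- nrow(D)
  stopifnot(isSymmetric(unname(D)))
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best <- D[1, ]                  # cheapest known edge from the tree to each vertex
  parent <- rep(1L, n)            # tree endpoint of that cheapest edge
  nbrs <- vector("list", n)
  for (step in seq_len(n - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(best[cand])]
    u <- parent[v]
    nbrs[[u]] <- c(nbrs[[u]], v)  # record the undirected edge (u, v) both ways
    nbrs[[v]] <- c(nbrs[[v]], u)
    in_tree[v] <- TRUE
    closer <- !in_tree & D[v, ] < best
    best[closer] <- D[v, closer]
    parent[closer] <- v
  }
  nbrs
}

x <- c(0, 1, 3, 6)
D <- as.matrix(dist(x))  # Euclidean distances, as used for a data matrix
knn_from_dist(D, 1)      # neighbors 2, 1, 2, 3
mst_from_dist(D)         # the chain 1 - 2 - 3 - 4
```

For a data matrix, as.matrix(dist(X)) from base R gives the Euclidean distance matrix that both helpers expect.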
Value
KMD returns a single real number, the sample KMD, which lies between 0 and 1 asymptotically.
Examples
library(KMD)
n = 60
d = 2
set.seed(1)
X1 = matrix(runif(n*d/2),ncol = d)
X2 = matrix(runif(n*d/2),ncol = d)
X2[,1] = X2[,1] + 1
X = rbind(X1,X2)
Y = c(rep(1,n/2),rep(2,n/2))
print(KMD(X, Y, M = 2, Knn = 1, Kernel = "discrete"))
# 0.9344444. The distributions of X1 and X2 are mutually singular, so the theoretical KMD is 1.
print(KMD(X, Y, M = 2, Knn = 1, Kernel = base::diag(c(1,1))))
# 0.9344444. For M = 2, diag(c(1,1)) is exactly the discrete kernel, so this matches the call above.
print(KMD(X, Y, M = 2, Knn = 2, Kernel = "discrete"))
print(KMD(X, Y, M = 2, Knn = "MST", Kernel = "discrete"))
# 0.9508333, 0.9399074. One can also use other geometric graphs (2-NN graph and MST here)
# to estimate the same theoretical quantity.