KMD {KMD}R Documentation

Kernel Measure of Multi-sample Dissimilarity

Description

Compute the kernel measure of multi-sample dissimilarity (KMD) with directed K-nearest neighbor (K-NN) graph or minimum spanning tree (MST).

Usage

KMD(X, Y, M = length(unique(Y)), Knn = 1, Kernel = "discrete")

Arguments

X

the data matrix (n by dx) or the distance/similarity matrix (n by n)

Y

a vector of length n, indicating the labels (from 1 to M) of the data

M

the number of possible labels

Knn

the number of nearest neighbors to use, or "MST"

Kernel

an M by M kernel matrix with row i and column j being the kernel value k(i, j); or "discrete" which indicates using the discrete kernel.

Details

The kernel measure of multi-sample dissimilarity (KMD) measures the dissimilarity between multiple samples, based on the observations from them. It converges to the population quantity (depending on the kernel) which is between 0 and 1. A small value indicates the multiple samples are from the same distribution, and a large value indicates the corresponding distributions are different. The population quantity is 0 if and only if all distributions are the same, and 1 if and only if all distributions are mutually singular.

If X is an n by n matrix, it will be interpreted as a distance/similarity matrix. In such case, MST requires it to be symmetric (an undirected graph). K-NN graph does not require it to be symmetric, with the nearest neighbors of point i computed based on the i-th row, and ties broken at random. The diagonal terms (self-distances) will be ignored. If X is an n by dx data matrix, Euclidean distance will be used for computing the K-NN graph (ties broken at random) and the MST.

Value

The algorithm returns a real number which is the sample KMD and is asymptotically between 0 and 1.

See Also

KMD_test

Examples

n = 60
d = 2
set.seed(1)
X1 = matrix(runif(n*d/2),ncol = d)
X2 = matrix(runif(n*d/2),ncol = d)
X2[,1] = X2[,1] + 1
X = rbind(X1,X2)
Y = c(rep(1,n/2),rep(2,n/2))
print(KMD(X, Y, M = 2, Knn = 1, Kernel = "discrete"))
# 0.9344444. X1 and X2 are mutually singular, so the theoretical KMD is 1.
print(KMD(X, Y, M = 2, Knn = 1, Kernel = base::diag(c(1,1))))
# 0.9344444. This is essentially the same as specifying the discrete kernel above.
print(KMD(X, Y, M = 2, Knn = 2, Kernel = "discrete"))
print(KMD(X, Y, M = 2, Knn = "MST", Kernel = "discrete"))
# 0.9508333, 0.9399074. One can also use other geometric graphs (2-NN graph and MST here)
# to estimate the same theoretical quantity.

[Package KMD version 0.1.0 Index]