mediandiff {eummd}R Documentation

Compute the median difference between pairs of values

Description

Compute the median of all differences between distinct pairs in vectors or matrices.

Usage

mediandiff(X, Y = NULL, kernel = c("Laplacian", "Gaussian"), fast = FALSE)

Arguments

X

Numeric vector or matrix of length n.

Y

Numeric vector or matrix of length m, or NULL.

kernel

String, either "Laplacian" or "Gaussian".

fast

Boolean; if TRUE will run O(N \log N) algorithm, where N = n + m, but if FALSE (default) will run naive O(N^2 \log N) algorithm.

Details

The median difference is defined as follows:

Z is the combined X and Y values into a single vector or matrix. Number of columns is the dimension, and these need to be equal for X and Y. Then, if the Laplacian kernel is used,

m = \textnormal{median} \{ || x_i - x_j ||_1; \,\, i>j, \,\, i=1, 2,\dots, n+m,\,\,\textnormal{ and } j=1, 2,\dots, i-1 \},

where || z_i - z_j ||_1 is the 1-norm, and so if the data are d-dimensional then

|| z_i - z_j ||_1 = \sum_{k=1}^{d} |z_{i,k} - z_{j,k}|.

If the Gaussian kernel is specified, then the square of the two-norm is used.

The median heuristic is defined as beta = 1/m.

Naive method will compute all distinct pairs, of which there are N(N+1)/2 differences. These are then sorted using a O(N \log N) algorithm, so overall O(N^2 \log N).

The fast method is O(N \log N) is from Croux and Rousseeuw (1992), which is based on Johnson and Mizoguchi (1978).

Value

A scalar, the median of all pairwise differences.

References

Croux, C. and Rousseeuw, P. J. (1992), "Time-Efficient Algorithms for Two Highly Robust Estimators of Scale" In Computational Statistics: Volume 1: Proceedings of the 10th Symposium on Computational Statistics (pp. 411-428).

Johnson, D.B., and Mizoguchi, T. (1978), "Selecting the Kth Element in X + Y and X_1 + X_2 + ... + X_m", SIAM Journal of Computing, 7, 147-153.

See Also

medianheuristic

Examples


X <- c(7.1, 1.2, 4.3, 0.4)
Y <- c(5.5, 2.6, 8.7)
#using fast method, Laplacian kernel, loglinear in number of observations
md <- mediandiff(X, Y, fast=TRUE)

#using fast method, Gaussian kernel, loglinear in number of observations
md <- mediandiff(X, Y, fast=TRUE, kernel="Gaussian")

#using naive method (default), with Laplacian kernel
md <- mediandiff(X, Y)


[Package eummd version 0.1.9 Index]