mediandiff {eummd} | R Documentation |
Compute the median difference between pairs of values
Description
Compute the median of all differences between distinct pairs in vectors or matrices.
Usage
mediandiff(X, Y = NULL, kernel = c("Laplacian", "Gaussian"), fast = FALSE)
Arguments
X |
Numeric vector or matrix of length |
Y |
Numeric vector or matrix of length |
kernel |
String, either |
fast |
Boolean; if |
Details
The median difference is defined as follows:
Z
is the combined X
and Y
values into a single vector
or matrix.
Number of columns is the dimension, and these need to be equal
for X
and Y
. Then, if the Laplacian kernel is used,
m = \textnormal{median} \{ || x_i - x_j ||_1; \,\, i>j, \,\,
i=1, 2,\dots, n+m,\,\,\textnormal{ and } j=1, 2,\dots, i-1 \},
where || z_i - z_j ||_1
is the 1-norm, and so if the data
are d
-dimensional then
|| z_i - z_j ||_1 = \sum_{k=1}^{d} |z_{i,k} - z_{j,k}|.
If the Gaussian kernel is specified, then the square of the two-norm is used.
The median heuristic is defined as beta = 1/m
.
Naive method will compute all distinct pairs, of which there are
N(N+1)/2
differences. These are then sorted using
a O(N \log N)
algorithm, so overall O(N^2 \log N)
.
The fast method is O(N \log N)
is from Croux and Rousseeuw (1992),
which is based on Johnson and Mizoguchi (1978).
Value
A scalar, the median of all pairwise differences.
References
Croux, C. and Rousseeuw, P. J. (1992), "Time-Efficient Algorithms for Two Highly Robust Estimators of Scale" In Computational Statistics: Volume 1: Proceedings of the 10th Symposium on Computational Statistics (pp. 411-428).
Johnson, D.B., and Mizoguchi, T. (1978), "Selecting the Kth Element in X + Y and X_1 + X_2 + ... + X_m", SIAM Journal of Computing, 7, 147-153.
See Also
medianheuristic
Examples
X <- c(7.1, 1.2, 4.3, 0.4)
Y <- c(5.5, 2.6, 8.7)
#using fast method, Laplacian kernel, loglinear in number of observations
md <- mediandiff(X, Y, fast=TRUE)
#using fast method, Gaussian kernel, loglinear in number of observations
md <- mediandiff(X, Y, fast=TRUE, kernel="Gaussian")
#using naive method (default), with Laplacian kernel
md <- mediandiff(X, Y)