weightedMedian {matrixStats} | R Documentation |
Weighted Median Value
Description
Computes a weighted median of a numeric vector.
Usage
weightedMedian(x, w = NULL, idxs = NULL, na.rm = FALSE,
interpolate = is.null(ties), ties = NULL, ...)
Arguments
x |
|
w |
a vector of weights the same length as |
idxs |
A |
na.rm |
a logical value indicating whether |
interpolate |
If |
ties |
If |
... |
Not used. |
Value
Returns a numeric
scalar.
For the n
elements x = c(x[1], x[2], ..., x[n])
with positive
weights w = c(w[1], w[2], ..., w[n])
such that sum(w) = S
, the
weighted median is defined as the element x[k]
for which the
total weight of all elements x[i] < x[k]
is less or equal to
S/2
and for which the total weight of all elements x[i] > x[k]
is less or equal to S/2
(c.f. [1]).
When using linear interpolation, the weighted mean of x[k-1]
and
x[k]
with weights S[k-1]
and S[k]
corresponding to the
cumulative weights of those two elements is used as an estimate.
If w
is missing then all elements of x
are given the same
positive weight. If all weights are zero, NA_real_
is
returned.
If one or more weights are Inf
, it is the same as these weights have
the same weight and the others have zero. This makes things easier for cases
where the weights are result of a division with zero.
If there are missing values in w
that are part of the calculation
(after subsetting and dropping missing values in x
), then the final
result is always NA
of the same type as x
.
The weighted median solves the following optimization problem:
where
are scalars and
are the corresponding "weights" for each
individual
value.
Author(s)
Henrik Bengtsson and Ola Hossjer, Centre for Mathematical Sciences, Lund University. Thanks to Roger Koenker, Econometrics, University of Illinois, for the initial ideas.
References
[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, The MIT Press, Massachusetts Institute of Technology, 1989.
See Also
median
, mean
() and
weightedMean
().
Examples
x <- 1:10
n <- length(x)
m1 <- median(x) # 5.5
m2 <- weightedMedian(x) # 5.5
stopifnot(identical(m1, m2))
w <- rep(1, times = n)
m1 <- weightedMedian(x, w) # 5.5 (default)
m2 <- weightedMedian(x, ties = "weighted") # 5.5 (default)
m3 <- weightedMedian(x, ties = "min") # 5
m4 <- weightedMedian(x, ties = "max") # 6
stopifnot(identical(m1, m2))
# Pull the median towards zero
w[1] <- 5
m1 <- weightedMedian(x, w) # 3.5
y <- c(rep(0, times = w[1]), x[-1]) # Only possible for integer weights
m2 <- median(y) # 3.5
stopifnot(identical(m1, m2))
# Put even more weight on the zero
w[1] <- 8.5
weightedMedian(x, w) # 2
# All weight on the first value
w[1] <- Inf
weightedMedian(x, w) # 1
# All weight on the last value
w[1] <- 1
w[n] <- Inf
weightedMedian(x, w) # 10
# All weights set to zero
w <- rep(0, times = n)
weightedMedian(x, w) # NA
# Simple benchmarking
bench <- function(N = 1e5, K = 10) {
x <- rnorm(N)
gc()
t <- c()
t[1] <- system.time(for (k in 1:K) median(x))[3]
t[2] <- system.time(for (k in 1:K) weightedMedian(x))[3]
t <- t / t[1]
names(t) <- c("median", "weightedMedian")
t
}
print(bench(N = 5, K = 100))
print(bench(N = 50, K = 100))
print(bench(N = 200, K = 100))
print(bench(N = 1000, K = 100))
print(bench(N = 10e3, K = 20))
print(bench(N = 100e3, K = 20))