R: Weighted Median Value

weightedMedian {matrixStats}

R Documentation

Weighted Median Value

Description

Computes a weighted median of a numeric vector.

Usage

weightedMedian(x, w = NULL, idxs = NULL, na.rm = FALSE,
  interpolate = is.null(ties), ties = NULL, ...)

Arguments

`x`	`vector` of type `integer`, `numeric`, or `logical`.
`w`	a vector of weights the same length as `x` giving the weights to use for each element of `x`. Negative weights are treated as zero weights. Default value is equal weight to all values.
`idxs`	A `vector` indicating subset of elements to operate over. If `NULL`, no subsetting is done.
`na.rm`	a logical value indicating whether `NA` values in `x` should be stripped before the computation proceeds, or not. If `NA`, no check at all for `NA`s is done.
`interpolate`	If `TRUE`, linear interpolation is used to get a consistent estimate of the weighted median.
`ties`	If `interpolate == FALSE`, a character string specifying how to solve ties between two `x`'s that are satisfying the weighted median criteria. Note that at most two values can satisfy the criteria. When `ties` is `"min"` ("lower weighted median"), the smaller value of the two is returned and when it is `"max"` ("upper weighted median"), the larger value is returned. If `ties` is `"mean"`, the mean of the two values is returned. Finally, if `ties` is `"weighted"` (or `NULL`) a weighted average of the two are returned, where the weights are weights of all values `x[i] <= x[k]` and `x[i] >= x[k]`, respectively.
`...`	Not used.

Value

Returns a numeric scalar.

For the n elements x = c(x[1], x[2], ..., x[n]) with positive weights w = c(w[1], w[2], ..., w[n]) such that sum(w) = S, the weighted median is defined as the element x[k] for which the total weight of all elements x[i] < x[k] is less or equal to S/2 and for which the total weight of all elements x[i] > x[k] is less or equal to S/2 (c.f. [1]).

When using linear interpolation, the weighted mean of x[k-1] and x[k] with weights S[k-1] and S[k] corresponding to the cumulative weights of those two elements is used as an estimate.

If w is missing then all elements of x are given the same positive weight. If all weights are zero, NA_real_ is returned.

If one or more weights are Inf, it is the same as these weights have the same weight and the others have zero. This makes things easier for cases where the weights are result of a division with zero.

If there are missing values in w that are part of the calculation (after subsetting and dropping missing values in x), then the final result is always NA of the same type as x.

The weighted median solves the following optimization problem:

\alpha^* = \arg_\alpha \min \sum_{i = 1}^{n} w_i |x_i-\alpha|

where x = (x_1, x_2, \ldots, x_n) are scalars and w = (w_1, w_2, \ldots, w_n) are the corresponding "weights" for each individual x value.

Author(s)

Henrik Bengtsson and Ola Hossjer, Centre for Mathematical Sciences, Lund University. Thanks to Roger Koenker, Econometrics, University of Illinois, for the initial ideas.

References

[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, The MIT Press, Massachusetts Institute of Technology, 1989.

Examples

x <- 1:10
n <- length(x)

m1 <- median(x)                             # 5.5
m2 <- weightedMedian(x)                     # 5.5
stopifnot(identical(m1, m2))

w <- rep(1, times = n)
m1 <- weightedMedian(x, w)                  # 5.5 (default)
m2 <- weightedMedian(x, ties = "weighted")  # 5.5 (default)
m3 <- weightedMedian(x, ties = "min")       # 5
m4 <- weightedMedian(x, ties = "max")       # 6
stopifnot(identical(m1, m2))

# Pull the median towards zero
w[1] <- 5
m1 <- weightedMedian(x, w)                  # 3.5
y <- c(rep(0, times = w[1]), x[-1])         # Only possible for integer weights
m2 <- median(y)                             # 3.5
stopifnot(identical(m1, m2))

# Put even more weight on the zero
w[1] <- 8.5
weightedMedian(x, w)                # 2

# All weight on the first value
w[1] <- Inf
weightedMedian(x, w)                # 1

# All weight on the last value
w[1] <- 1
w[n] <- Inf
weightedMedian(x, w)                # 10

# All weights set to zero
w <- rep(0, times = n)
weightedMedian(x, w)                # NA

# Simple benchmarking
bench <- function(N = 1e5, K = 10) {
  x <- rnorm(N)
  gc()
  t <- c()
  t[1] <- system.time(for (k in 1:K) median(x))[3]
  t[2] <- system.time(for (k in 1:K) weightedMedian(x))[3]
  t <- t / t[1]
  names(t) <- c("median", "weightedMedian")
  t
}

print(bench(N =     5, K = 100))
print(bench(N =    50, K = 100))
print(bench(N =   200, K = 100))
print(bench(N =  1000, K = 100))
print(bench(N =  10e3, K =  20))
print(bench(N = 100e3, K =  20))

[Package matrixStats version 1.3.0 Index]