RepeatedMedian {robslopes} | R Documentation |
Siegel's repeated median slope and intercept estimator.
Description
Computes the repeated median slope proposed by Siegel (1982) using the algorithm by
Matousek et. al (1998). The algorithm runs in an expected time,
which is typically significantly faster than the
computational cost of
the naive algorithm, and requires
storage.
Usage
RepeatedMedian(x, y, alpha = NULL, beta = NULL, verbose = TRUE)
Arguments
x |
A vector of predictor values. |
y |
A vector of response values. |
alpha |
Determines the outer order statistic, which is equal to |
beta |
Determines the inner order statistic, which is equal to |
verbose |
Whether or not to print out the progress of the algorithm. Defaults to |
Details
Given two input vectors x
and y
of length , the repeated median is computed as
. The default "outer” median is the
largest element in the ordered median slopes. The inner median, which for each observation is calculated as the median of the slopes connected to this observation, is the
largest element in the ordered slopes. By changing
alpha
and beta
, other repeated order statistics of the slopes can be computed.
Value
A list with elements:
intecept |
The estimate of the intercept. |
slope |
The Theil-Sen estimate of the slope. |
Author(s)
Jakob Raymaekers
References
Siegel, A. F. (1982). Robust regression using repeated medians. Biometrika, 69(1), 242-244.
Matousek, J., Mount, D. M., & Netanyahu, N. S. (1998). Efficient randomized algorithms for the repeated median line estimator. Algorithmica, 20(2), 136-150.
Raymaekers (2023). "The R Journal: robslopes: Efficient Computation of the (Repeated) Median Slope", The R Journal. (link to open access pdf)
See Also
Examples
# We compare the implemented algorithm against a naive brute-force approach.
bruteForceRM <- function(x, y) {
n <- length(x)
medind1 <- floor((n+2) / 2)
medind2 <- floor((n+1) / 2)
temp <- t(sapply(1:n, function(z) sort(apply(cbind(x, y), 1 ,
function(k) (k[2] - y[z]) /
(k[1] - x[z])))))
RMslope <- sort(temp[, medind2])[medind1]
RMintercept <- sort(y - x * RMslope)[medind1]
return(list(intercept = RMintercept, slope = RMslope))
}
n = 100
set.seed(2)
x = rnorm(n)
y = x + rnorm(n)
t0 <- proc.time()
RM.fast <- RepeatedMedian(x, y, NULL, NULL, FALSE)
t1 <- proc.time()
t1 - t0
t0 <- proc.time()
RM.naive <- bruteForceRM(x, y)
t1 <- proc.time()
t1 - t0
RM.fast$slope - RM.naive$slope