runmed {stats} | R Documentation |
Running Medians – Robust Scatter Plot Smoothing
Description
Compute running medians of odd span. This is the ‘most robust’ scatter plot smoothing possible. For efficiency (and historical reason), you can use one of two different algorithms giving identical results.
Usage
runmed(x, k, endrule = c("median", "keep", "constant"),
algorithm = NULL,
na.action = c("+Big_alternate", "-Big_alternate", "na.omit", "fail"),
print.level = 0)
Arguments
x |
numeric vector, the ‘dependent’ variable to be smoothed. |
k |
integer width of median window; must be odd. Turlach had a
default of |
endrule |
character string indicating how the values at the beginning and the end (of the data) should be treated. Can be abbreviated. Possible values are:
|
algorithm |
character string (partially matching |
na.action |
character string determining the behavior in the case of
|
print.level |
integer, indicating verboseness of algorithm; should rarely be changed by average users. |
Details
Apart from the end values, the result y = runmed(x, k)
simply has
y[j] = median(x[(j-k2):(j+k2)])
(k = 2*k2+1
), computed very
efficiently.
The two algorithms are internally entirely different:
"Turlach"
is the Härdle–Steiger algorithm (see Ref.) as implemented by Berwin Turlach. A tree algorithm is used, ensuring performance
O(n \log k)
wheren = length(x)
which is asymptotically optimal."Stuetzle"
is the (older) Stuetzle–Friedman implementation which makes use of median updating when one observation enters and one leaves the smoothing window. While this performs as
O(n \times k)
which is slower asymptotically, it is considerably faster for smallk
orn
.
Note that, both algorithms (and the smoothEnds()
utility)
now “work” also when x
contains non-finite entries
(\pm
Inf
, NaN
, and
NA
):
"Turlach"
.......
"Stuetzle"
currently simply works by applying the underlying math library (‘libm’) arithmetic for the non-finite numbers; this may optionally change in the future.
Currently long vectors are only supported for algorithm = "Stuetzle"
.
Value
vector of smoothed values of the same length as x
with an
attr
ibute k
containing (the ‘oddified’)
k
.
Author(s)
Martin Maechler maechler@stat.math.ethz.ch, based on Fortran code from Werner Stuetzle and S-PLUS and C code from Berwin Turlach.
References
Härdle, W. and Steiger, W. (1995) Algorithm AS 296: Optimal median smoothing, Applied Statistics 44, 258–264. doi:10.2307/2986349.
Jerome H. Friedman and Werner Stuetzle (1982) Smoothing of Scatterplots; Report, Dep. Statistics, Stanford U., Project Orion 003.
See Also
smoothEnds
which implements Tukey's end point rule and
is called by default from runmed(*, endrule = "median")
.
smooth
uses running
medians of 3 for its compound smoothers.
Examples
require(graphics)
utils::example(nhtemp)
myNHT <- as.vector(nhtemp)
myNHT[20] <- 2 * nhtemp[20]
plot(myNHT, type = "b", ylim = c(48, 60), main = "Running Medians Example")
lines(runmed(myNHT, 7), col = "red")
## special: multiple y values for one x
plot(cars, main = "'cars' data and runmed(dist, 3)")
lines(cars, col = "light gray", type = "c")
with(cars, lines(speed, runmed(dist, k = 3), col = 2))
## nice quadratic with a few outliers
y <- ys <- (-20:20)^2
y [c(1,10,21,41)] <- c(150, 30, 400, 450)
all(y == runmed(y, 1)) # 1-neighbourhood <==> interpolation
plot(y) ## lines(y, lwd = .1, col = "light gray")
lines(lowess(seq(y), y, f = 0.3), col = "brown")
lines(runmed(y, 7), lwd = 2, col = "blue")
lines(runmed(y, 11), lwd = 2, col = "red")
## Lowess is not robust
y <- ys ; y[21] <- 6666 ; x <- seq(y)
col <- c("black", "brown","blue")
plot(y, col = col[1])
lines(lowess(x, y, f = 0.3), col = col[2])
lines(runmed(y, 7), lwd = 2, col = col[3])
legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"),
xjust = 1, col = col, lty = c(0, 1, 1), pch = c(1,NA,NA))
## An example with initial NA's - used to fail badly (notably for "Turlach"):
x15 <- c(rep(NA, 4), c(9, 9, 4, 22, 6, 1, 7, 5, 2, 8, 3))
rS15 <- cbind(Sk.3 = runmed(x15, k = 3, algorithm="S"),
Sk.7 = runmed(x15, k = 7, algorithm="S"),
Sk.11= runmed(x15, k =11, algorithm="S"))
rT15 <- cbind(Tk.3 = runmed(x15, k = 3, algorithm="T", print.level=1),
Tk.7 = runmed(x15, k = 7, algorithm="T", print.level=1),
Tk.9 = runmed(x15, k = 9, algorithm="T", print.level=1),
Tk.11= runmed(x15, k =11, algorithm="T", print.level=1))
cbind(x15, rS15, rT15) # result for k=11 maybe a bit surprising ..
Tv <- rT15[-(1:3),]
stopifnot(3 <= Tv, Tv <= 9, 5 <= Tv[1:10,])
matplot(y = cbind(x15, rT15), type = "b", ylim = c(1,9), pch=1:5, xlab = NA,
main = "runmed(x15, k, algo = \"Turlach\")")
mtext(paste("x15 <-", deparse(x15)))
points(x15, cex=2)
legend("bottomleft", legend=c("data", paste("k = ", c(3,7,9,11))),
bty="n", col=1:5, lty=1:5, pch=1:5)