Winsorize {DescTools} | R Documentation |
Winsorize (Replace Extreme Values by Less Extreme Ones)
Description
Winsorizing a vector means that a predefined proportion of the smallest and/or largest values is replaced by less extreme values. The substitute values are the most extreme retained values.
Usage
Winsorize(x, val = quantile(x, probs = c(0.05, 0.95), na.rm = FALSE))
Arguments
x | a numeric vector to be winsorized. |
val | a numeric vector of length 2 giving the low and the high border. All values lower than val[1] are replaced by val[1], and all values greater than val[2] are replaced by val[2]. The default is the 5% and 95% quantiles of x. |
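As a rough illustration of how val acts as a pair of borders (a lower and an upper one, as in the Usage default and the Examples), the following base R sketch clamps a vector to two values. The helper name clamp_sketch is hypothetical and not part of DescTools; it only mimics the replacement rule and is not claimed to be the package's implementation.

```r
# Hypothetical helper, not part of DescTools: clamp x to [val[1], val[2]].
clamp_sketch <- function(x, val = quantile(x, probs = c(0.05, 0.95), na.rm = FALSE)) {
  stopifnot(is.numeric(x), length(val) == 2)
  x[x < val[1]] <- val[1]   # values below the low border are raised to it
  x[x > val[2]] <- val[2]   # values above the high border are lowered to it
  x
}

z <- 0:10
clamp_sketch(z, val = c(2, 8))
# [1] 2 2 2 3 4 5 6 7 8 8 8
```

Values strictly inside the borders are left unchanged, so the result always has the same length as the input.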
Details
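For a symmetric threshold c > 0, the winsorized vector is obtained by
g(x) =
\left\{\begin{array}{ll}
-c &\textup{for } x \le -c\\
x &\textup{for } |x| < c\\
c &\textup{for } x \ge c
\end{array}\right.
With Winsorize, the lower and upper borders are given by val[1] and val[2] and need not be symmetric about zero.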
You may also want to consider standardizing (possibly robustly) the data before you perform a winsorization.
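One possible way to combine robust standardization with winsorization is sketched below in base R. The data, the use of median() and mad() as robust location and scale, and the threshold c_thr = 3 are assumptions chosen purely for illustration; they are not prescribed by this help page.

```r
# Sketch: robustly standardize, clamp symmetrically at +/- c, transform back.
set.seed(1)
x <- c(rnorm(50, mean = 10, sd = 2), 40, -25)   # two artificial outliers

ctr <- median(x)          # robust location
scl <- mad(x)             # robust scale (scaled median absolute deviation)
z   <- (x - ctr) / scl    # robustly standardized values

c_thr <- 3                              # assumed threshold, illustration only
z_w   <- pmin(pmax(z, -c_thr), c_thr)   # g(z): clamp to [-c, c]
x_w   <- z_w * scl + ctr                # back on the original scale

range(x)     # range before winsorizing
range(x_w)   # range after winsorizing
```

With DescTools loaded, the clamping step should correspond to Winsorize(z, val = c(-c_thr, c_thr)).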
Value
A vector of the same length as the original data x containing the winsorized data.
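Note that the default val is computed with na.rm = FALSE, and quantile() signals an error in that case if x contains missing values. A possible workaround, sketched here in base R, is to compute the borders with na.rm = TRUE and supply them explicitly. The pmin()/pmax() line is a stand-in for the replacement step, not the DescTools code.

```r
x <- c(1, 5, NA, 7, 100, 3, NA, -50, 4, 6)

# Borders computed on the non-missing values only
val <- unname(quantile(x, probs = c(0.05, 0.95), na.rm = TRUE))

# Stand-in for the replacement step (assumption, not the package code)
x_w <- pmin(pmax(x, val[1]), val[2])

length(x_w) == length(x)          # TRUE: same length as the input
identical(is.na(x_w), is.na(x))   # TRUE: missing values stay in place
```

With DescTools loaded, the same borders could be supplied as Winsorize(x, val = val).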
Author(s)
Andri Signorell <andri@signorell.net>
See Also
winsorize from the package robustHD contains an option to winsorize multivariate data
Examples
library(DescTools)
## generate data
set.seed(9128)
x <- round(runif(100) * 100, 1)
(d.frm <- DescTools::Sort(data.frame(
  x,
  default   = Winsorize(x),
  quantile  = Winsorize(x, quantile(x, probs=c(0.1, 0.8), na.rm = FALSE)),
  fixed_val = Winsorize(x, val=c(15, 85)),
  fixed_n   = Winsorize(x, val=c(Small(x, k=3)[3], Large(x, k=3)[1])),
  closest   = Winsorize(x, val=unlist(Closest(x, c(30, 70))))
)))[c(1:10, 90:100), ]
# use Large and Small if a fixed number of values should be winsorized (here k=3)
PlotLinesA(SetNames(d.frm, rownames=NULL), lwd=2, col=Pal("Tibco"),
           main="Winsorized Vector")
z <- 0:10
# two-sided (default):
Winsorize(z, val=c(2,8))
# one-sided:
# ... replace all values > 8 with 8
Winsorize(z, val=c(min(z), 8))
# ... replace all values < 4 with 4
Winsorize(z, val=c(4, max(z)))