winsorize {datawizard} | R Documentation |
Winsorize data
Description
Winsorize data
Usage
winsorize(data, ...)
## S3 method for class 'numeric'
winsorize(
data,
threshold = 0.2,
method = "percentile",
robust = FALSE,
verbose = TRUE,
...
)
Arguments
data |
data frame or vector. |
... |
Currently not used. |
threshold |
The amount of winsorization, depends on the value of
|
method |
One of "percentile" (default), "zscore", or "raw". |
robust |
Logical, if TRUE, winsorizing through the "zscore" method is done via the median and the median absolute deviation (MAD); if FALSE, via the mean and the standard deviation. |
verbose |
Not used anymore since |
Details
Winsorizing or winsorization is the transformation of statistics by limiting
extreme values in the statistical data to reduce the effect of possibly
spurious outliers. The distribution of many statistics can be heavily
influenced by outliers. A typical strategy is to set all outliers (values
beyond a certain threshold) to a specified percentile of the data; for
example, a 90%
winsorization would see all data below the 5th percentile set
to the 5th percentile, and data above the 95th percentile set to the 95th
percentile. Winsorized estimators are usually more robust to outliers than
their more standard forms.
Value
A data frame with winsorized columns or a winsorized vector.
See Also
Functions to rename stuff:
data_rename()
,data_rename_rows()
,data_addprefix()
,data_addsuffix()
Functions to reorder or remove columns:
data_reorder()
,data_relocate()
,data_remove()
Functions to reshape, pivot or rotate data frames:
data_to_long()
,data_to_wide()
,data_rotate()
Functions to recode data:
rescale()
,reverse()
,categorize()
,recode_values()
,slide()
Functions to standardize, normalize, rank-transform:
center()
,standardize()
,normalize()
,ranktransform()
,winsorize()
Split and merge data frames:
data_partition()
,data_merge()
Functions to find or select columns:
data_select()
,extract_column_names()
Functions to filter rows:
data_match()
,data_filter()
Examples
hist(iris$Sepal.Length, main = "Original data")
hist(winsorize(iris$Sepal.Length, threshold = 0.2),
xlim = c(4, 8), main = "Percentile Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore"),
xlim = c(4, 8), main = "Mean (+/- SD) Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore", robust = TRUE),
xlim = c(4, 8), main = "Median (+/- MAD) Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = c(5, 7.5), method = "raw"),
xlim = c(4, 8), main = "Raw Thresholds"
)
# Also works on a data frame:
winsorize(iris, threshold = 0.2)