winsors {quest}

R Documentation

Winsorize Numeric Data

Description

winsors winsorizes numeric data by recoding extreme values as user-specified boundary values, which are defined in z-score units (e.g., z.max = 3 corresponds to three standard deviations above a variable's mean). The to.na argument provides the option of recoding the extreme values as missing (NA) instead.
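To make the z-score boundaries concrete, the sketch below winsorizes a single numeric vector in base R. It assumes that a boundary of z corresponds to the sample mean plus z times the sample standard deviation, both computed with NAs removed. winsor_sketch is a hypothetical helper written for illustration; it is not the quest package's internal code.

```r
# Hypothetical helper illustrating z-score winsorizing; NOT quest's implementation.
# Assumption: a z-score boundary z maps to mean(x) + z * sd(x), NAs removed.
winsor_sketch <- function(x, z.min = -3, z.max = 3, rtn.int = FALSE, to.na = FALSE) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  lower <- m + z.min * s  # lower boundary in the original units of x
  upper <- m + z.max * s  # upper boundary in the original units of x
  too_low <- !is.na(x) & x < lower
  too_high <- !is.na(x) & x > upper
  if (to.na) {
    # recode extreme values as missing instead of winsorizing
    x[too_low | too_high] <- NA
  } else {
    # optionally round the replacement values (e.g., for count data)
    if (rtn.int) {
      lower <- round(lower)
      upper <- round(upper)
    }
    x[too_low] <- lower   # raise low extremes to the lower boundary
    x[too_high] <- upper  # pull high extremes down to the upper boundary
  }
  x
}

x <- c(1:10, 100)
winsor_sketch(x, z.min = -2, z.max = 2)                  # 100 becomes about 71.37
winsor_sketch(x, z.min = -2, z.max = 2, rtn.int = TRUE)  # 100 becomes 71
winsor_sketch(x, z.min = -2, z.max = 2, to.na = TRUE)    # 100 becomes NA
```

The sketch decides which values are extreme using the unrounded boundaries and rounds only the replacement values, so rtn.int cannot change which observations are recoded. That is a design choice of the sketch, not necessarily how winsors behaves.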

Usage

winsors(
  data,
  vrb.nm,
  z.min = -3,
  z.max = 3,
  rtn.int = FALSE,
  to.na = FALSE,
  suffix = "_win"
)

Arguments

`data`	data.frame containing the variables to winsorize.
`vrb.nm`	character vector of colnames from `data` specifying the variables to winsorize.
`z.min`	numeric vector of length 1 specifying the lower boundary value in z-score units.
`z.max`	numeric vector of length 1 specifying the upper boundary value in z-score units.
`rtn.int`	logical vector of length 1 specifying whether the recoded values should be rounded to the nearest integer. This can be useful when working with count data, where decimal values are impossible.
`to.na`	logical vector of length 1 specifying whether the extreme values should be recoded to NA rather than winsorized to the boundary values.
`suffix`	character vector of length 1 specifying the string to append to the end of the colnames in the return object.
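Because z.min and z.max are expressed in z-score units, it can help to preview the boundaries they imply in each variable's original units, and how many values fall outside them, before winsorizing. The sketch below assumes a boundary of z corresponds to the column mean plus z times the column standard deviation (NAs removed). z_bounds is a hypothetical helper, not part of quest.

```r
# Hypothetical helper: preview the boundaries implied by z.min and z.max.
# Assumption: boundary = column mean + z * column SD, NAs removed.
z_bounds <- function(data, vrb.nm, z.min = -3, z.max = 3) {
  t(sapply(data[vrb.nm], function(x) {
    m <- mean(x, na.rm = TRUE)
    s <- sd(x, na.rm = TRUE)
    z <- (x - m) / s  # z-score of each observation
    c(lower = m + z.min * s,
      upper = m + z.max * s,
      n_extreme = sum(z < z.min | z > z.max, na.rm = TRUE))
  }))
}

# one row per variable: lower boundary, upper boundary, count of extreme values
z_bounds(cars, vrb.nm = names(cars), z.min = -2, z.max = 2)
```

For cars with z.min = -2 and z.max = 2, under this assumption only the two lowest speeds (4) and the single largest stopping distance (120) fall outside the boundaries.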

Value

data.frame of winsorized data, with extreme values recoded as either the boundary values or NA, and with colnames = paste0(vrb.nm, suffix).
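The Value entry describes the shape of the result: one column per element of vrb.nm, named paste0(vrb.nm, suffix). The sketch below reproduces only that shape in base R, clamping each column with pmin() and pmax(). winsors_sketch is hypothetical, omits rtn.int and to.na for brevity, and assumes boundaries of mean + z * SD, so it should not be read as the package's implementation.

```r
# Hypothetical data.frame wrapper mirroring the documented return shape;
# NOT quest's implementation. Assumes boundaries = mean + z * SD per column.
winsors_sketch <- function(data, vrb.nm, z.min = -3, z.max = 3, suffix = "_win") {
  out <- lapply(data[vrb.nm], function(x) {
    m <- mean(x, na.rm = TRUE)
    s <- sd(x, na.rm = TRUE)
    # clamp each value into [m + z.min * s, m + z.max * s]; NAs stay NA
    pmin(pmax(x, m + z.min * s), m + z.max * s)
  })
  out <- as.data.frame(out)
  names(out) <- paste0(vrb.nm, suffix)  # documented colnames convention
  out
}

new <- winsors_sketch(cars, vrb.nm = names(cars), z.min = -2, z.max = 2)
names(new)  # "speed_win" "dist_win"
head(cbind(cars, new))
```

Because the suffix keeps the new colnames distinct from the originals, the result can be bound next to the input data, as in the cbind() call above, without name clashes.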

Examples


# winsorize all columns with the default boundaries (z.min = -3, z.max = 3)
lapply(X = quakes[c("mag","stations")], FUN = table)
new <- winsors(quakes, vrb.nm = names(quakes))
lapply(X = new, FUN = table)

# recode extreme values as NA rather than winsorizing them
vecNA(quakes)
new <- winsors(quakes, vrb.nm = names(quakes), to.na = TRUE)
vecNA(new)

# compare rtn.int = FALSE with rtn.int = TRUE
winsors(data = cars, vrb.nm = names(cars), z.min = -2, z.max = 2, rtn.int = FALSE)
winsors(data = cars, vrb.nm = names(cars), z.min = -2, z.max = 2, rtn.int = TRUE)