recode-replace {collapse} | R Documentation |
Recode and Replace Values in Matrix-Like Objects
Description
A small suite of functions to efficiently perform common recoding and replacing tasks in matrix-like objects.
Usage
recode_num(X, ..., default = NULL, missing = NULL, set = FALSE)
recode_char(X, ..., default = NULL, missing = NULL, regex = FALSE,
            ignore.case = FALSE, fixed = FALSE, set = FALSE)
replace_na(X, value = 0, cols = NULL, set = FALSE, type = "const")
replace_inf(X, value = NA, replace.nan = FALSE, set = FALSE)
replace_outliers(X, limits, value = NA,
                 single.limit = c("sd", "mad", "min", "max"),
                 ignore.groups = FALSE, set = FALSE)
Arguments
X |
a vector, matrix, array, data frame or list of atomic objects. |
... |
comma-separated recode arguments of the form: |
default |
optional argument to specify a scalar value to replace non-matched elements with. |
missing |
optional argument to specify a scalar value to replace missing elements with. Note that, for efficiency, this is done before the rest of the recoding, i.e. the recoding is performed on data in which missing values have already been filled. |
set |
logical. |
type |
character. One of |
regex |
logical. If |
value |
a single (scalar) value to replace matching elements with. In |
cols |
select columns to replace missing values in using a function, column names, indices or a logical vector. |
replace.nan |
logical. |
limits |
either a vector of two numeric values
single.limit |
character, controls the behavior if
|
ignore.groups |
logical. If |
ignore.case, fixed |
logical. Passed to |
Details
- recode_num and recode_char can be used to efficiently recode multiple numeric or character values, respectively. The syntax is inspired by dplyr::recode, but the functionality is enhanced in the following respects: (1) when passed a data frame / list, all appropriately typed columns will be recoded; (2) they preserve the attributes of the data object and of columns in a data frame / list; and (3) recode_char also supports regular expression matching using grepl.
- replace_na efficiently replaces NA/NaN with a value (default is 0). Data can be multi-typed, in which case appropriate columns can be selected through the cols argument. For numeric data a more versatile alternative is provided by data.table::nafill and data.table::setnafill.
- replace_inf replaces Inf/-Inf (or optionally NaN/Inf/-Inf) with a value (default is NA). It skips non-numeric columns in a data frame.
- replace_outliers replaces values falling outside a 1- or 2-sided numeric threshold, or outside a certain number of standard deviations or median absolute deviations, with a value (default is NA). It skips non-numeric columns in a data frame.
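As a minimal sketch of the three replace_* functions described above, the following applies them to small hypothetical vectors (x, y and z are made up for illustration and do not come from the package). It assumes the collapse package is installed and uses only the arguments shown in the Usage section.

```r
library(collapse)

# Hypothetical toy vectors, chosen only for illustration
x <- c(1, NA, 3, NaN)
y <- c(1, Inf, -Inf, NaN)
z <- c(1, 5, 50, 200)

# NA and NaN are both replaced by the default value 0
x_filled <- replace_na(x)

# By default only Inf/-Inf become NA; NaN is left untouched ...
y_inf <- replace_inf(y)
# ... unless replace.nan = TRUE is set
y_all <- replace_inf(y, replace.nan = TRUE)

# Two-sided threshold: values below 2 or above 100 become NA
z_na <- replace_outliers(z, c(2, 100))
# value = "clip" instead pulls them back to the nearest threshold
z_clip <- replace_outliers(z, c(2, 100), value = "clip")
```

On a data frame the same calls operate column by column, with replace_inf and replace_outliers skipping non-numeric columns as noted above.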
Note
These functions are not generic and do not offer support for factors or date(-time) objects. See dplyr::recode_factor, forcats and other appropriate packages for dealing with these classes.
Simple replacing tasks on a vector can also be handled effectively by setv / copyv. Fast vectorized switches are offered by package kit (functions iif, nif, vswitch, nswitch) as well as data.table::fcase and data.table::fifelse. Using switches is more efficient than recode_*, as recode_* creates an internal copy of the object to enable cross-replacing.
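To illustrate what cross-replacing means here, the sketch below contrasts a naive sequential replacement in base R with recode_char, reusing the first call from the Examples section. It assumes collapse is installed; the variable names seq_x and rec_x are illustrative only.

```r
library(collapse)

x <- c("a", "b", "c")

# Sequential in-place replacement: the second step also hits the "b"
# that was just created from "a", so the two rules interfere
seq_x <- x
seq_x[seq_x == "a"] <- "b"
seq_x[seq_x == "b"] <- "c"
seq_x   # "c" "c" "c"

# recode_char matches every rule against the original values,
# so "a" -> "b" and "b" -> "c" are applied independently
rec_x <- recode_char(x, a = "b", b = "c")
rec_x   # "b" "c" "c"
```

Keeping the original values available for matching is what requires the internal copy, and is why a plain switch can be cheaper when rules do not overlap.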
Function TRA, and the associated TRA ('transform') argument to Fast Statistical Functions, also have the option "replace_na" to replace missing values with a statistic computed on the non-missing observations, e.g. fmedian(airquality, TRA = "replace_na") does median imputation.
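As a small sketch of the imputation option mentioned above, the following applies fmedian with TRA = "replace_na" to a hypothetical numeric vector and compares the result with a manual base R version. It assumes collapse is installed in a version that supports the "replace_na" option described in this note.

```r
library(collapse)

# Hypothetical vector with one missing value
x <- c(1, NA, 3, 10)

# The median of the non-missing values (1, 3, 10) is 3;
# TRA = "replace_na" fills only the missing entries with it
imputed <- fmedian(x, TRA = "replace_na")

# Equivalent base R formulation, for comparison
manual <- x
manual[is.na(manual)] <- median(x, na.rm = TRUE)
```

Applied to a data frame such as airquality, the same call imputes each column with its own median.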
See Also
pad, Efficient Programming, Collapse Overview
Examples
recode_char(c("a","b","c"), a = "b", b = "c")
recode_char(month.name, ber = NA, regex = TRUE)
mtcr <- recode_num(mtcars, `0` = 2, `4` = Inf, `1` = NaN)
replace_inf(mtcr)
replace_inf(mtcr, replace.nan = TRUE)
replace_outliers(mtcars, c(2, 100))                  # Replace all values below 2 and above 100 with NA
replace_outliers(mtcars, c(2, 100), value = "clip")  # Clip outliers to the thresholds
replace_outliers(mtcars, 2, single.limit = "min")    # Replace all values smaller than 2 with NA
replace_outliers(mtcars, 100, single.limit = "max")  # Replace all values larger than 100 with NA
replace_outliers(mtcars, 2)                          # Replace all values more than 2 column standard
                                                     # deviations from the column mean with NA
replace_outliers(fgroup_by(iris, Species), 2)        # Passing a grouped_df, pseries or pdata.frame
                                                     # removes outliers according to the
                                                     # within-group standard deviation, see ?fscale