rec {sjmisc} | R Documentation
Recode variables
Description
rec() recodes values of variables, where variable selection is based on variable names or column position, or on select helpers (see documentation on ...). rec_if() is a scoped variant of rec(), where recoding will be applied only to those variables that match the logical condition of predicate.
Usage
rec(
  x,
  ...,
  rec,
  as.num = TRUE,
  var.label = NULL,
  val.labels = NULL,
  append = TRUE,
  suffix = "_r",
  to.factor = !as.num
)
rec_if(
  x,
  predicate,
  rec,
  as.num = TRUE,
  var.label = NULL,
  val.labels = NULL,
  append = TRUE,
  suffix = "_r",
  to.factor = !as.num
)
Arguments
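x | A vector or data frame.
... | Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame and only selected variables from x should be processed. See 'Examples'.
rec | String with recode pairs of old and new values. See 'Details' for examples.
as.num | Logical, if TRUE, the return value will be numeric, not a factor.
var.label | Optional string, to set the variable label attribute for the returned variable (see vignette Labelled Data and the sjlabelled-Package). If NULL (default), the variable label attribute of x is used (if present).
val.labels | Optional character vector, to set value label attributes of the recoded variable (see vignette Labelled Data and the sjlabelled-Package). If NULL (default), no value labels are set. Value labels can also be defined directly in the rec-syntax, see 'Details'.
append | Logical, if TRUE (the default) and x is a data frame, x including the recoded variables as new columns is returned; if FALSE, only the recoded variables are returned.
suffix | String value, will be appended to variable (column) names of x, if x is a data frame. If suffix = "" and append = TRUE, existing variables that have been recoded will be overwritten.
to.factor | Logical, alias for as.num. If TRUE, the return value will be a factor, not numeric.
predicate | A predicate function to be applied to the columns. The variables for which predicate returns TRUE are selected.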
Details
The rec string has the following syntax:
- recode pairs
  each recode pair has to be separated by a ;, e.g. rec = "1=1; 2=4; 3=2; 4=3"
- multiple values
  multiple old values that should be recoded into a single new value may be separated with a comma, e.g. "1,2=1; 3,4=2"
- value range
  a value range is indicated by a colon, e.g. "1:4=1; 5:8=2" (recodes all values from 1 to 4 into 1, and from 5 to 8 into 2)
- value range for doubles
  for double vectors (with fractional part), all values within the specified range are recoded; e.g. "1:2.5=1;2.6:3=2" recodes 1 to 2.5 into 1 and 2.6 to 3 into 2, but 2.55 would not be recoded (since it is not included in any of the specified ranges)
- "min" and "max"
  minimum and maximum values are indicated by min (or lo) and max (or hi), e.g. "min:4=1; 5:max=2" (recodes all values from the minimum value of x to 4 into 1, and from 5 to the maximum value of x into 2)
- "else"
  all other values, which have not been specified yet, are indicated by else, e.g. "3=1; 1=2; else=3" (recodes 3 into 1, 1 into 2 and all other values into 3)
- "copy"
  the "else"-token can be combined with copy, indicating that all remaining, not yet recoded values should stay the same (are copied from the original value), e.g. "3=1; 1=2; else=copy" (recodes 3 into 1, 1 into 2 and all other values like 2, 4 or 5 etc. will not be recoded, but copied, see 'Examples')
- NA's
  NA values are allowed both as old and new value, e.g. "NA=1; 3:5=NA" (recodes all NA into 1, and all values from 3 to 5 into NA in the new variable)
- "rev"
  "rev" is a special token that reverses the value order (see 'Examples')
- direct value labelling
  value labels for new values can be assigned inside the recode pattern by writing the value label in square brackets after defining the new value in a recode pair, e.g. "15:30=1 [young aged]; 31:55=2 [middle aged]; 56:max=3 [old aged]". See 'Examples'.
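The tokens described above can be tried out on a small vector. The following is a minimal sketch, assuming the sjmisc package is installed; the toy vector v and the result names are made up for illustration and are not part of the package.

```r
library(sjmisc)

# hypothetical toy vector, for illustration only
v <- c(1, 2, 3, 4, NA)

# multiple values: 1 and 2 become 1, 3 and 4 become 2
grouped <- rec(v, rec = "1,2=1; 3,4=2")

# value ranges using the "min" and "max" tokens
ranged <- rec(v, rec = "min:2=1; 3:max=2")

# recode one value, copy all remaining values unchanged
copied <- rec(v, rec = "1=10; else=copy")

# NA used as old value
no_na <- rec(v, rec = "NA=9; else=copy")

# reverse the value order
reversed <- rec(v, rec = "rev")

# direct value labelling inside the recode string
lab <- rec(v, rec = "1:2=1 [low]; 3:4=2 [high]")
```

Printing each result next to v shows which recode pair applied to which element.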
Value
x with recoded categories. If x is a data frame, for append = TRUE, x including the recoded variables as new columns is returned; if append = FALSE, only the recoded variables will be returned. If append = TRUE and suffix = "", recoded variables will replace (overwrite) existing variables.
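The three return shapes for data frames can be compared side by side. This is a minimal sketch, assuming sjmisc is installed; the data frame d and its columns a and b are hypothetical.

```r
library(sjmisc)

# hypothetical toy data frame, for illustration only
d <- data.frame(a = c(1, 2, 3, 4), b = c(4, 3, 2, 1))

# default (append = TRUE, suffix = "_r"): recoded variable is added as "a_r"
appended <- rec(d, a, rec = "1,2=1; 3,4=2")

# append = FALSE: only the recoded variable is returned
only_new <- rec(d, a, rec = "1,2=1; 3,4=2", append = FALSE)

# append = TRUE with suffix = "": the existing column "a" is overwritten
replaced <- rec(d, a, rec = "1,2=1; 3,4=2", suffix = "")

names(appended)
names(only_new)
names(replaced)
```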
Note
Please note the following behaviours of the function:
- the "else"-token should always be the last argument in the rec-string.
- Non-matching values will be set to NA, unless captured by the "else"-token.
- Tagged NA values (see tagged_na) and their value labels will be preserved when copying NA values to the recoded vector with "else=copy".
- Variable label attributes (see, for instance, get_label) are preserved (unless changed via the var.label-argument); however, value label attributes are removed (except for "rev", where present value labels will be automatically reversed as well). Use the val.labels-argument to add labels for recoded values.
- If x is a data frame, all variables should have the same categories or value range (otherwise, see second bullet, NAs are produced).
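The first two behaviours can be seen on a small vector. A minimal sketch, assuming sjmisc is installed; the vector v and the result names are illustrative only.

```r
library(sjmisc)

# hypothetical toy vector, for illustration only
v <- c(1, 2, 3, 4)

# 3 and 4 are not covered by any recode pair, so they are set to NA
unmatched <- rec(v, rec = "1=10; 2=20")

# with "else" as the last pair, the remaining values are captured
caught <- rec(v, rec = "1=10; 2=20; else=99")
```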
See Also
set_na for setting NA values, replace_na to replace NA's with a specific value, recode_to for re-shifting value ranges and ref_lvl to change the reference level of (numeric) factors.
Examples
library(sjmisc)
data(efc)
table(efc$e42dep, useNA = "always")
# replace NA with 5
table(rec(efc$e42dep, rec = "1=1;2=2;3=3;4=4;NA=5"), useNA = "always")
# recode 1 to 2 into 1 and 3 to 4 into 2
table(rec(efc$e42dep, rec = "1,2=1; 3,4=2"), useNA = "always")
# set value labels. variable label is automatically preserved
library(dplyr)
efc %>%
  select(e42dep) %>%
  rec(rec = "1,2=1; 3,4=2",
      val.labels = c("low dependency", "high dependency")) %>%
  frq()
# works with mutate
efc %>%
  select(e42dep, e17age) %>%
  mutate(dependency_rev = rec(e42dep, rec = "rev")) %>%
  head()
# recode 1 to 3 into 1 and 4 into 2
table(rec(efc$e42dep, rec = "min:3=1; 4=2"), useNA = "always")
# recode 2 to 1 and all others into 2
table(rec(efc$e42dep, rec = "2=1; else=2"), useNA = "always")
# reverse value order
table(rec(efc$e42dep, rec = "rev"), useNA = "always")
# recode only selected values, copy remaining
table(efc$e15relat)
table(rec(efc$e15relat, rec = "1,2,4=1; else=copy"))
# recode variables with same category in a data frame
head(efc[, 6:9])
head(rec(efc[, 6:9], rec = "1=10;2=20;3=30;4=40"))
# recode multiple variables and set value labels via recode-syntax
dummy <- rec(
  efc, c160age, e17age,
  rec = "15:30=1 [young]; 31:55=2 [middle]; 56:max=3 [old]",
  append = FALSE
)
frq(dummy)
# recode variables with same value-range
lapply(
  rec(
    efc, c82cop1, c83cop2, c84cop3,
    rec = "1,2=1; NA=9; else=copy",
    append = FALSE
  ),
  table,
  useNA = "always"
)
# recode character vector
dummy <- c("M", "F", "F", "X")
rec(dummy, rec = "M=Male; F=Female; X=Refused")
# recode numeric to character
rec(efc$e42dep, rec = "1=first;2=2nd;3=third;else=hi") %>% head()
# recode non-numeric factors
data(iris)
table(rec(iris, Species, rec = "setosa=huhu; else=copy", append = FALSE))
# recode floating points
table(rec(
  iris, Sepal.Length, rec = "lo:5=1;5.01:6.5=2;6.501:max=3", append = FALSE
))
# preserve tagged NAs
if (require("haven")) {
  x <- labelled(c(1:3, tagged_na("a", "c", "z"), 4:1),
                c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
                  "Refused" = tagged_na("a"), "Not home" = tagged_na("z")))
  # get current value labels
  x
  # recode 2 into 5; Values of tagged NAs are preserved
  rec(x, rec = "2=5;else=copy")
}
# use select-helpers from dplyr-package
out <- rec(
  efc, contains("cop"), c161sex:c175empl,
  rec = "0,1=0; else=1",
  append = FALSE
)
head(out)
# recode only variables that have a value range from 1-4
p <- function(x) min(x, na.rm = TRUE) > 0 && max(x, na.rm = TRUE) < 5
out <- rec_if(efc, predicate = p, rec = "1:3=1;4=2;else=copy")
head(out)