group_var {sjmisc}    R Documentation
Recode numeric variables into equal-ranged groups
Description
Recode numeric variables into equal-ranged, grouped factors, i.e. a variable is cut into a smaller number of groups, where each group has the same value range. group_labels() creates the related value labels. group_var_if() and group_labels_if() are scoped variants of group_var() and group_labels(), where grouping is applied only to those variables that match the logical condition of predicate.
Usage
group_var(
  x,
  ...,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_var_if(
  x,
  predicate,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_labels(x, ..., size = 5, right.interval = FALSE, n = 30)

group_labels_if(x, predicate, size = 5, right.interval = FALSE, n = 30)
Arguments
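x | A vector or data frame.
... | Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame and only selected variables from x should be processed. See 'Examples'.
size | Numeric; group size, i.e. the value range covered by each group. By default, a new group is defined for every 5 categories of x, i.e. size = 5. Use size = "auto" to recode x into a maximum of n groups. See 'Details'.
as.num | Logical; if TRUE, the return value is numeric, not a factor.
right.interval | Logical; determines which boundary values are included when grouping is done. See 'Details'.
n | Sets the maximum number of groups that are defined when auto-grouping is on (size = "auto"). Default is 30.
append | Logical; if TRUE (the default) and x is a data frame, x including the new variables as additional columns is returned; if FALSE, only the new variables are returned.
suffix | Indicates which suffix will be added to each grouped variable. Use "" to overwrite the original variable name.
predicate | A predicate function to be applied to the columns. The variables for which predicate returns TRUE are selected.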
Details
If size is set to a specific value, the variable is recoded into several groups, where each group has a maximum range of size. Hence, the number of groups differs depending on the range of x.
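A minimal base-R sketch of fixed-size grouping, assuming left-closed bins that start at the largest multiple of size not exceeding min(x). The helper group_fixed() is hypothetical, not part of sjmisc, and only illustrates why the number of groups grows with the range of x.

```r
# Hypothetical helper (not part of sjmisc): assign each value to a
# consecutive, equal-width bin of width `size`, numbered from 1.
group_fixed <- function(x, size = 5) {
  # start at the largest multiple of `size` that does not exceed min(x)
  start <- (min(x, na.rm = TRUE) %/% size) * size
  # integer division gives the 0-based bin; add 1 for 1-based group numbers
  (x - start) %/% size + 1
}

x <- c(50, 53, 54, 55, 61, 80)
group_fixed(x, size = 5)   # 1 1 1 2 3 7
group_fixed(x, size = 10)  # 1 1 1 1 2 4
```

With the same data, a smaller size produces more groups (7 instead of 4), because the group width is fixed and the range of x determines how many groups are needed.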
If size = "auto", the variable is recoded into a maximum of n groups. Hence, the number of groups does not depend on the range of x; instead, the range within each group varies with the range of x.
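The sketch below assumes one plausible rule for size = "auto": divide the observed range by n and round up, giving a bin width that covers the range with roughly n bins. The helper auto_size() is hypothetical; sjmisc's own computation may differ in detail.

```r
# Hypothetical helper (assumption, not sjmisc's API): choose a bin width
# so that the observed range is covered by roughly `n` equal-width bins.
auto_size <- function(x, n = 30) {
  rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  # round up to a whole-number width; never return 0 for a constant vector
  max(1, ceiling(rng / n))
}

auto_size(c(18, 95), n = 30)   # range 77  -> width 3
auto_size(c(0, 300), n = 30)   # range 300 -> width 10
```

Under this rule the group width, not the group count, adapts to the data: a wider range yields wider groups.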
right.interval determines which boundary values are included when grouping is done. If FALSE (default), grouping starts at the lower bound of size. For example, for a variable ranging from 50 to 80, groups cover the ranges 50-54, 55-59, 60-64 etc. If TRUE, grouping starts at the upper bound of size. In this case, groups cover the ranges 46-50, 51-55, 56-60, 61-65 etc. Note: this covers a range from 46-50 as the first group, even if values from 46 to 49 are not present. See 'Examples'.
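The boundary choice is the same one base R's cut() makes through its right argument. This plain base-R snippet is only an analogy, not sjmisc code: left-closed bins such as [50,55) correspond to integer groups like 50-54, while right-closed bins such as (45,50] correspond to groups like 46-50, where 50 lands in the first bin although no value below 50 occurs.

```r
x <- c(50, 54, 55, 60, 61)

# left-closed intervals: [50,55), [55,60), [60,65)
left <- cut(x, breaks = seq(50, 65, by = 5), right = FALSE)

# right-closed intervals: (45,50], (50,55], (55,60], (60,65]
right <- cut(x, breaks = seq(45, 65, by = 5), right = TRUE)

as.character(left)   # "[50,55)" "[50,55)" "[55,60)" "[60,65)" "[60,65)"
as.character(right)  # "(45,50]" "(50,55]" "(50,55]" "(55,60]" "(60,65]"
```

Note how 55 moves from the second left-closed bin into the second right-closed bin together with 54, which is exactly the difference between 55-59 and 51-55 style groups.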
To split a variable into a certain number of equal-sized groups (instead of groups that all span the same value range), use the split_var function.
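To contrast the two approaches, the base-R sketch below splits skewed data once with equal-width breaks and once with quantile breaks. The quantile version is only an analogy for equal-sized groups and is not necessarily how split_var() is implemented.

```r
x <- c(1:9, 100)  # skewed: one large outlier

# equal range: two bins of identical width over [1, 100]
eq_range <- cut(x, breaks = seq(1, 100, length.out = 3), include.lowest = TRUE)

# equal size: split at the median, so each bin holds about half the values
eq_size <- cut(x, breaks = quantile(x, probs = c(0, 0.5, 1)),
               include.lowest = TRUE)

table(eq_range)  # 9 values in the first bin, 1 in the second
table(eq_size)   # 5 values in each bin
```

Equal-width bins keep the value range constant but can be very unbalanced in size; quantile bins balance the counts at the cost of unequal ranges.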
group_var() also works on grouped data frames (see group_by). In this case, grouping is applied separately to each subset of the variables in x defined by the groups. See 'Examples'.
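The idea of grouping within subsets can be sketched in base R with ave(), which applies a function separately to each level of a grouping variable. The data and the inline fixed-width rule (width 5, starting at the largest multiple of 5 not above each subset's minimum) are assumptions for illustration; the point is that per-subset grouping yields different group numbers than grouping the pooled values.

```r
d <- data.frame(g = c("a", "a", "a", "b", "b", "b"),
                v = c(10, 14, 22, 100, 104, 112))

# assumed fixed-width rule (width 5), applied to one vector of values
bin5 <- function(v) (v - (min(v) %/% 5) * 5) %/% 5 + 1

# ave() calls bin5() once per level of d$g and keeps the original row order
d$v_gr <- ave(d$v, d$g, FUN = bin5)

# the same rule applied to all values at once, ignoring the groups
pooled <- bin5(d$v)

d$v_gr  # 1 1 3 1 1 3
pooled  # 1 1 3 19 19 21
```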
Value
For group_var(), a grouped variable, either as numeric or as factor (see argument as.num). If x is a data frame, the grouped variables are returned, appended to x as new columns unless append = FALSE.

For group_labels(), a string vector or a list of string vectors containing labels based on the grouped categories of x, formatted as "from lower bound to upper bound", e.g. "10-19" "20-29" "30-39" etc. See 'Examples'.
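A small sketch of how "lower-upper" labels such as "10-19" can be derived for left-closed bins of integer data. The helper range_labels() is hypothetical and not sjmisc's group_labels(); it assumes bins start at the largest multiple of size not exceeding min(x).

```r
# Hypothetical helper (not sjmisc's group_labels()): build "lower-upper"
# labels for left-closed bins of width `size`, assuming integer-valued data
# so that the interval [10, 20) reads as "10-19".
range_labels <- function(x, size = 10) {
  start <- (min(x, na.rm = TRUE) %/% size) * size
  lower <- seq(start, max(x, na.rm = TRUE), by = size)
  paste0(lower, "-", lower + size - 1)
}

range_labels(c(12, 25, 37), size = 10)  # "10-19" "20-29" "30-39"
```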
Note
Variable label attributes (see, for instance, set_label) are preserved. Usually you should use the same values for size and right.interval in group_labels() as in group_var() if you want matching labels for the related recoded variable.
See Also
split_var to split variables into equal-sized groups, group_str for grouping string vectors, or rec_pattern and rec for another convenient way of recoding variables into smaller groups.
Examples
library(sjmisc)
set.seed(123)  # reproducible random ages
age <- abs(round(rnorm(100, 65, 20)))
age.grp <- group_var(age, size = 10)
hist(age)
hist(age.grp)
age.grpvar <- group_labels(age, size = 10)
table(age.grp)
print(age.grpvar)
# histogram with EUROFAMCARE sample dataset
# variable not grouped
library(sjlabelled)
data(efc)
hist(efc$e17age, main = get_label(efc$e17age))
# bar plot with EUROFAMCARE sample dataset
# grouped variable
ageGrp <- group_var(efc$e17age)
ageGrpLab <- group_labels(efc$e17age)
barplot(table(ageGrp), main = get_label(efc$e17age), names.arg = ageGrpLab)
# within a pipe-chain
library(dplyr)
efc %>%
  select(e17age, c12hour, c160age) %>%
  group_var(size = 20)
# create vector with values from 50 to 80
dummy <- round(runif(200, 50, 80))
# labels with grouping starting at lower bound
group_labels(dummy)
# labels with grouping starting at upper bound
group_labels(dummy, right.interval = TRUE)
# also works with grouped data frames
mtcars %>%
  group_var(disp, size = 4, append = FALSE) %>%
  table()

mtcars %>%
  group_by(cyl) %>%
  group_var(disp, size = 4, append = FALSE) %>%
  table()