split_var {sjmisc}R Documentation

Split numeric variables into smaller groups

Description

Recode numeric variables into equal sized groups, i.e. a variable is cut into a smaller number of groups at specific cut points. split_var_if() is a scoped variant of split_var(), where transformation will be applied only to those variables that match the logical condition of predicate.

Usage

split_var(
  x,
  ...,
  n,
  as.num = FALSE,
  val.labels = NULL,
  var.label = NULL,
  inclusive = FALSE,
  append = TRUE,
  suffix = "_g"
)

split_var_if(
  x,
  predicate,
  n,
  as.num = FALSE,
  val.labels = NULL,
  var.label = NULL,
  inclusive = FALSE,
  append = TRUE,
  suffix = "_g"
)

Arguments

x

A vector or data frame.

...

Optional, unquoted names of variables that should be selected for further processing. Required, if x is a data frame (and no vector) and only selected variables from x should be processed. You may also use functions like : or tidyselect's select-helpers. See 'Examples' or package-vignette.

n

The new number of groups that x should be split into.

as.num

Logical, if TRUE, return value will be numeric, not a factor.

val.labels

Optional character vector, to set value label attributes of recoded variable (see vignette Labelled Data and the sjlabelled-Package). If NULL (default), no value labels will be set. Value labels can also be directly defined in the rec-syntax, see 'Details'.

var.label

Optional string, to set variable label attribute for the returned variable (see vignette Labelled Data and the sjlabelled-Package). If NULL (default), variable label attribute of x will be used (if present). If empty, variable label attributes will be removed.

inclusive

Logical; if TRUE, cut point value are included in the preceding group. This may be necessary if cutting a vector into groups does not define proper ("equal sized") group sizes. See 'Note' and 'Examples'.

append

Logical, if TRUE (the default) and x is a data frame, x including the new variables as additional columns is returned; if FALSE, only the new variables are returned.

suffix

Indicates which suffix will be added to each dummy variable. Use "numeric" to number dummy variables, e.g. x_1, x_2, x_3 etc. Use "label" to add value label, e.g. x_low, x_mid, x_high. May be abbreviated.

predicate

A predicate function to be applied to the columns. The variables for which predicate returns TRUE are selected.

Details

split_var() splits a variable into equal sized groups, where the amount of groups depends on the n-argument. Thus, this functions cuts a variable into groups at the specified quantiles.

By contrast, group_var recodes a variable into groups, where groups have the same value range (e.g., from 1-5, 6-10, 11-15 etc.).

split_var() also works on grouped data frames (see group_by). In this case, splitting is applied to the subsets of variables in x. See 'Examples'.

Value

A grouped variable with equal sized groups. If x is a data frame, for append = TRUE, x including the grouped variables as new columns is returned; if append = FALSE, only the grouped variables will be returned. If append = TRUE and suffix = "", recoded variables will replace (overwrite) existing variables.

Note

In case a vector has only few number of unique values, splitting into equal sized groups may fail. In this case, use the inclusive-argument to shift a value at the cut point into the lower, preceeding group to get equal sized groups. See 'Examples'.

See Also

group_var to group variables into equal ranged groups, or rec to recode variables.

Examples

data(efc)
# non-grouped
table(efc$neg_c_7)

# split into 3 groups
table(split_var(efc$neg_c_7, n = 3))

# split multiple variables into 3 groups
split_var(efc, neg_c_7, pos_v_4, e17age, n = 3, append = FALSE)
frq(split_var(efc, neg_c_7, pos_v_4, e17age, n = 3, append = FALSE))

# original
table(efc$e42dep)

# two groups, non-inclusive cut-point
# vector split leads to unequal group sizes
table(split_var(efc$e42dep, n = 2))

# two groups, inclusive cut-point
# group sizes are equal
table(split_var(efc$e42dep, n = 2, inclusive = TRUE))

# Unlike dplyr's ntile(), split_var() never splits a value
# into two different categories, i.e. you always get a clean
# separation of original categories
library(dplyr)

x <- dplyr::ntile(efc$neg_c_7, n = 3)
table(efc$neg_c_7, x)

x <- split_var(efc$neg_c_7, n = 3)
table(efc$neg_c_7, x)

# works also with gouped data frames
mtcars %>%
  split_var(disp, n = 3, append = FALSE) %>%
  table()

mtcars %>%
  group_by(cyl) %>%
  split_var(disp, n = 3, append = FALSE) %>%
  table()

[Package sjmisc version 2.8.10 Index]