cut,NumericVariable-method {crunch}    R Documentation

Cut a numeric Crunch variable

Description

crunch::cut() is equivalent to base::cut() except that it operates on Crunch variables instead of in-memory R objects. The function takes a numeric variable and derives a new categorical variable from it based on the breaks argument. You can either break the variable into evenly spaced categories by specifying the number of intervals, or supply a numeric vector of cut points identifying the start and end of each category. For example, breaks = 5 splits the numeric data into five evenly spaced intervals, while breaks = c(1, 5, 10) recodes the data into two groups based on whether each value falls between 1 and 5 or between 5 and 10.
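The two forms of breaks can be tried locally before touching a Crunch dataset. The sketch below uses base::cut() on an in-memory vector as a stand-in, on the assumption (taken from the description above) that the Crunch method bins values the same way; the sample values in x are invented for illustration.

```r
# In-memory analogue using base::cut(); x holds made-up sample values.
x <- c(2, 4.5, 5, 7, 10)

# A single number: five equal-width intervals covering the range of x.
by_count <- cut(x, breaks = 5)
nlevels(by_count)

# Explicit cut points: two intervals, closed on the right by default.
by_points <- cut(x, breaks = c(1, 5, 10))
levels(by_points)
table(by_points)
```

With explicit cut points the number of categories is always one fewer than the number of breaks, which is why c(1, 5, 10) yields two groups.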

Usage

## S4 method for signature 'NumericVariable'
cut(
  x,
  breaks,
  labels = NULL,
  name,
  include.lowest = FALSE,
  right = TRUE,
  dig.lab = 3,
  ordered_result = FALSE,
  ...
)

Arguments

x

A Crunch NumericVariable

breaks

Either a numeric vector of two or more unique cut points or a single number giving the number of intervals into which x is to be cut. If specifying cut points, values that are less than the smallest value in breaks or greater than the largest value in breaks will be marked missing in the resulting categorical variable.

labels

A character vector of labels for the resulting categories. Its length should equal the number of categories, which is one fewer than the number of cut points in breaks. If not specified, labels are constructed using interval notation. For example, [1, 5) indicates that the category goes from 1 to 5. The bracket shape indicates whether the boundary value is included in the category, i.e. whether it is "closed": [1, 5) includes (is closed on) 1 but does not include (is open on) 5. If labels = FALSE, simple integer codes are returned instead of a factor.

name

The name of the resulting Crunch variable as a character string.

include.lowest

logical, indicating if an x[i] equal to the lowest (or highest, for right = FALSE) breaks value should be included.

right

logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.

dig.lab

integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.

ordered_result

Ignored.

...

further arguments passed to makeCaseVariable
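
How breaks, labels, include.lowest and right interact is easiest to see on a small vector. The following is a minimal sketch with base::cut() on in-memory data, assuming the Crunch method follows the same interval rules as described above; x and brks are hypothetical values chosen so that some observations sit on or outside the cut points.

```r
# Made-up values: 0 and 12 fall outside the breaks, 1, 5 and 10 sit on them.
x <- c(0, 1, 3, 5, 10, 12)
brks <- c(1, 5, 10)

# Defaults (right = TRUE, include.lowest = FALSE): intervals (1,5] and (5,10].
# 0 and 12 are outside the breaks and 1 is on the open lower bound, so all
# three become NA (missing).
default_cut <- cut(x, breaks = brks)

# include.lowest = TRUE closes the lowest interval, giving [1,5] and (5,10].
with_lowest <- cut(x, breaks = brks, include.lowest = TRUE)

# right = FALSE flips the closed side: [1,5) and [5,10), so 10 is now NA.
left_closed <- cut(x, breaks = brks, right = FALSE)

# Custom labels: one per interval, i.e. length(brks) - 1 of them.
labelled <- cut(x, breaks = brks, labels = c("low", "high"))

data.frame(x, default_cut, with_lowest, left_closed, labelled)
```

Printing the data frame puts the four codings side by side, which makes it easy to check which boundary values end up missing under each setting.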

Value

a Crunch VariableDefinition. Assign it into the dataset to create it as a derived variable on the server.

Examples

## Not run: 
ds <- loadDataset("mtcars")
ds$cat_var <- cut(ds$mpg,
    breaks = c(10, 15, 20),
    labels = c("small", "medium"), name = "Fuel efficiency"
)
ds$age <- sample(1:100, 32)
ds$age4 <- cut(ds$age, c(0, 30, 45, 65, 200),
    c("Youth", "Adult", "Middle-aged", "Elderly"),
    name = "Age (4 category)"
)

## End(Not run)

[Package crunch version 1.30.4 Index]