group_id {timeplyr}    R Documentation

Fast group IDs

Description

These are tidy-based functions for calculating group IDs, row IDs and group orders.

The add_ variants add a column of group IDs/row IDs/group orders.

Usage

group_id(
  data,
  ...,
  order = TRUE,
  ascending = TRUE,
  .by = NULL,
  .cols = NULL,
  as_qg = FALSE
)

add_group_id(
  data,
  ...,
  order = TRUE,
  ascending = TRUE,
  .by = NULL,
  .cols = NULL,
  .name = NULL,
  as_qg = FALSE
)

row_id(data, ..., ascending = TRUE, .by = NULL, .cols = NULL)

## S3 method for class 'GRP'
row_id(data, ascending = TRUE, ...)

add_row_id(data, ..., ascending = TRUE, .by = NULL, .cols = NULL, .name = NULL)

group_order(data, ..., ascending = TRUE, .by = NULL, .cols = NULL)

add_group_order(
  data,
  ...,
  ascending = TRUE,
  .by = NULL,
  .cols = NULL,
  .name = NULL
)

Arguments

data

A data frame or vector.

...

Additional groups using tidy data-masking rules.
To specify groups using tidyselect, simply use the .by argument.

order

Should the groups be ordered? THE PHYSICAL ORDER OF THE DATA IS NOT CHANGED.
When order is TRUE (the default), the group IDs follow the sorted order of the groups, but the rows themselves are not sorted.
The expression

identical(order(x, na.last = TRUE),
          order(group_id(x, order = TRUE)))

or in the case of a data frame

identical(order(x1, x2, x3, na.last = TRUE),
          order(group_id(data, x1, x2, x3, order = TRUE)))

should always hold.
If FALSE, the group IDs are assigned in order of first appearance.
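A base-R sketch of the two orderings, for illustration only (it is not how timeplyr computes its IDs): sorted IDs rank each value among the sorted unique values, while first-appearance IDs number the values in the order they are first seen. The variable names sorted_id and appearance_id are hypothetical.

```r
# Illustration only: base-R analogues of order = TRUE and order = FALSE
x <- c(3, 1, NA, 2, 1)

# order = TRUE analogue: rank among the sorted unique values (NA sorted last)
sorted_id <- match(x, sort(unique(x), na.last = TRUE))

# order = FALSE analogue: number the values by first appearance
appearance_id <- match(x, unique(x))

sorted_id      # 3 1 4 2 1
appearance_id  # 1 2 3 4 2

# The invariant stated above holds for the sorted IDs
identical(order(x, na.last = TRUE), order(sorted_id))  # TRUE
```

Note that match() matches NA to NA, so missing values form their own group in both versions.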

ascending

Should the group order be ascending or descending? The default is TRUE.
For row_id() this determines if the row IDs are increasing or decreasing.
NOTE: When order = FALSE, the ascending argument is ignored. This will be fixed in a later version.
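To make ascending concrete, here is a base-R illustration. It assumes that a descending group order numbers the groups starting from the largest value, and that decreasing row IDs count down within each group; it sketches the idea rather than reproducing timeplyr's code, and the names desc_id and row_down are hypothetical.

```r
x <- c("b", "a", "c", "a", "b")

# Ascending vs descending sorted group IDs
asc_id  <- match(x, sort(unique(x)))                     # 2 1 3 1 2
desc_id <- match(x, sort(unique(x), decreasing = TRUE))  # 2 3 1 3 2

# Row IDs within each group of x: increasing and decreasing
row_up   <- ave(seq_along(x), x, FUN = seq_along)                      # 1 1 1 2 2
row_down <- ave(seq_along(x), x, FUN = function(i) rev(seq_along(i)))  # 2 2 1 1 1
```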

.by

Alternative way of supplying groups using tidyselect notation.

.cols

(Optional) Alternative to ... that accepts a named character vector or a numeric vector. If speed is important, using .cols is recommended.

as_qg

Should the group IDs be returned as a collapse "qG" class? The default (FALSE) always returns an integer vector.

.name

Name of the added ID column, which should be a character vector of length 1. If .name = NULL (the default), add_group_id() adds a column named "group_id"; if one already exists, a unique name is used instead.

Details

It's important to note that, for data frames, these functions assume no groups by default unless you supply them.

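This means that when no groups are supplied:

group_id(iris) returns a vector of ones (every row belongs to the same group)
row_id(iris) returns the plain row numbers
group_order(iris) is identical to row_id(iris)

One can specify groups in the second argument like so:

group_id(iris, Species)
row_id(iris, Species)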

If you want group_id() to always group by all the columns of a data frame while still using the group_id() methods, you can use a wrapper such as the one below.

group_id2 <- function(data, ...){
  group_id(data, ..., .cols = names(data))
}
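For intuition about what grouping on every column produces, here is a hypothetical base-R analogue of group_id2() (the name group_id_all_cols is made up for this illustration). It relies on interaction() with lex.order = TRUE, which orders the combined levels by the first column, then the second, and so on. It assumes the data contain no missing values, because interaction() maps NA to NA.

```r
# Hypothetical helper, for illustration only (not part of timeplyr)
group_id_all_cols <- function(data) {
  # drop = TRUE keeps only observed combinations, so IDs run from 1 to n_groups
  as.integer(interaction(data, drop = TRUE, lex.order = TRUE))
}

df <- data.frame(x1 = c(2, 1, 2, 1), x2 = c("b", "a", "a", "a"))
group_id_all_cols(df)  # 3 1 2 1

# Same invariant as for order = TRUE
identical(order(df$x1, df$x2), order(group_id_all_cols(df)))  # TRUE
```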

Value

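An integer vector, or a collapse "qG" object when as_qg = TRUE. The add_ variants return the data with the new ID column added.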

Examples

library(timeplyr)
library(dplyr)
library(ggplot2)

group_id(iris) # No groups
group_id(iris, Species) # Species groups
row_id(iris) # Plain row IDs
row_id(iris, Species) # Row IDs by group
# Order of Species + descending Petal.Width
group_order(iris, Species, desc(Petal.Width))
# Same as
order(iris$Species, -xtfrm(iris$Petal.Width))

# Tidy data-masking/tidyselect can be used
group_id(iris, across(where(is.numeric))) # Groups across numeric values
# Alternatively using tidyselect
group_id(iris, .by = where(is.numeric))

# Group IDs using a mixed order
group_id(iris, desc(Species), Sepal.Length, desc(Petal.Width))

# add_ helpers
iris %>%
  distinct(Species) %>%
  add_group_id(Species)
iris %>%
  add_row_id(Species) %>%
  pull(row_id)

# Usage in data.table
library(data.table)
iris_dt <- as.data.table(iris)
iris_dt[, group_id := group_id(.SD, .cols = names(.SD)),
        .SDcols = "Species"]

# Or if you're using this often you can write a wrapper
set_add_group_id <- function(x, ..., .name = "group_id"){
  id <- group_id(x, ...)
  data.table::set(x, j = .name, value = id)
}
set_add_group_id(iris_dt, desc(Species))[]

mm_mpg <- mpg %>%
  select(manufacturer, model) %>%
  arrange(desc(pick(everything())))

# Sorted/non-sorted groups
mm_mpg %>%
  add_group_id(across(everything()),
               .name = "sorted_id", order = TRUE) %>%
  add_group_id(manufacturer, model,
               .name = "not_sorted_id", order = FALSE) %>%
  distinct()


[Package timeplyr version 0.8.1 Index]