group_id {timeplyr} | R Documentation |
Fast group IDs
Description
These are tidy-based functions for calculating group IDs, row IDs and
group orders.
- group_id() returns an integer vector of group IDs the same size as the data.
- row_id() returns an integer vector of row IDs.
- group_order() returns the order of the groups.
The add_ variants add a column of group IDs/row IDs/group orders.
Usage
group_id(
data,
...,
order = TRUE,
ascending = TRUE,
.by = NULL,
.cols = NULL,
as_qg = FALSE
)
add_group_id(
data,
...,
order = TRUE,
ascending = TRUE,
.by = NULL,
.cols = NULL,
.name = NULL,
as_qg = FALSE
)
row_id(data, ..., ascending = TRUE, .by = NULL, .cols = NULL)
## S3 method for class 'GRP'
row_id(data, ascending = TRUE, ...)
add_row_id(data, ..., ascending = TRUE, .by = NULL, .cols = NULL, .name = NULL)
group_order(data, ..., ascending = TRUE, .by = NULL, .cols = NULL)
add_group_order(
data,
...,
ascending = TRUE,
.by = NULL,
.cols = NULL,
.name = NULL
)
Arguments
data |
A data frame or vector. |
... |
Additional groups using tidy |
order |
Should the groups be ordered?
The physical order of the data is not changed. For a vector, identical(order(x, na.last = TRUE), order(group_id(x, order = TRUE))) should always hold. For a data frame, identical(order(x1, x2, x3, na.last = TRUE), order(group_id(data, x1, x2, x3, order = TRUE))) should always hold. |
ascending |
Should the group order be ascending or descending?
The default is TRUE. |
.by |
Alternative way of supplying groups using |
.cols |
(Optional) alternative to |
as_qg |
Should the group IDs be returned as a
collapse "qG" class? The default ( |
.name |
Name of the added ID column which should be a
character vector of length 1.
If |
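The invariant stated for the order argument can be checked directly on a small vector. This is a minimal sketch, assuming an installed timeplyr version that exports group_id() as documented on this page; the vector x is made up for illustration.

```r
library(timeplyr)

# Illustrative input (not from the package documentation)
x <- c("b", "a", "c", "a", "b")

# Sorted group IDs: the smallest value gets ID 1, the next gets 2, and so on
ids <- group_id(x, order = TRUE)

# Ordering by the IDs should reproduce the ordering of the values themselves
same_order <- identical(order(x, na.last = TRUE), order(ids))
same_order

# The input itself is left untouched; only a vector of IDs is returned
x
```

Because the IDs preserve the sort order of the groups, they can stand in for the original values wherever only relative order matters.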
Details
Note that for data frames, these functions assume no groups by default unless you supply them.
This means that when no groups are supplied:
- group_id(iris) returns a vector of ones.
- row_id(iris) returns the plain row ID numbers.
- group_order(iris) == row_id(iris).
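The three no-group defaults listed above can be verified with a few comparisons. This is a sketch under the assumption that the installed timeplyr version exports these functions as documented here; the variable names are illustrative.

```r
library(timeplyr)

n <- nrow(iris)

# With no groups supplied, every row belongs to the same single group
all_ones <- all(group_id(iris) == 1L)

# Row IDs are then just the row numbers 1..n
plain_rows <- all(row_id(iris) == seq_len(n))

# The group order coincides with the row IDs
order_matches <- all(group_order(iris) == row_id(iris))

c(all_ones = all_ones, plain_rows = plain_rows, order_matches = order_matches)
```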
One can specify groups in the second argument like so:
- group_id(iris, Species)
- row_id(iris, across(all_of("Species")))
- group_order(iris, across(where(is.numeric), desc))
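To make grouped row IDs and group orders more concrete, the sketch below compares them with base R counterparts on a small, made-up data frame. It assumes a timeplyr version exporting row_id() and group_order() as documented here; df and the base R comparisons are illustrative and not part of the package.

```r
library(timeplyr)

# Hypothetical data with interleaved groups
df <- data.frame(g = c("a", "b", "a", "b", "a"))

# Row IDs restart at 1 within each group
rid <- row_id(df, g)

# Base R counterpart: running count within each level of g
rid_base <- ave(seq_len(nrow(df)), df$g, FUN = seq_along)

# Order of the rows when sorted by group
ord <- group_order(df, g)

# Base R counterpart
ord_base <- order(df$g)

all(rid == rid_base)
all(ord == ord_base)
```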
If you want group_id to always use all the columns of a data frame for grouping while still utilising the group_id methods, you can use the function below.
group_id2 <- function(data, ...){
  group_id(data, ..., .cols = names(data))
}
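A short usage sketch for the wrapper above, assuming a timeplyr version that exports group_id() with the .cols argument as documented here. The data frame df is made up for illustration.

```r
library(timeplyr)

# Wrapper from the text: group by every column of the data frame
group_id2 <- function(data, ...){
  group_id(data, ..., .cols = names(data))
}

# Hypothetical data: rows 1 and 2 are identical, rows 3 and 4 are unique
df <- data.frame(
  a = c(1, 1, 2, 1),
  b = c("x", "x", "y", "z")
)

ids <- group_id2(df)

# Identical rows share an ID; distinct rows get distinct IDs
ids
length(unique(ids))
```

Grouping by all columns this way yields one ID per distinct row, which is the same idea as flagging duplicated rows but with a stable integer label per row.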
Value
An integer vector.
Examples
library(timeplyr)
library(dplyr)
library(ggplot2)
group_id(iris) # No groups
group_id(iris, Species) # Species groups
row_id(iris) # Plain row IDs
row_id(iris, Species) # Row IDs by group
# Order of Species + descending Petal.Width
group_order(iris, Species, desc(Petal.Width))
# Same as
order(iris$Species, -xtfrm(iris$Petal.Width))
# Tidy data-masking/tidyselect can be used
group_id(iris, across(where(is.numeric))) # Groups across numeric values
# Alternatively using tidyselect
group_id(iris, .by = where(is.numeric))
# Group IDs using a mixed order
group_id(iris, desc(Species), Sepal.Length, desc(Petal.Width))
# add_ helpers
iris %>%
  distinct(Species) %>%
  add_group_id(Species)
iris %>%
  add_row_id(Species) %>%
  pull(row_id)
# Usage in data.table
library(data.table)
iris_dt <- as.data.table(iris)
iris_dt[, group_id := group_id(.SD, .cols = names(.SD)),
.SDcols = "Species"]
# Or if you're using this often you can write a wrapper
# that adds the ID column by reference
set_add_group_id <- function(x, ..., .name = "group_id"){
  id <- group_id(x, ...)
  data.table::set(x, j = .name, value = id)
}
set_add_group_id(iris_dt, desc(Species))[]
# mpg is a dataset from ggplot2
mm_mpg <- mpg %>%
  select(manufacturer, model) %>%
  arrange(desc(pick(everything())))
# Sorted/non-sorted groups
mm_mpg %>%
  add_group_id(across(everything()),
               .name = "sorted_id", order = TRUE) %>%
  add_group_id(manufacturer, model,
               .name = "not_sorted_id", order = FALSE) %>%
  distinct()