| row_number {dplyr} | R Documentation |
Integer ranking functions
Description
Three ranking functions inspired by SQL2003. They differ primarily in how they handle ties:
- row_number() gives every input a unique rank, so that c(10, 20, 20, 30) would get ranks c(1, 2, 3, 4). It's equivalent to rank(ties.method = "first").
- min_rank() gives every tie the same (smallest) value so that c(10, 20, 20, 30) gets ranks c(1, 2, 2, 4). It's the way that ranks are usually computed in sports and is equivalent to rank(ties.method = "min").
- dense_rank() works like min_rank(), but doesn't leave any gaps, so that c(10, 20, 20, 30) gets ranks c(1, 2, 2, 3).
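The three tie-handling rules can be sketched in base R, without loading dplyr. This is only an illustration of the rules described above, not how dplyr implements them; the variable names and the match()-based dense ranking are assumptions made for the example.

```r
# Base R sketch of the three tie-handling rules; no packages required.
x <- c(10, 20, 20, 30)

# Unique ranks, ties broken by position (the rule row_number() follows)
first_rank <- rank(x, ties.method = "first")

# Ties share the smallest rank, leaving a gap afterwards (the min_rank() rule)
min_rank_base <- rank(x, ties.method = "min")

# Ties share a rank with no gaps (the dense_rank() rule), written here as the
# position of each value among the sorted distinct values
dense_rank_base <- match(x, sort(unique(x)))

print(first_rank)       # 1 2 3 4
print(min_rank_base)    # 1 2 2 4
print(dense_rank_base)  # 1 2 2 3
```

The only difference between the last two results is the rank given to 30: 4 when the tie leaves a gap, 3 when it does not.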
Usage
row_number(x)
min_rank(x)
dense_rank(x)
Arguments
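x |
A vector to rank. By default, the smallest values will get the smallest ranks. Use desc() to reverse the direction so the largest values get the smallest ranks. Missing values will be given rank NA. Use coalesce(x, Inf) or coalesce(x, -Inf) if you want to treat them as the largest or smallest values respectively. To rank by multiple columns at once, supply a data frame. |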
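A short sketch of how the x argument behaves may help here. It assumes dplyr 1.1.0 or later is installed (needed for data frame input) and uses desc() and coalesce() from dplyr; the result names and the small data frame are made up for the example.

```r
library(dplyr)

x <- c(5, 1, 3, 2, 2, NA)

# Default direction: the smallest value gets rank 1; the NA stays NA
asc <- min_rank(x)

# Reversed direction: the largest value gets rank 1
dsc <- min_rank(desc(x))

# Treat the missing value as the largest value instead of leaving it unranked
na_last <- min_rank(coalesce(x, Inf))

# Rank by two columns at once: ties in `a` are broken by `b`
by_two <- min_rank(data.frame(a = c(1, 1, 2), b = c(2, 1, 1)))

print(asc)      # 5 1 4 2 2 NA
print(dsc)      # 1 5 2 3 3 NA
print(na_last)  # 5 1 4 2 2 6
print(by_two)   # 2 1 3
```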
Value
An integer vector.
See Also
Other ranking functions:
ntile(),
percent_rank()
Examples
x <- c(5, 1, 3, 2, 2, NA)
row_number(x)
min_rank(x)
dense_rank(x)
# Ranking functions can be used in `filter()` to select top/bottom rows
df <- data.frame(
  grp = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(3, 2, 1, 1, 2, 2, 1, 1, 1),
  y = c(1, 3, 2, 3, 2, 2, 4, 1, 2),
  id = 1:9
)
# Always gives exactly 1 row per group
df %>% group_by(grp) %>% filter(row_number(x) == 1)
# May give more than 1 row if there are ties
df %>% group_by(grp) %>% filter(min_rank(x) == 1)
# Rank by multiple columns (to break ties) by selecting them with `pick()`
df %>% group_by(grp) %>% filter(min_rank(pick(x, y)) == 1)
# See slice_min() and slice_max() for another way to tackle the same problem
# You can use row_number() without an argument to refer to the "current"
# row number.
df %>% group_by(grp) %>% filter(row_number() == 1)
# It's easiest to see what this does with mutate():
df %>% group_by(grp) %>% mutate(grp_id = row_number())
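The slice_min() alternative mentioned in the examples could look roughly like the sketch below. It assumes dplyr is installed and rebuilds a reduced copy of the example data so that it runs on its own; the names with_ties and no_ties are illustrative only.

```r
library(dplyr)

df <- data.frame(
  grp = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(3, 2, 1, 1, 2, 2, 1, 1, 1),
  id = 1:9
)

# Keeps tied rows, comparable to filter(min_rank(x) == 1)
with_ties <- df %>% group_by(grp) %>% slice_min(x, n = 1)

# Exactly one row per group, comparable to filter(row_number(x) == 1)
no_ties <- df %>% group_by(grp) %>% slice_min(x, n = 1, with_ties = FALSE)

nrow(with_ties)  # 5: group 3 contributes all three of its tied rows
nrow(no_ties)    # 3: one row per group
```

slice_min() keeps ties by default, so with_ties = FALSE is the setting to reach for when exactly one row per group is required.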