percent_rank {dplyr}    R Documentation

Proportional ranking functions

Description

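These two ranking functions implement two slightly different ways to compute a percentile. For each x_i in x:

cume_dist(x) counts the total number of values less than or equal to x_i, and divides it by the number of observations.

percent_rank(x) counts the total number of values less than x_i, and divides it by the number of observations minus 1.

In both cases, missing values are ignored when counting the number of observations.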
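
As a rough illustration of the two definitions, the sketch below computes both proportions by hand in base R for a vector that contains a missing value. The helper names cume_dist_by_hand() and percent_rank_by_hand() are hypothetical and are not part of dplyr. The sketch assumes that a missing x_i simply yields NA, and it is not the package's actual implementation.

```r
# Hypothetical base R helpers for illustration; not dplyr's implementation.
cume_dist_by_hand <- function(x) {
  n <- sum(!is.na(x))  # missing values are not counted as observations
  sapply(x, function(xi) {
    if (is.na(xi)) NA_real_ else sum(x <= xi, na.rm = TRUE) / n
  })
}

percent_rank_by_hand <- function(x) {
  n <- sum(!is.na(x))  # missing values are not counted as observations
  sapply(x, function(xi) {
    if (is.na(xi)) NA_real_ else sum(x < xi, na.rm = TRUE) / (n - 1)
  })
}

x <- c(5, 1, NA, 2, 2)
cume_dist_by_hand(x)     # 1, 0.25, NA, 0.75, 0.75
percent_rank_by_hand(x)  # 1, 0, NA, 1/3, 1/3
```

With four non-missing observations, cume_dist divides by 4 and percent_rank divides by 3, so the NA affects neither denominator.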

Usage

percent_rank(x)

cume_dist(x)

Arguments

x

A vector to rank

By default, the smallest values will get the smallest ranks. Use desc() to reverse the direction so the largest values get the smallest ranks.
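
A short sketch of reversing the direction with desc(), reusing the vector from the Examples section. The variable names pr_asc, pr_desc and cd_desc are only illustrative.

```r
library(dplyr)

x <- c(5, 1, 3, 2, 2)

# Default direction: the smallest value gets the smallest proportion
pr_asc <- percent_rank(x)         # 1, 0, 0.75, 0.25, 0.25

# Reversed direction: the largest value gets the smallest proportion
pr_desc <- percent_rank(desc(x))  # 0, 1, 0.25, 0.5, 0.5
cd_desc <- cume_dist(desc(x))     # 0.2, 1, 0.4, 0.8, 0.8
```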

Missing values will be given rank NA. Use coalesce(x, Inf) or coalesce(x, -Inf) if you want to treat them as the largest or smallest values respectively.
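
The following sketch shows one way the coalesce() suggestion might look in practice. The input vector and the variable names are made up for illustration.

```r
library(dplyr)

x <- c(5, 1, NA, 2, 2)

# Default: the missing value gets NA and is not counted as an observation
pr_default <- percent_rank(x)                 # 1, 0, NA, 1/3, 1/3

# Treat the missing value as the largest value
pr_na_last <- percent_rank(coalesce(x, Inf))  # 0.75, 0, 1, 0.25, 0.25

# Treat the missing value as the smallest value
pr_na_first <- percent_rank(coalesce(x, -Inf))  # 1, 0.25, 0, 0.5, 0.5
```

Note that replacing the NA also changes the other results, because the replaced value now counts as an observation.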

To rank by multiple columns at once, supply a data frame.
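
A minimal sketch of ranking by two columns at once, assuming the dplyr version this page documents. The data frame and its column names a and b are hypothetical, and the sketch assumes rows are ordered by the first column, with later columns breaking ties.

```r
library(dplyr)

# Hypothetical data: column a is compared first, column b breaks ties in a
df <- data.frame(
  a = c(1, 1, 2, 2),
  b = c(2, 1, 2, 1)
)

# One proportion per row of df
pr_rows <- percent_rank(df)
cd_rows <- cume_dist(df)
```

Under that ordering assumption, the second row (a = 1, b = 1) would rank lowest and the third row (a = 2, b = 2) highest.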

Value

A numeric vector containing a proportion.

See Also

Other ranking functions: ntile(), row_number()

Examples

x <- c(5, 1, 3, 2, 2)

cume_dist(x)
percent_rank(x)

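# You can understand what's going on by computing it by hand
# cume_dist(): proportion of values less than or equal to each x_i
sapply(x, function(xi) sum(x <= xi) / length(x))
# percent_rank(): proportion of the other values less than each x_i
sapply(x, function(xi) sum(x < xi)  / (length(x) - 1))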
# The real computations are a little more complex in order to
# correctly deal with missing values

[Package dplyr version 1.1.4 Index]