percent_rank {dplyr} | R Documentation |
These two ranking functions implement two slightly different ways to compute a percentile. For each x_i in x:
cume_dist(x) counts the total number of values less than or equal to x_i, and divides it by the number of observations.
percent_rank(x) counts the total number of values less than x_i, and divides it by the number of observations minus 1.
In both cases, missing values are ignored when counting the number of observations.
percent_rank(x)
cume_dist(x)
x |
A vector to rank. By default, the smallest values will get the smallest ranks. Use desc() to reverse the direction. Missing values will be given rank NA. To rank by multiple columns at once, supply a data frame. |
A numeric vector containing a proportion.
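The argument behaviour described above can be illustrated with a short sketch. It assumes dplyr is installed and attached; the input vector is made up for illustration, and the values in the comments follow from the definitions given in the description.

```r
library(dplyr)

# Illustrative input with one missing value
x <- c(5, 1, NA, 3, 2, 2)

# The missing value keeps rank NA and is left out of the observation count,
# so the denominator is 5 - 1 = 4.
percent_rank(x)
# expected: 1, 0, NA, 0.75, 0.25, 0.25

# desc() reverses the direction, so the largest value gets the smallest rank.
percent_rank(desc(x))
# expected: 0, 1, NA, 0.25, 0.5, 0.5
```

Both results stay within the interval from 0 to 1, which matches the description of the return value as a proportion.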
Other ranking functions: ntile(), row_number()
x <- c(5, 1, 3, 2, 2)
cume_dist(x)
percent_rank(x)
# You can understand what's going on by computing it by hand
sapply(x, function(xi) sum(x <= xi) / length(x))
sapply(x, function(xi) sum(x < xi) / (length(x) - 1))
# The real computations are a little more complex in order to
# correctly deal with missing values
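One way to extend the by-hand computation so that it copes with missing values is sketched below. It uses only base R's rank() and is not dplyr's actual implementation; the helper names cume_dist_by_hand() and percent_rank_by_hand() are hypothetical and chosen for illustration.

```r
# Hypothetical helpers (base R only, not dplyr's internals).

cume_dist_by_hand <- function(x) {
  n <- sum(!is.na(x))  # number of observations, ignoring NA
  # With ties.method = "max", the rank equals the count of values <= x_i;
  # na.last = "keep" leaves NA in place instead of ranking it.
  rank(x, ties.method = "max", na.last = "keep") / n
}

percent_rank_by_hand <- function(x) {
  n <- sum(!is.na(x))
  # With ties.method = "min", rank - 1 equals the count of values < x_i.
  (rank(x, ties.method = "min", na.last = "keep") - 1) / (n - 1)
}

x_na <- c(5, 1, NA, 3, 2, 2)
cume_dist_by_hand(x_na)
# expected: 1, 0.2, NA, 0.8, 0.6, 0.6
percent_rank_by_hand(x_na)
# expected: 1, 0, NA, 0.75, 0.25, 0.25
```

The key design choice is to count only non-missing values in the denominator while keeping NA at its original position in the result, as the description states.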