edf {timeplyr} | R Documentation |
Grouped empirical cumulative distribution function applied to data
Description
Like dplyr::cume_dist(x) and ecdf(x)(x), but with added grouping and weighting functionality.
You can calculate the empirical distribution of x from aggregated data by supplying frequency weights.
The data are never expanded to one row per observation, which makes this function very efficient for aggregated data. Plotting is a common application.
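To illustrate what the frequency weights mean, here is a minimal base R sketch. It is not the package's implementation. The values in `vals` and `counts` are made up for illustration, and `vals` is assumed to hold distinct values. The weighted distribution is the running sum of the weights, taken in order of the values and divided by the total weight, and it matches the ordinary ECDF of the expanded data:

```r
# Aggregated data: distinct values and how often each occurs (hypothetical numbers)
vals   <- c(1, 2, 3, 5)
counts <- c(2, 1, 4, 3)

# Weighted ECDF: cumulative weight up to each value, divided by the total weight.
# Accumulate in order of the values, then return results in the original order.
ord <- order(vals)
weighted_ecdf <- numeric(length(vals))
weighted_ecdf[ord] <- cumsum(counts[ord]) / sum(counts)

# Reference: expand to one element per observation and use the ordinary ECDF
expanded  <- rep(vals, times = counts)
reference <- ecdf(expanded)(vals)

all.equal(weighted_ecdf, reference)
```

The Examples section checks the same identity for `edf(grid, wt = counts)` on a sorted grid.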
Usage
edf(x, g = NULL, wt = NULL)
Arguments
x | Numeric vector. |
g | Numeric vector of group IDs. |
wt | Frequency weights. |
Value
A numeric vector the same length as x.
Examples
library(timeplyr)
library(dplyr)
library(ggplot2)
set.seed(9123812)
x <- sample(seq(-10, 10, 0.5), size = 10^2, replace = TRUE)
plot(sort(edf(x)))
all.equal(edf(x), ecdf(x)(x))
all.equal(edf(x), cume_dist(x))
# Manual ECDF plot using only aggregate data
y <- rnorm(100, 10)
start <- floor(min(y) / 0.1) * 0.1
grid <- time_span(y, time_by = 0.1, from = start)
counts <- time_countv(y, time_by = 0.1, from = start, complete = TRUE)$n
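# Store the result under a new name so the edf() function is not shadowed
edf_vals <- edf(grid, wt = counts)
# Trivial here: on a sorted grid this is the normalised cumulative sum of counts
all.equal(unname(cumsum(counts) / sum(counts)), edf_vals)
# Full ecdf
tibble(y) %>%
  ggplot(aes(x = y)) +
  stat_ecdf()
# Approximation using aggregate only data
tibble(grid, edf_vals) %>%
  ggplot(aes(x = grid, y = edf_vals)) +
  geom_step()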
# Grouped example
g <- sample(letters[1:3], size = 10^2, replace = TRUE)
edf1 <- tibble(x, g) %>%
  mutate(edf = cume_dist(x),
         .by = g) %>%
  pull(edf)
edf2 <- edf(x, g = g)
all.equal(edf1, edf2)
[Package timeplyr version 0.8.1 Index]