mutate_ {tidier} | R Documentation |
Drop-in replacement for mutate
Description
Provides a supercharged version of mutate with group_by, order_by and
aggregation over an arbitrary window frame around a row, for dataframes
and lazy (remote) tbls of class tbl_lazy.
Usage
mutate_(
x,
...,
.by,
.order_by,
.frame,
.index,
.desc = FALSE,
.complete = FALSE
)
Arguments
x |
( |
... |
expressions to be passed to |
.by |
(character vector, optional: Yes) Columns to group by |
.order_by |
(string, optional: Yes) Columns to order by |
.frame |
(vector, optional: Yes) Vector of length 2 indicating the
number of rows to consider before and after the current row. When argument
|
.index |
(string, optional: Yes, default: NULL) Index column. This is supported only when the input is a dataframe. |
.desc |
(flag, default: FALSE) Whether to order in descending order |
.complete |
(flag, default: FALSE) This will be passed to
|
Details
A window function returns a value for every input row of a dataframe
or tbl_lazy based on a group of rows (frame) in the neighborhood of the
input row. This function implements computation over groups (partition_by
in SQL) in a predefined order (order_by in SQL) across a neighborhood of
rows (frame) defined by a pair (up, down), where either:
up/down are the numbers of rows before and after the corresponding row, or
up/down are interval objects (ex: c(days(2), days(1))). Interval objects
are currently supported for dataframe input only (not tbl_lazy).
This implementation is inspired by Spark's window API.
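To make the row-based (up, down) frame concrete, here is a minimal base R sketch. The helper window_apply is hypothetical and is not part of tidier; it only illustrates how a frame of `up` rows before and `down` rows after the current row selects the values that get aggregated, and it ignores grouping and ordering.

```r
# Hypothetical helper (not part of tidier): apply `fn` to the values that
# fall inside a row-based frame around each position of `x`.
window_apply <- function(x, up, down, fn) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    lo <- max(1, i - up)    # frame start, clipped to the first row
    hi <- min(n, i + down)  # frame end, clipped to the last row
    fn(x[lo:hi])
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5)

# One row before and one row after the current row
centered <- window_apply(x, up = 1, down = 1, fn = mean)
print(centered)    # 1.5 2.0 3.0 4.0 4.5

# All preceding rows up to the current row: a cumulative mean,
# analogous to .frame = c(Inf, 0)
cumulative <- window_apply(x, up = Inf, down = 0, fn = mean)
print(cumulative)  # 1.0 1.5 2.0 2.5 3.0
```

The frame is clipped at the edges here, so the first and last rows aggregate over fewer values.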
Implementation Details:
For dataframe input:
Iteration per row over the window is implemented using the versatile slider package.
Application of a window aggregation can optionally be run in parallel over multiple groups (see argument .by) by setting a future parallel backend. This is implemented using the furrr package.
The function subsumes regular use cases of mutate.
For tbl_lazy input:
Uses dbplyr::window_order and dbplyr::window_frame to translate to the
partition_by and window frame specifications.
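As an illustration of the dbplyr verbs named above, the following sketch builds a windowed aggregation on a simulated lazy table and renders the SQL. It shows how these verbs behave when called directly and is not necessarily how tidier calls them internally. The column names grp, day and v are made up for the example.

```r
library(dplyr)
library(dbplyr)

# Simulated lazy table (no database connection is needed for rendering SQL).
lazy_df <- lazy_frame(
  grp = c(1, 1, 1, 2, 2),
  day = c(1, 2, 3, 1, 2),
  v   = c(10, 20, 30, 40, 50)
)

query <- lazy_df %>%
  group_by(grp) %>%          # grouping becomes PARTITION BY
  window_order(day) %>%      # ordering inside each partition
  window_frame(-1, 1) %>%    # from one row before to one row after
  mutate(avg_v = mean(v, na.rm = TRUE))

# Render the generated SQL as a plain string and print it
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

The rendered query should contain an OVER clause with a PARTITION BY, an ORDER BY and a ROWS BETWEEN frame specification.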
Value
data.frame or tbl_lazy
See Also
mutate
Examples
library("magrittr")
# example 1 (simple case with dataframe)
# Using iris dataset,
# compute cumulative mean of column `Sepal.Length`
# ordered by `Petal.Width` and `Sepal.Width` columns
# grouped by `Petal.Length` column
iris %>%
  tidier::mutate_(sl_mean = mean(Sepal.Length),
                  .order_by = c("Petal.Width", "Sepal.Width"),
                  .by = "Petal.Length",
                  .frame = c(Inf, 0)
                  ) %>%
  dplyr::slice_min(n = 3, Petal.Width, by = Species)
# example 2 (detailed case with dataframe)
# Using a sample airquality dataset,
# compute mean temp over last seven days in the same month for every row
set.seed(101)
airquality %>%
  # create date column
  dplyr::mutate(date_col = lubridate::make_date(1973, Month, Day)) %>%
  # create gaps by removing some days
  dplyr::slice_sample(prop = 0.8) %>%
  dplyr::arrange(date_col) %>%
  # compute mean temperature over last seven days in the same month
  tidier::mutate_(avg_temp_over_last_week = mean(Temp, na.rm = TRUE),
                  .order_by = "Day",
                  .by = "Month",
                  .frame = c(lubridate::days(7), # 7 days before current row
                             lubridate::days(-1) # do not include current row
                             ),
                  .index = "date_col"
                  )
# example 3 (tbl_lazy input)
airquality %>%
  # create date column as character
  dplyr::mutate(date_col =
                  as.character(lubridate::make_date(1973, Month, Day))
                ) %>%
  tibble::as_tibble() %>%
  # as `tbl_lazy`
  dbplyr::memdb_frame() %>%
  # namespace-qualified, since only magrittr is attached above
  tidier::mutate_(avg_temp = mean(Temp),
                  .by = "Month",
                  .order_by = "date_col",
                  .frame = c(3, 3)
                  ) %>%
  dplyr::collect() %>%
  dplyr::select(Ozone, Solar.R, Wind, Temp, Month, Day, date_col, avg_temp)