fexpand {timeplyr} | R Documentation |
Fast versions of tidyr::expand() and tidyr::complete().
Description
Fast versions of tidyr::expand() and tidyr::complete().
Usage
fexpand(
data,
...,
expand_type = c("crossing", "nesting"),
sort = FALSE,
.by = NULL
)
fcomplete(
data,
...,
expand_type = c("crossing", "nesting"),
sort = FALSE,
.by = NULL,
fill = NA
)
Arguments
data |
A data frame |
... |
Variables to expand |
expand_type |
Type of expansion to use where "nesting"
finds combinations already present in the data
(exactly the same as using |
sort |
Logical. If |
.by |
(Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select. |
fill |
A named list containing value-name pairs to fill the named implicit missing values. |
Details
For ungrouped data, fexpand() is similar in speed to tidyr::expand().
When the data contain many groups, fexpand() is much faster (see examples).
The 2 main differences between fexpand() and tidyr::expand() are that:
1. tidyr style helpers like nesting() and crossing() are ignored. The type of expansion used is controlled through expand_type and applies to all supplied variables.
2. Expressions are first calculated on the entire ungrouped dataset before being expanded, but within-group expansions will work on variables that already exist in the dataset. For example,
iris %>% group_by(Species) %>% fexpand(Sepal.Length, Sepal.Width)
will perform a grouped expansion but
iris %>% group_by(Species) %>% fexpand(range(Sepal.Length))
will not.
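As a rough illustration of the two expansion types, the sketch below uses tidyr itself on a small made-up data frame (df, x and y are hypothetical names, not part of the package). The commented fexpand() calls at the end are the assumed equivalents, based only on the Usage section above.

```r
library(tidyr)

# Small made-up data frame for illustration
df <- data.frame(
  x = c("a", "a", "b"),
  y = c(1, 2, 2)
)

# Crossing: every combination of the unique values of x and y
crossed <- expand(df, x, y)

# Nesting: only the combinations that actually occur in the data
nested <- expand(df, nesting(x, y))

nrow(crossed)
nrow(nested)

# Assumed fexpand() equivalents (not run here):
# fexpand(df, x, y, expand_type = "crossing")
# fexpand(df, x, y, expand_type = "nesting")
```

Here the crossing has 4 rows (2 values of x times 2 values of y), while the nesting keeps only the 3 combinations present in df.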
For efficiency, when groups are supplied, expansion is done on a by-group basis only if
there are 2 or more variables that aren't part of the grouping.
With a single expansion variable a by-group calculation is unnecessary,
as all combinations across groups already exist against that one variable.
When expand_type = "nesting", groups are ignored for speed, as the result is the same.
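The claim that grouping does not change a nesting-style expansion can be checked with tidyr alone. This is a minimal sketch on made-up data (df, g, x and y are hypothetical names); it demonstrates the general property and does not call fexpand() itself.

```r
library(dplyr)
library(tidyr)

# Made-up data with one grouping column (g) and two expansion variables
df <- data.frame(
  g = c(1, 1, 2, 2),
  x = c("a", "a", "b", "c"),
  y = c(1, 1, 2, 3)
)

# Nesting within each group
grouped <- df %>%
  group_by(g) %>%
  expand(nesting(x, y)) %>%
  ungroup()

# Nesting over the whole data, with the group column treated as a variable
ungrouped <- expand(df, nesting(g, x, y))

# Both approaches return the same distinct combinations
all.equal(as.data.frame(grouped), as.data.frame(ungrouped))
```

Because nesting only keeps combinations that already occur, splitting by group first cannot add or remove any rows, which is why the grouped calculation can be skipped.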
An advantage of fexpand() is that it returns a data frame with the same class
as the input. It also uses data.table for memory efficiency and collapse for speed.
A future development for fcomplete() would be to fill only the values that
correspond to either the additional completed rows or the existing rows that match the expanded rows.
For example,
iris %>% mutate(test = NA_real_) %>% complete(Sepal.Length = 0:100, fill = list(test = 0))
fills in all NA values of test, whereas
iris %>% mutate(test = NA_real_) %>% fcomplete(Sepal.Length = 0:100, fill = list(test = 0))
should only fill in values of test that correspond to Sepal.Length values of 0:100.
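To make the tidyr side of this comparison concrete, the following sketch shows how tidyr::complete() treats fill on a small made-up data frame (df, id and test are hypothetical names). It only demonstrates tidyr's behaviour, assuming tidyr 1.2.0 or later for the explicit argument. It makes no claim about what fcomplete() currently returns.

```r
library(tidyr)

# Made-up data: id = 1 has an explicit NA in test, id = 2 is missing entirely
df <- data.frame(
  id = c(1, 3),
  test = c(NA_real_, 5)
)

# Default: fill applies to the new row (id = 2) and to the existing NA (id = 1)
filled_all <- complete(df, id = c(1, 2, 3), fill = list(test = 0))

# explicit = FALSE: fill applies only to rows created by the completion
filled_new <- complete(df, id = c(1, 2, 3), fill = list(test = 0), explicit = FALSE)

filled_all$test
filled_new$test
```

In the first result the pre-existing NA is overwritten along with the new row, which is the "fills in all NA values" behaviour described above. The second keeps the pre-existing NA and fills only the added row.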
An additional note when expand_type = "nesting" is that if one of the
supplied variables in ... does not exist in the data, but can be recycled
to the length of the data, then it is added and treated as a data variable.
Value
A data.frame of expanded groups.
Examples
library(timeplyr)
library(dplyr)
library(lubridate)
library(nycflights13)
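# All combinations of origin and dest
flights %>%
  fexpand(origin, dest)
# Same expansion with the sort argument set explicitly
flights %>%
  fexpand(origin, dest, sort = FALSE)
# Grouped expansions example
# 1 expansion variable (carrier) on top of the groups: this is very quick
flights %>%
  group_by(origin, dest, tailnum) %>%
  fexpand(carrier)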