R: Fast versions of 'tidyr::expand()' and 'tidyr::complete()'.

fexpand {timeplyr}

R Documentation

Fast versions of `tidyr::expand()` and `tidyr::complete()`.

Description

Fast versions of tidyr::expand() and tidyr::complete().

Usage

fexpand(
  data,
  ...,
  expand_type = c("crossing", "nesting"),
  sort = FALSE,
  .by = NULL
)

fcomplete(
  data,
  ...,
  expand_type = c("crossing", "nesting"),
  sort = FALSE,
  .by = NULL,
  fill = NA
)

Arguments

`data`	A data frame
`...`	Variables to expand
`expand_type`	Type of expansion to use. "nesting" finds combinations already present in the data (exactly the same as using `distinct()`, except that `fexpand()` allows new variables to be created on the fly and columns are sorted in the order given). "crossing" finds all combinations of values in the group variables.
`sort`	Logical. If `TRUE`, expanded/completed variables are sorted. The default is `FALSE`.
`.by`	(Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select.
`fill`	A named list containing value-name pairs to fill the named implicit missing values.
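
As a minimal sketch of the two `expand_type` options described above, the following uses a small made-up data frame (`df`, `x` and `y` are illustrative names, not part of the package). It assumes `fexpand()` behaves as documented here: "crossing" returns every combination of the unique values, while "nesting" returns only the combinations that occur in the data.

```r
library(timeplyr)

# Hypothetical toy data: x has 2 unique values, y has 2 unique values,
# but only 3 of the 4 possible (x, y) pairs are present.
df <- data.frame(
  x = c("a", "a", "b"),
  y = c(1, 2, 2)
)

# "crossing": all combinations of the unique values of x and y
crossed <- fexpand(df, x, y, expand_type = "crossing")

# "nesting": only the combinations already present in df
nested <- fexpand(df, x, y, expand_type = "nesting")

nrow(crossed)
nrow(nested)
```

Under the documented semantics, `crossed` should contain the pair `("b", 1)`, which is absent from both `df` and `nested`.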

Details

For ungrouped data, fexpand() is similar in speed to tidyr::expand(). When the data contain many groups, fexpand() is much faster (see examples).

The two main differences between fexpand() and tidyr::expand() are:

tidyr-style helpers like nesting() and crossing() are ignored. The type of expansion used is controlled through expand_type and applies to all supplied variables.
Expressions are first calculated on the entire ungrouped dataset before being expanded, but within-group expansions will work on variables that already exist in the dataset. For example, iris %>% group_by(Species) %>% fexpand(Sepal.Length, Sepal.Width) will perform a grouped expansion, but iris %>% group_by(Species) %>% fexpand(range(Sepal.Length)) will not.
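
The grouped case mentioned above can be sketched as follows. This is an illustration only: it assumes that grouping through `group_by()` and grouping through the `.by` argument both lead to a within-group expansion of existing columns, as the Arguments section suggests. The object names `by_species` and `by_arg` are made up for this example.

```r
library(timeplyr)
library(dplyr)

# Grouped expansion over columns that already exist in the data:
# combinations of Sepal.Length and Sepal.Width are found within each Species.
by_species <- iris %>%
  group_by(Species) %>%
  fexpand(Sepal.Length, Sepal.Width)

# The same grouping expressed through the .by argument (tidy-select)
by_arg <- iris %>%
  fexpand(Sepal.Length, Sepal.Width, .by = Species)

head(by_species)
head(by_arg)
```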

For efficiency, when groups are supplied, expansion is done on a by-group basis only if there are 2 or more variables that aren't part of the grouping. With only 1 expansion variable, a by-group calculation is unnecessary because all combinations across groups already exist against that variable. When expand_type = "nesting", groups are ignored for speed, as the result is the same.

An advantage of fexpand() is that it returns a data frame with the same class as the input. It also uses data.table for memory efficiency and collapse for speed.

A possible future development for fcomplete() is to fill values only in rows that are either newly added by the completion or that match the expanded rows. For example, iris %>% mutate(test = NA_real_) %>% complete(Sepal.Length = 0:100, fill = list(test = 0)) fills in all NA values of test, whereas iris %>% mutate(test = NA_real_) %>% fcomplete(Sepal.Length = 0:100, fill = list(test = 0)) should only fill in values of test that correspond to Sepal.Length values of 0:100.
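
To show how `fill` is supplied, here is a small sketch on made-up data (`df`, `id`, `time` and `value` are hypothetical names). The data deliberately contain no pre-existing `NA` values, so the only values that can be filled are those in the row added by the completion, and the question raised above does not arise.

```r
library(timeplyr)

# Hypothetical toy data: the combination id = 2, time = 2 is implicitly missing
df <- data.frame(
  id    = c(1, 1, 2),
  time  = c(1, 2, 1),
  value = c(10, 20, 30)
)

# Complete all id/time combinations and fill the new row's value with 0
completed <- fcomplete(df, id, time, fill = list(value = 0))

# Inspect the row that was added by the completion
completed[completed$id == 2 & completed$time == 2, ]
```

With the default `fill = NA`, the added row would be expected to hold `NA` in `value` instead.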

Note that when expand_type = "nesting", if one of the variables supplied in ... does not exist in the data but can be recycled to the length of the data, it is added and treated as a data variable.

Value

A data.frame of expanded groups.

Examples

library(timeplyr)
library(dplyr)
library(lubridate)
library(nycflights13)

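# All combinations of origin and dest
flights %>%
  fexpand(origin, dest)
# The same call with the default sort = FALSE written out explicitly
flights %>%
  fexpand(origin, dest, sort = FALSE)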

# Grouped expansions example
# 1 extra expansion variable (carrier): this is very quick
flights %>%
  group_by(origin, dest, tailnum) %>%
  fexpand(carrier)

[Package timeplyr version 0.8.1 Index]