fexpand {timeplyr}    R Documentation

Fast versions of tidyr::expand() and tidyr::complete().

Description

Fast versions of tidyr::expand() and tidyr::complete().

Usage

fexpand(
  data,
  ...,
  expand_type = c("crossing", "nesting"),
  sort = FALSE,
  .by = NULL,
  keep_class = TRUE,
  log_limit = 8
)

fcomplete(
  data,
  ...,
  expand_type = c("crossing", "nesting"),
  sort = FALSE,
  .by = NULL,
  keep_class = TRUE,
  fill = NA,
  log_limit = 8
)

Arguments

data

A data frame

...

Variables to expand

expand_type

Type of expansion to use. "nesting" finds combinations already present in the data (the same as using distinct(), except that fexpand() allows new variables to be created on the fly and columns are sorted in the order given). "crossing" finds all combinations of values in the group variables.

sort

Logical. If TRUE, expanded/completed variables are sorted. The default is FALSE.

.by

(Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select.

keep_class

Logical. If TRUE then the class of the input data is retained. If FALSE, which is sometimes faster, a data.table is returned.

log_limit

The maximum number of rows that can be expanded, given on the log10 scale (the default of 8 corresponds to 10^8 rows). An expansion exceeding this limit throws an error.

fill

A named list containing value-name pairs to fill the named implicit missing values.
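
The difference between the two expand_type options, and the scale on which log_limit is expressed, can be illustrated with a small base R sketch. This is not how fexpand() is implemented; the toy data frame df and the helper names nesting, crossing and log10_rows are hypothetical and exist only for this illustration.

```r
# Toy data: two columns with only some of the possible pairs present
df <- data.frame(
  x = c("a", "a", "b"),
  y = c(1, 2, 2),
  stringsAsFactors = FALSE
)

# "nesting": only the combinations that already occur in the data
nesting <- unique(df[c("x", "y")])

# "crossing": every combination of the unique values of each variable
crossing <- expand.grid(
  x = unique(df$x),
  y = unique(df$y),
  stringsAsFactors = FALSE
)

# Size of the crossing on the log10 scale, the scale on which log_limit
# is documented (how the package evaluates the limit is not shown here)
log10_rows <- log10(nrow(crossing))

nrow(nesting)   # 3 rows: (a, 1), (a, 2), (b, 2)
nrow(crossing)  # 4 rows: the pair (b, 1) is added
```

With real data the crossing grows as the product of the numbers of unique values, which is why a limit on the log10 row count is useful.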

Details

For ungrouped data, fexpand() is similar in speed to tidyr::expand(). When the data contain many groups, fexpand() is much faster (see examples).

The two main differences between fexpand() and tidyr::expand() are:

For efficiency, when groups are supplied, expansion is done on a by-group basis only if two or more of the expansion variables are not part of the grouping. With a single expansion variable a by-group calculation is unnecessary, because all combinations across groups already exist against that one variable. When expand_type = "nesting", groups are ignored for speed, as the result is the same.

An advantage of fexpand() is that it returns a data frame with the same class as the input. It also uses data.table for memory efficiency and collapse for speed.
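
One reading of the point about a single expansion variable on grouped data is that a by-group expansion of one variable returns exactly the distinct group/variable combinations, so no per-group work is required. The base R sketch below checks this on toy data; it is an illustration under that assumption, not the package's code, and the names df, by_group and distinct_rows are hypothetical.

```r
# Toy grouped data: g is the grouping column, x the single expansion variable
df <- data.frame(
  g = c(1, 1, 1, 2, 2),
  x = c("a", "b", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# By-group expansion of one variable: unique values of x within each group
by_group <- do.call(rbind, lapply(split(df, df$g), function(d) {
  data.frame(g = d$g[1], x = unique(d$x), stringsAsFactors = FALSE)
}))
rownames(by_group) <- NULL

# The same rows obtained in one pass, with no by-group calculation
distinct_rows <- unique(df[c("g", "x")])
rownames(distinct_rows) <- NULL

# Compare the two results
all.equal(by_group, distinct_rows)
```

With two or more expansion variables the shortcut no longer applies, because crossing them within each group can create combinations that are absent from the data.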

A possible future development for fcomplete() is to fill values only in rows that are newly added by the completion or that match the expanded rows. For example, iris %>% mutate(test = NA_real_) %>% complete(Sepal.Length = 0:100, fill = list(test = 0)) fills in all NA values of test, whereas iris %>% mutate(test = NA_real_) %>% fcomplete(Sepal.Length = 0:100, fill = list(test = 0)) should only fill in values of test that correspond to Sepal.Length values of 0:100.
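
The two fill behaviours contrasted above can be sketched in base R on a small example. This only mimics the idea with merge(); it does not call complete() or fcomplete(), and the objects df, expanded, full, fill_all and fill_matched are hypothetical names chosen for the illustration.

```r
# Input data: test is NA everywhere, and id 2.5 is not among the expanded values
df <- data.frame(id = c(1, 2.5), test = c(NA_real_, NA_real_))

# Values to complete over (playing the role of Sepal.Length = 0:100)
expanded <- data.frame(id = c(1, 2, 3))

# Completed data: existing rows plus any expanded values missing from the input
full <- merge(expanded, df, by = "id", all = TRUE)

# Behaviour 1: replace every NA in test, including rows outside the expansion
fill_all <- full
fill_all$test[is.na(fill_all$test)] <- 0

# Behaviour 2: replace NA only in rows whose id is one of the expanded values
fill_matched <- full
in_expansion <- fill_matched$id %in% expanded$id
fill_matched$test[in_expansion & is.na(fill_matched$test)] <- 0

fill_all$test      # no NA values remain
fill_matched$test  # the row with id 2.5 keeps its NA
```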

Note that when expand_type = "nesting", if one of the variables supplied in ... does not exist in the data but can be recycled to the length of the data, it is added and treated as a data variable.

Value

A data.frame of expanded groups.

Examples

library(timeplyr)
library(dplyr)
library(lubridate)
library(nycflights13)

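# All combinations of origin and dest ("crossing" is the default)
flights %>%
  fexpand(origin, dest)
# The same call with the default sort = FALSE written out explicitly
flights %>%
  fexpand(origin, dest, sort = FALSE)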

# Grouped expansion example
# With 1 expansion variable (carrier) on grouped data this is very quick
flights %>%
  group_by(origin, dest, tailnum) %>%
  fexpand(carrier)


[Package timeplyr version 0.5.0 Index]