pivot_wider.tbl_lazy {dbplyr}R Documentation

Pivot data from long to wide

Description

pivot_wider() "widens" data, increasing the number of columns and decreasing the number of rows. The inverse transformation is pivot_longer(). Learn more in vignette("pivot", "tidyr").

Usage

pivot_wider.tbl_lazy(
  data,
  id_cols = NULL,
  names_from = name,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE,
  names_repair = "check_unique",
  values_from = value,
  values_fill = NULL,
  values_fn = max,
  ...
)

Arguments

data

A lazy data frame backed by a database query.

id_cols

A set of columns that uniquely identifies each observation.

names_from, values_from

A pair of arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from).

If values_from contains multiple values, the value will be added to the front of the output column.

names_prefix

String added to the start of every variable name.

names_sep

If names_from or values_from contains multiple variables, this will be used to join their values together into a single string to use as a column name.

names_glue

Instead of names_sep and names_prefix, you can supply a glue specification that uses the names_from columns (and special .value) to create custom column names.

names_sort

Should the column names be sorted? If FALSE, the default, column names are ordered by first appearance.

names_repair

What happens if the output has invalid column names?

values_fill

Optionally, a (scalar) value that specifies what each value should be filled in with when missing.

values_fn

A function, the default is max(), applied to the value in each cell in the output. In contrast to local data frames it must not be NULL.

...

Unused; included for compatibility with generic.

Details

The big difference to pivot_wider() for local data frames is that values_fn must not be NULL. By default it is max() which yields the same results as for local data frames if the combination of id_cols and value column uniquely identify an observation. Mind that you also do not get a warning if an observation is not uniquely identified.

The translation to SQL code basically works as follows:

  1. Get unique keys in names_from column.

  2. For each key value generate an expression of the form:

    value_fn(
      CASE WHEN (`names from column` == `key value`)
      THEN (`value column`)
      END
    ) AS `output column`
    
  3. Group data by id columns.

  4. Summarise the grouped data with the expressions from step 2.

Examples

if (require("tidyr", quietly = TRUE)) {
memdb_frame(
  id = 1,
  key = c("x", "y"),
  value = 1:2
) %>%
  tidyr::pivot_wider(
    id_cols = id,
    names_from = key,
    values_from = value
  )
}

[Package dbplyr version 2.1.1 Index]