pack {unpivotr}

R Documentation

Pack cell values from separate columns per data type into one list-column

Description

Pack cell values from separate columns per data type into one list-column

Usage

pack(
  cells,
  types = data_type,
  name = "value",
  drop_types = TRUE,
  drop_type_cols = TRUE
)

unpack(cells, values = value, name = "data_type", drop_packed = TRUE)

Arguments

`cells`	A data frame of cells, one row per cell. For `pack()` it must have a column that names, for each cell/row, which of the other columns the value is in. For `unpack()` it must have a list-column of cell values, where each element is named according to the data type of the value.
`types`	For `pack()`, the name of the column that names, for each cell/row, which of the other columns the value is in.
`name`	A string. For `pack()`, the name to give the new list-column of values. For `unpack()`, the name to give the new column that will name, for each cell, which of the other columns the value is in.
`drop_types`	For `pack()`, whether to drop the column named by `types`.
`drop_type_cols`	For `pack()`, whether to drop the original columns of cell values.
`values`	For `unpack()`, the name of the list-column of cell values.
`drop_packed`	For `unpack()`, whether to drop the column named by `values`.

Details

When cells are represented by rows of a data frame, the values of the cells will be in different columns according to their data type. For example, the value of a cell containing text will be in a column called chr (or character if it came via tidyxl). A column called data_type names, for each cell, which column its value is in.

pack() rearranges the cell values so that they are all in one column, by:

- taking each cell value from whichever column it is in,
- making it an element of a list,
- naming each element according to the column it came from, and
- making the list into a new list-column of the original data frame.
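The four steps above can be sketched in base R. This is a minimal illustration, not unpivotr's own implementation: `pack_sketch()` is a hypothetical name, and it assumes the type column holds the names of the value columns (as `data_type` does). It also mirrors the `drop_types` and `drop_type_cols` defaults from Usage.

```r
# Hypothetical helper illustrating the packing steps; not unpivotr's code.
pack_sketch <- function(cells, types = "data_type", name = "value",
                        drop_types = TRUE, drop_type_cols = TRUE) {
  type_names <- as.character(cells[[types]])
  # Take each cell's value from whichever column its type names
  values <- lapply(seq_len(nrow(cells)),
                   function(i) cells[[type_names[i]]][[i]])
  # Name each list element after the column it came from
  names(values) <- type_names
  # Store the named list as a new list-column
  cells[[name]] <- values
  # Optionally drop the original value columns and the type column
  if (drop_type_cols) {
    for (col in intersect(unique(type_names), names(cells))) cells[[col]] <- NULL
  }
  if (drop_types) cells[[types]] <- NULL
  cells
}

cells <- data.frame(row = 1:2, col = 1L, data_type = c("int", "chr"),
                    int = c(5L, NA), chr = c(NA, "a"),
                    stringsAsFactors = FALSE)
packed <- pack_sketch(cells)
str(packed$value)
```

Each element of the new list-column is a single value, and the list's names record which column each value came from, which is what lets the operation be reversed.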

By default (drop_type_cols = TRUE and drop_types = TRUE), the original columns of cell values are dropped, and so is the data_type column.

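unpack() is the complement: it spreads the values in the list-column back into separate columns per data type, and adds a column (data_type by default) that names, for each cell, which column its value is in. By default (drop_packed = TRUE), the list-column itself is dropped.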
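The reverse direction can also be sketched in base R. Again this is only an illustration under assumptions: `unpack_sketch()` is a hypothetical name, not unpivotr's implementation, and it assumes the list-column's names are the data types. Indexing each combined vector with `NA` positions fills the rows of other types with `NA` while keeping the vector's class (for example, `Date`).

```r
# Hypothetical helper illustrating unpacking; not unpivotr's code.
unpack_sketch <- function(cells, values = "value", name = "data_type",
                          drop_packed = TRUE) {
  packed <- cells[[values]]
  type_names <- names(packed)
  n <- nrow(cells)
  for (type in unique(type_names)) {
    idx <- which(type_names == type)
    # Combine this type's values into one vector, keeping their class
    vec <- do.call(c, unname(packed[idx]))
    # Rows of other types get NA (indexing with NA yields NA of the same type)
    cells[[type]] <- vec[match(seq_len(n), idx)]
  }
  # Recreate the column naming each cell's data type
  cells[[name]] <- type_names
  if (drop_packed) cells[[values]] <- NULL
  cells
}

packed <- data.frame(row = 1:2, col = 1L)
packed$value <- list(int = 5L, chr = "a")
unpack_sketch(packed)
```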

This can be useful for dropping all columns of cells except the ones that contain data. For example, tidyxl::xlsx_cells() returns a very wide data frame, and to make it narrow you might do:

select(cells, row, col, character, numeric, date)

But what if you don't know in advance that the data types you need are character, numeric and date? You might also need logical and error.

Instead, pack() all the data types into a single column, select it along with row and col, and then unpack() it.

pack(cells) %>%
  select(row, col, value) %>%
  unpack()
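For comparison, a similar narrowing can be done in base R by reading the needed value columns off the data_type column itself. This is a swapped-in technique, not part of unpivotr: `keep_data_cols()` is a hypothetical helper, and it assumes (as with tidyxl) that each data type other than "blank" has a value column of the same name. `intersect()` skips types such as "blank" that have no column.

```r
# Hypothetical base-R alternative to pack() %>% select() %>% unpack().
keep_data_cols <- function(cells, types = "data_type", keep = c("row", "col")) {
  # Data types actually present in this data frame
  used <- unique(as.character(cells[[types]]))
  # Keep the position columns, the type column, and only the used value columns
  cols <- c(keep, types, intersect(used, names(cells)))
  cells[, cols, drop = FALSE]
}

cells <- data.frame(row = 1:3, col = 1L,
                    data_type = c("character", "numeric", "blank"),
                    character = c("a", NA, NA), numeric = c(NA, 2, NA),
                    logical = NA, height = 15,
                    stringsAsFactors = FALSE)
names(keep_data_cols(cells))
```

Unlike the pack() route, this keeps the data_type column throughout instead of dropping and recreating it.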

Functions

unpack(): Unpack cell values from one list-column into separate columns per data type

Examples

# A normal data frame
w <- data.frame(foo = 1:2,
                bar = c("a", "b"),
                stringsAsFactors = FALSE)
w

# The same data, represented by one row per cell, with integer values in the
# `int` column and character values in the `chr` column.
x <- as_cells(w)
x

# pack() and unpack() are complements
pack(x)
unpack(pack(x))

# Drop non-data columns from a wide data frame of cells from tidyxl
if (require(tidyxl)) {
  cells <- tidyxl::xlsx_cells(system.file("extdata", "purpose.xlsx", package = "unpivotr"))
  cells

  pack(cells) %>%
    dplyr::select(row, col, value) %>%
    unpack()
}

Package unpivotr version 0.6.3