sdf_nest {sparklyr.nested}

R Documentation

Nest data in a Spark Dataframe

Description

This function is like tidyr::nest, except that it does not aggregate over the other columns: the output has the same number of rows (records) as the input. See the examples for how to reduce rows by aggregating the nested elements with collect_list, which is a Spark SQL function.

Usage

sdf_nest(x, ..., .key = "data")

Arguments

`x`	A Spark dataframe.
`...`	Columns to nest.
`.key`	Character. A name for the new column containing the nested fields.

Examples

## Not run: 
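# Assumes a local Spark installation; any spark_connect() target works.
library(sparklyr)
library(sparklyr.nested)
library(dplyr)
sc <- spark_connect(master = "local")

# produces a dataframe with an array of characteristics nested under
# each unique species identifier
iris_tbl <- copy_to(sc, iris, name = "iris")
iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data") %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))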

## End(Not run)
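
The row-preserving behaviour noted in the Description can be checked by comparing row counts before and after nesting. The following is a minimal sketch, not part of the package's own examples. It assumes a local Spark installation reachable with spark_connect(master = "local"), and the column name "sepal" passed to .key is chosen only for illustration. It uses sdf_nrow from sparklyr to count rows.

```r
library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# copy_to replaces the dots in the iris column names with underscores
# (Sepal.Length becomes Sepal_Length, and so on).
iris_tbl <- copy_to(sc, iris, name = "iris", overwrite = TRUE)

# Nesting alone: the two sepal columns move into one struct column
# named "sepal" (an illustrative name); no rows are combined.
nested <- iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, .key = "sepal")

n_input  <- sdf_nrow(iris_tbl)
n_nested <- sdf_nrow(nested)

# Row reduction comes from the aggregation step, not from sdf_nest:
# collect_list (a Spark SQL function) gathers the structs per group.
by_species <- nested %>%
  group_by(Species) %>%
  summarize(sepal = collect_list(sepal))

n_grouped <- sdf_nrow(by_species)

print(c(input = n_input, nested = n_nested, grouped = n_grouped))

spark_disconnect(sc)
```

If the function behaves as described, the first two counts are equal and the third equals the number of distinct Species values.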

[Package sparklyr.nested version 0.0.4 Index]