sdf_nest {sparklyr.nested} | R Documentation |
Nest data in a Spark Dataframe
Description
This function is like tidyr::nest. Calling this function will not
aggregate over other columns. Rather, the output has the same number of
rows/records as the input. See the examples for how to achieve row
reduction by aggregating elements using collect_list, which is a Spark
SQL function.
Usage
sdf_nest(x, ..., .key = "data")
Arguments
x | A Spark dataframe. |
... | Columns to nest. |
.key | Character. A name for the new column containing nested fields. |
Examples
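## Not run:
library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# assumes a local Spark installation; adjust master as needed
sc <- spark_connect(master = "local")

# produces a dataframe with an array of characteristics nested under
# each unique species identifier
iris_tbl <- copy_to(sc, iris, name = "iris")
iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data") %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))
## End(Not run)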
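The Description states that sdf_nest on its own does not reduce the number of rows. A minimal sketch of how one might check this follows. It assumes a local Spark installation reachable via spark_connect(master = "local"); the table name "iris_check" and the variable names are illustrative and not part of the package.

```r
library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# Assumption: Spark is installed locally; change master for a cluster
sc <- spark_connect(master = "local")

# "iris_check" is an illustrative table name
iris_tbl <- copy_to(sc, iris, name = "iris_check", overwrite = TRUE)

# Nest the four measurement columns under a single column named "data",
# without any group_by/summarize step
nested <- iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data")

# Per the Description, both counts should be equal
n_before <- sdf_nrow(iris_tbl)
n_after <- sdf_nrow(nested)

# Column names and Spark types after nesting
schema <- sdf_schema(nested)
names(schema)
```

Row reduction only happens once the nested column is aggregated, for example with group_by and collect_list as in the example above.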
[Package sparklyr.nested version 0.0.4 Index]