sdf_explode {sparklyr.nested} | R Documentation
Explode data along a column
Description
Exploding an array column of length N replicates the top-level record N times.
The i^th replicated record contains a struct (not an array) corresponding to the i^th element
of the exploded array. Exploding does not promote any fields or otherwise change the schema of
the data.
Usage
sdf_explode(x, column, is_map = FALSE, keep_all = FALSE)
Arguments
x |
An object (usually a |
column |
The field to explode |
is_map |
Logical. The (scala) |
keep_all |
Logical. If |
Details
Two types of exploding are possible. The default method calls the Scala explode method.
This operation is supported in Spark versions > 1.6. It will, however, drop records where the
exploding field is empty/null. Alternatively, keep_all=TRUE will use the explode_outer
Scala method, introduced in Spark 2, so that no records are dropped.
Examples
## Not run:
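# load the packages used below
library(dplyr)
library(sparklyr)
library(sparklyr.nested)
# assumes an open Spark connection, e.g. sc <- spark_connect(master = "local")
# first get some nested data
iris_tbl <- copy_to(sc, iris, name = "iris")
iris_nst <- iris_tbl %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data") %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))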
# then explode it
iris_nst %>% sdf_explode(data)
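# a sketch of the alternative described in Details: keep_all = TRUE should
# retain records whose exploded field is empty/null (it relies on explode_outer,
# so this assumes the connection is to Spark 2 or later)
iris_nst %>% sdf_explode(data, keep_all = TRUE)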
## End(Not run)