partition {multidplyr}R Documentation

Partition data across workers in a cluster

Description

Partitioning ensures that all observations in a group end up on the same worker. To try and keep the observations on each worker balanced, 'partition()' uses a greedy algorithm that iteratively assigns each group to the worker that currently has the fewest rows.

Usage

partition(data, cluster)

Arguments

data

Dataset to partition, typically grouped. When grouped, all observations in a group will be assigned to the same cluster.

cluster

Cluster to use.

Value

A [party_df].

Examples

library(dplyr)
cl <- default_cluster()
cluster_library(cl, "dplyr")

mtcars2 <- partition(mtcars, cl)
mtcars2 %>% mutate(cyl2 = 2 * cyl)
mtcars2 %>% filter(vs == 1)
mtcars2 %>% group_by(cyl) %>% summarise(n())
mtcars2 %>% select(-cyl)

[Package multidplyr version 0.1.3 Index]