FixedGroupsPipeline {rearrr} R Documentation

Chain multiple transformations with different argument values per group

Description

[Experimental]

Build a pipeline of transformations to be applied sequentially.

Specify different argument values for each group in a fixed set of groups. E.g. if your data.frame contains 5 groups, you provide 5 argument values for each of the non-constant arguments (see `var_args`).

The number of expected groups is specified during initialization and the input `data` must be grouped such that it contains that exact number of groups.
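As a minimal sketch of checking the group count before handing data to the pipeline, the snippet below calls `dplyr::n_groups()` on a small made-up data frame. The column names `x` and `G` are illustrative assumptions and not part of the package.

```r
library(dplyr)

# Hypothetical example data with three distinct values in `G`
df <- data.frame(
  "x" = 1:6,
  "G" = rep(c("a", "b", "c"), each = 2)
)

# Group by `G` so the data contains three groups
grouped <- dplyr::group_by(df, G)

# Number of groups in the grouped data;
# this is the value that must match `num_groups`
dplyr::n_groups(grouped)

# An ungrouped data.frame counts as a single group
dplyr::n_groups(df)
```

If the two numbers differ, regrouping the data or creating the pipeline with a different `num_groups` is likely needed before applying it.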

Transformations are applied to each group separately, so the given transformation function only receives the subset of `data` belonging to the current group.

Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data

To apply the same arguments to all groups, see Pipeline.

To apply generated argument values to an arbitrary number of groups, see GeneratedPipeline.

Super class

rearrr::Pipeline -> FixedGroupsPipeline

Public fields

transformations

List of transformations to apply.

names

Names of the transformations.

num_groups

Number of groups the pipeline will be applied to.

Methods

Public methods


Method new()

Initialize the pipeline with the number of groups the pipeline will be applied to.

Usage
FixedGroupsPipeline$new(num_groups)
Arguments
num_groups

Number of groups the pipeline will be applied to.


Method add_transformation()

Add a transformation to the pipeline.

Usage
FixedGroupsPipeline$add_transformation(fn, args, var_args, name)
Arguments
fn

Function that performs the transformation.

args

Named list with arguments for the `fn` function.

var_args

Named list of arguments where each element is a list with one value per group.

E.g. list("a" = list(1, 2, 3), "b" = list("a", "b", "c")) given 3 groups.

By adding ".apply" with a list of TRUE/FALSE flags (one per group), the transformation can be disabled for specific groups.

E.g. list(".apply" = list(TRUE, FALSE, TRUE), ...).

name

Name of the transformation step. Must be unique.
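To illustrate the expected shape of `var_args`, here is a base R sketch that checks that every element holds one value per group and then picks out the values belonging to the second group. It is only an illustration of the structure, not the package's internal implementation; `num_groups`, `var_args` and `group_2_args` are hypothetical variable names.

```r
# Assumed number of groups for this illustration
num_groups <- 3

# One value per group for each non-constant argument,
# plus the optional ".apply" flags
var_args <- list(
  "a" = list(1, 2, 3),
  "b" = list("a", "b", "c"),
  ".apply" = list(TRUE, FALSE, TRUE)
)

# Every element should contain exactly `num_groups` values
all(lengths(var_args) == num_groups)

# Values that belong to the second group
# (".apply" is left out here, as it is a flag rather than an argument)
group_2_args <- lapply(
  var_args[names(var_args) != ".apply"],
  function(values) values[[2]]
)
group_2_args
```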

Returns

The pipeline, to allow chaining of methods.
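Because the method returns the pipeline, calls can be chained. The sketch below reuses the functions and argument values from the Examples section of this page and assumes a pipeline for 3 groups; it only builds the pipeline and does not apply it.

```r
library(rearrr)

# Create a pipeline for 3 groups and add two steps in one chain
pipe <- FixedGroupsPipeline$new(num_groups = 3)

pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    suffix = "",
    overwrite = TRUE
  ),
  var_args = list(
    degrees = list(45, 90, 180),
    origin = list(c(0, 0), c(1, 2), c(-1, 0))
  ),
  name = "Rotate"
)$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    group_cols = "G"
  ),
  var_args = list(
    multiplier = list(0.5, 1, 5),
    .apply = list(TRUE, FALSE, TRUE)
  ),
  name = "Cluster"
)

# Both steps should now be registered in the pipeline
length(pipe$transformations)
```

Chaining is a matter of style; adding the steps in separate statements, as in the Examples section, builds the same pipeline.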


Method apply()

Apply the pipeline to a data.frame.

Usage
FixedGroupsPipeline$apply(data, verbose = FALSE)
Arguments
data

data.frame with the same number of groups as pre-registered in the pipeline.

You can find the number of groups in `data` with `dplyr::n_groups(data)`. The number of groups expected by the pipeline can be accessed with `pipe$num_groups`.

verbose

Whether to print the progress.

Returns

Transformed version of `data`.


Method print()

Print an overview of the pipeline.

Usage
FixedGroupsPipeline$print(...)
Arguments
...

Further arguments passed to or from other methods.

Returns

The pipeline, to allow chaining of methods.


Method clone()

The objects of this class are cloneable with this method.

Usage
FixedGroupsPipeline$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other pipelines: GeneratedPipeline, Pipeline

Examples

# Attach packages
library(rearrr)
library(dplyr)

# Create a data frame
# We group it by G so we have 3 groups
df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
) %>%
  dplyr::group_by(G)

# Create new pipeline
pipe <- FixedGroupsPipeline$new(num_groups = 3)

# Add 2D rotation transformation
pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    suffix = "",
    overwrite = TRUE
  ),
  var_args = list(
    degrees = list(45, 90, 180),
    origin = list(c(0, 0), c(1, 2), c(-1, 0))
  ),
  name = "Rotate"
)

# Add the `cluster_groups` transformation
# As the function is fed an ungrouped subset of `data`,
# i.e. the rows of that group, we need to specify `group_cols` in `args`
# (this requirement is specific to `cluster_groups()`)
# Also note `.apply` in `var_args`, which tells the pipeline *not*
# to apply this transformation to the second group
pipe$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    group_cols = "G"
  ),
  var_args = list(
    multiplier = list(0.5, 1, 5),
    .apply = list(TRUE, FALSE, TRUE)
  ),
  name = "Cluster"
)

# Check pipeline object
pipe

# Apply pipeline to already grouped data.frame
# Enable `verbose` to print progress
pipe$apply(df, verbose = TRUE)


[Package rearrr version 0.3.4 Index]