Pipeline {rearrr} R Documentation

Chain multiple transformations

Description

[Experimental]

Build a pipeline of transformations to be applied sequentially.

Uses the same arguments for all groups in `data`.

Groupings are reset between transformations. See `group_cols`.

Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data

To apply different argument values to each group, see GeneratedPipeline, which generates argument values for an arbitrary number of groups, and FixedGroupsPipeline, which takes specific values for a fixed set of groups.

Public fields

transformations

List of transformations to apply.

names

Names of the transformations.

Methods

Public methods


Method add_transformation()

Add a transformation to the pipeline.

Usage
Pipeline$add_transformation(fn, args, name, group_cols = NULL)
Arguments
fn

Function that performs the transformation.

args

Named list with arguments for the `fn` function.

name

Name of the transformation step. Must be unique.

group_cols

Names of the columns to group the input data by before applying the transformation.

Note that the transformation function is applied separately to each group (subset). If the `fn` function requires access to the entire data.frame, the grouping columns should be specified as part of `args` and handled by the `fn` function.

Returns

The pipeline, to allow chaining of methods.
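Because `add_transformation()` returns the pipeline, calls can be chained. The following is a minimal sketch rather than an official example. `center_col()` and `scale_col()` are hypothetical helpers (not part of rearrr), written on the assumption that the pipeline passes each subset to `fn` as its first argument, `data`, as the rearrr functions in the Examples section expect.

```r
library(rearrr)

# Hypothetical helper (not part of rearrr): subtract the mean of one column.
# Assumes the data is passed as the first argument, named `data`.
center_col <- function(data, col) {
  data[[col]] <- data[[col]] - mean(data[[col]])
  data
}

# Hypothetical helper (not part of rearrr): multiply one column by a constant
scale_col <- function(data, col, by) {
  data[[col]] <- data[[col]] * by
  data
}

df <- data.frame(
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
)

# Each call returns the pipeline, so the next call can follow directly
pipe <- Pipeline$new()
pipe$add_transformation(
  fn = center_col,
  args = list(col = "A"),
  name = "Center",
  group_cols = "G" # applied separately within each level of `G`
)$add_transformation(
  fn = scale_col,
  args = list(col = "A", by = 2),
  name = "Scale" # no `group_cols`: applied to the data as a whole
)

centered <- pipe$apply(df)
```

The first step is grouped by `G`, while the second step runs without grouping, since groupings are reset between transformations.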


Method apply()

Apply the pipeline to a data.frame.

Usage
Pipeline$apply(data, verbose = FALSE)
Arguments
data

data.frame.

A grouped data.frame will raise a warning and the grouping will be ignored. Use the `group_cols` argument in the `add_transformation` method to specify how `data` should be grouped for each transformation.

verbose

Whether to print the progress.

Returns

Transformed version of `data`.
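Since `apply()` warns about and ignores an existing grouping, one way to avoid the warning is to drop the grouping first and state it through `group_cols` instead. A minimal sketch, assuming the dplyr package is installed and reusing the `rotate_2d()` arguments from the Examples section:

```r
library(rearrr)

df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
)

# A data.frame that arrives already grouped, e.g. from an earlier dplyr step
grouped_df <- dplyr::group_by(df, G)

# The grouping is specified on the transformation, not on the data
pipe <- Pipeline$new()
pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    origin = c(0, 0),
    degrees = 45,
    suffix = "",
    overwrite = TRUE
  ),
  name = "Rotate",
  group_cols = "G"
)

# Remove the existing grouping before applying the pipeline
rotated <- pipe$apply(dplyr::ungroup(grouped_df))
```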


Method print()

Print an overview of the pipeline.

Usage
Pipeline$print(...)
Arguments
...

Further arguments passed to or from other methods.

Returns

The pipeline, to allow chaining of methods.


Method clone()

The objects of this class are cloneable with this method.

Usage
Pipeline$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
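Cloning can be useful for deriving a variant of a pipeline without rebuilding it. The sketch below reuses the transformations from the Examples section. It relies on the usual R6 behaviour that a clone gets its own copies of non-reference fields, so adding a step to the clone is expected to leave the original unchanged. That expectation is an assumption about this class and has not been checked against the rearrr source.

```r
library(rearrr)

# Base pipeline with a single step
base_pipe <- Pipeline$new()
base_pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    origin = c(0, 0),
    degrees = 45,
    suffix = "",
    overwrite = TRUE
  ),
  name = "Rotate",
  group_cols = "G"
)

# Copy the pipeline and add a second step to the copy only
extended_pipe <- base_pipe$clone()
extended_pipe$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    multiplier = 0.05,
    group_cols = "G"
  ),
  name = "Cluster"
)

# Compare the number of steps in each pipeline
length(base_pipe$names)
length(extended_pipe$names)
```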

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other pipelines: FixedGroupsPipeline, GeneratedPipeline

Examples

# Attach package
library(rearrr)

# Create a data frame
df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
)

# Create new pipeline
pipe <- Pipeline$new()

# Add 2D rotation transformation
# Note that we specify the grouping via `group_cols`
pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    origin = c(0, 0),
    degrees = 45,
    suffix = "",
    overwrite = TRUE
  ),
  name = "Rotate",
  group_cols = "G"
)

# Add the `cluster_groups()` transformation
# Note that this function requires the entire input data
# to properly scale the groups. We therefore specify `group_cols`
# as part of `args`. This works as `cluster_groups()` accepts that
# argument.
pipe$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    multiplier = 0.05,
    group_cols = "G"
  ),
  name = "Cluster"
)

# Check pipeline object
pipe

# Apply pipeline to data.frame
# Enable `verbose` to print progress
pipe$apply(df, verbose = TRUE)

[Package rearrr version 0.3.4 Index]