expand_distances_each {rearrr}    R Documentation

Expand the distances to an origin in each dimension

Description

[Experimental]

Moves the data points in n-dimensional space such that their distance to the specified origin is increased/decreased in each dimension separately. A `multiplier` greater than 1 leads to expansion, while a positive `multiplier` less than 1 leads to contraction.
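For intuition, here is a minimal base-R sketch of the multiplication case in a single dimension. It assumes the point's signed distance to the origin is multiplied and the origin is then added back. `expand_1d()` is a hypothetical helper for illustration, not part of rearrr.

```r
# Hypothetical helper (not part of rearrr): scale the distance from
# `origin` by `multiplier` in a single dimension
expand_1d <- function(x, origin, multiplier) {
  distance <- x - origin          # signed distance to the origin
  origin + distance * multiplier  # place the point at its new distance
}

x <- c(0.2, 0.5, 0.9)
expand_1d(x, origin = 0.5, multiplier = 2)    # expansion: -0.1, 0.5, 1.3
expand_1d(x, origin = 0.5, multiplier = 0.5)  # contraction: 0.35, 0.5, 0.7
```

A point sitting exactly at the origin has distance 0 and does not move, whatever the multiplier.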

The origin can be supplied as coordinates or as a function that returns coordinates. The latter can be useful when supplying a grouped data.frame and expanding around e.g. the centroid of each group.
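The grouped use case can be sketched in base R as follows. This assumes the group centroid is the per-column mean within the group, and it uses `ave()` to look up each row's group mean. It only approximates the idea behind `origin_fn = centroid` on a grouped data.frame and is not rearrr's implementation.

```r
# Base-R sketch of group-wise contraction around each group's centroid
# (assumed to be the mean of each column within the group)
df <- data.frame(
  x = c(0, 2, 10, 12),
  y = c(0, 2, 10, 14),
  g = c(1, 1, 2, 2)
)
multiplier <- 0.5
for (col in c("x", "y")) {
  # For every row: the mean of `col` within that row's group
  group_origin <- ave(df[[col]], df$g, FUN = mean)
  df[[paste0(col, "_contracted")]] <-
    group_origin + (df[[col]] - group_origin) * multiplier
}
df
```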

The multipliers/exponents can be supplied as constant(s) or as a function that returns constants. The latter can be useful when supplying a grouped data.frame and the multiplier/exponent depends on the data in the groups. If supplying multiple constants, there must be one per dimension (length of `cols`).
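The "one constant or one per dimension" rule can be illustrated with a small hypothetical helper. `multipliers_per_dim()` is not a rearrr function. It only shows the expected shape of the input.

```r
# Hypothetical helper (not part of rearrr): turn `multipliers` into
# exactly one value per dimension, as the documentation requires
multipliers_per_dim <- function(multipliers, cols) {
  if (length(multipliers) == 1) {
    return(rep(multipliers, length(cols)))  # scalar: reuse in every dimension
  }
  if (length(multipliers) != length(cols)) {
    stop("`multipliers` must have length 1 or one value per column in `cols`.")
  }
  multipliers
}

multipliers_per_dim(3, c("x", "y"))        # 3 3
multipliers_per_dim(c(2, 4), c("x", "y"))  # 2 4
```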

To expand the multidimensional distance (rather than each dimension separately), use expand_distances().

NOTE: When exponentiating, the default is to first add 1 or -1 (depending on the sign of the distance) to the distances and to subtract it again afterwards. This ensures expansion even when the distance is between -1 and 1. If you need the purely exponentiated distances, disable `add_one_exp`.
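A quick arithmetic check shows why this default matters, using a distance of 0.5 and an exponent of 2:

```r
d <- 0.5
d^2                         # 0.25: plain exponentiation moves the point closer
(d + sign(d))^2 - sign(d)   # 1.25: with the added sign, the point moves away
```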

Usage

expand_distances_each(
  data,
  cols = NULL,
  multipliers = NULL,
  multipliers_fn = NULL,
  origin = NULL,
  origin_fn = NULL,
  exponentiate = FALSE,
  add_one_exp = TRUE,
  suffix = "_expanded",
  keep_original = TRUE,
  mult_col_name = ifelse(isTRUE(exponentiate), ".exponents", ".multipliers"),
  origin_col_name = ".origin",
  overwrite = FALSE
)

Arguments

data

data.frame or vector.

cols

Names of columns in `data` to expand. Each column is considered a dimension to expand in.

multipliers

Constant(s) to multiply/exponentiate the distance to the origin by. A scalar to use in all dimensions or a vector with one scalar per dimension.

N.B. When `exponentiate` is TRUE, the `multipliers` become exponents.

multipliers_fn

Function for finding the `multipliers`.

Input: Each column will be passed as a vector in the order of `cols`.

Output: A numeric vector with one element per dimension.

Just as for `origin_fn`, it can be created with create_origin_fn() if you want to apply the same function to each dimension. See `origin_fn`.
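As an illustration of the expected input and output, here is a hypothetical multipliers function that chooses one multiplier per dimension so that each column's range becomes 1. It assumes the columns arrive as separate arguments, in the order of `cols`. The name `range_multipliers` is made up for this sketch.

```r
# Hypothetical multipliers function: one multiplier per dimension,
# chosen so that each column's range becomes 1 after multiplication
range_multipliers <- function(...) {
  cols <- list(...)  # each column arrives as a separate vector, in `cols` order
  vapply(cols, function(v) 1 / diff(range(v)), numeric(1))
}

range_multipliers(c(0, 2, 4), c(1, 1.5))  # 0.25 2.00

# Usage, following the signature above:
# expand_distances_each(df, cols = c("x", "y"), multipliers_fn = range_multipliers)
```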

origin

Coordinates of the origin to expand around. A scalar to use in all dimensions or a vector with one scalar per dimension.

N.B. Ignored when `origin_fn` is not NULL.

origin_fn

Function for finding the origin coordinates.

Input: Each column will be passed as a vector in the order of `cols`.

Output: A vector with one scalar per dimension.

Can be created with create_origin_fn() if you want to apply the same function to each dimension.

E.g. `create_origin_fn(median)` would find the median of each column.

Built-in functions are centroid(), most_centered(), and midrange().
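For intuition, a hand-written function with the same input/output shape as `create_origin_fn(median)` might look like the sketch below. It is a hypothetical stand-in, not the function that create_origin_fn() actually returns.

```r
# Hypothetical stand-in for create_origin_fn(median): apply `median`
# to each column separately and return one coordinate per dimension
median_origin <- function(...) {
  vapply(list(...), median, numeric(1))
}

median_origin(c(1, 2, 10), c(0, 4, 5))  # 2 4
```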

exponentiate

Whether to exponentiate instead of multiplying. (Logical)

add_one_exp

Whether to add the sign (either 1 or -1) of the distance before exponentiating, to ensure that distances between -1 and 1 do not contract. The added value is subtracted again after the exponentiation. (Logical)

Exponentiation becomes:

x <- x + sign(x)

x <- sign(x) * abs(x) ^ multiplier

x <- x - sign(x)

N.B. Ignored when `exponentiate` is FALSE.
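The three steps above can be written as one vectorized R function operating on signed distances to the origin. This is a sketch of the documented formulas, not rearrr's internal code, and `exponentiate_distances()` is a hypothetical name.

```r
# Sketch of the exponentiation step on signed distances to the origin
exponentiate_distances <- function(d, exponent, add_one_exp = TRUE) {
  if (add_one_exp) d <- d + sign(d)   # step 1 further away from 0
  d <- sign(d) * abs(d)^exponent      # keep the sign, exponentiate the size
  if (add_one_exp) d <- d - sign(d)   # remove the added 1 again
  d
}

d <- c(-0.5, 0, 0.5, 2)
exponentiate_distances(d, 2)                       # -1.25  0.00  1.25  8.00
exponentiate_distances(d, 2, add_one_exp = FALSE)  # -0.25  0.00  0.25  4.00
```

Because `sign(0)` is 0, a point at the origin stays there, and negative distances are treated symmetrically to positive ones.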

suffix

Suffix to add to the names of the generated columns.

Use an empty string (i.e. "") to overwrite the original columns.

keep_original

Whether to keep the original columns. (Logical)

Some columns may have been overwritten, in which case only the newest versions are returned.

mult_col_name

Name of new column with the multiplier(s). If NULL, no column is added.

origin_col_name

Name of new column with the origin coordinates. If NULL, no column is added.

overwrite

Whether to allow overwriting of existing columns. (Logical)

Details

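For each value of each dimension (column), take its distance to the origin in that dimension (value - origin), either multiply or exponentiate that distance by the multiplier, and add the origin back. Below, x denotes the distance: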

# Multiplication

x <- x * multiplier

# Exponentiation

x <- sign(x) * abs(x) ^ multiplier

Note: By default (when `add_one_exp` is TRUE), the sign (1 / -1) of the distance is added before the exponentiation and subtracted afterwards. See `add_one_exp`.
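Putting the details together, the per-dimension procedure on a data.frame can be sketched in base R. This is an approximation of the documented behaviour under the assumptions stated in the comments, not rearrr's implementation. For instance, it does not add the `.multipliers`/`.origin` columns, it takes only constant multipliers and origins, and `expand_each_sketch()` is a hypothetical name.

```r
# Base-R sketch of the per-dimension procedure (not rearrr's implementation)
expand_each_sketch <- function(data, cols, multipliers, origin,
                               exponentiate = FALSE, add_one_exp = TRUE,
                               suffix = "_expanded") {
  # Either one value for all dimensions or one value per dimension
  stopifnot(length(multipliers) %in% c(1, length(cols)),
            length(origin) %in% c(1, length(cols)))
  multipliers <- rep_len(multipliers, length(cols))
  origin <- rep_len(origin, length(cols))
  for (i in seq_along(cols)) {
    d <- data[[cols[i]]] - origin[i]  # distance to the origin in this dimension
    if (exponentiate) {
      if (add_one_exp) d <- d + sign(d)
      d <- sign(d) * abs(d)^multipliers[i]
      if (add_one_exp) d <- d - sign(d)
    } else {
      d <- d * multipliers[i]
    }
    data[[paste0(cols[i], suffix)]] <- origin[i] + d  # move back around the origin
  }
  data
}

df <- data.frame(x = c(0, 1), y = c(0.5, 1.5))
res <- expand_each_sketch(df, cols = c("x", "y"),
                          multipliers = c(2, 4), origin = c(0.5, 0.5))
res
res_exp <- expand_each_sketch(df, cols = "x", multipliers = 2,
                              origin = 0.5, exponentiate = TRUE)
res_exp
```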

Value

data.frame (tibble) with the expanded columns, along with the applied multiplier/exponent and origin coordinates.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other mutate functions: apply_transformation_matrix(), cluster_groups(), dim_values(), expand_distances(), flip_values(), roll_values(), rotate_2d(), rotate_3d(), shear_2d(), shear_3d(), swirl_2d(), swirl_3d()

Other expander functions: expand_distances()

Other distance functions: closest_to(), dim_values(), distance(), expand_distances(), furthest_from(), swirl_2d(), swirl_3d()

Examples

# Attach packages
library(rearrr)
library(dplyr)
library(purrr)
has_ggplot <- require(ggplot2)  # Attach if installed

# Set seed
set.seed(1)

# Create a data frame
df <- data.frame(
  "x" = runif(20),
  "y" = runif(20),
  "g" = rep(1:4, each = 5)
)

# Expand values in the two dimensions (x and y)
# With the origin at x=0.5, y=0.5
# We expand x by 2 and y by 4
expand_distances_each(
  data = df,
  cols = c("x", "y"),
  multipliers = c(2, 4),
  origin = c(0.5, 0.5)
)

# Expand values in the two dimensions (x and y)
# With the origin at x=0.5, y=0.5
# We expand both by 3
expand_distances_each(
  data = df,
  cols = c("x", "y"),
  multipliers = 3,
  origin = 0.5
)

# Expand values in one dimension (x)
# With the origin at x=0.5
# We expand by 3
expand_distances_each(
  data = df,
  cols = c("x"),
  multipliers = 3,
  origin = 0.5
)

# Expand x and y around the centroid
# We use exponentiation for a more drastic effect
# add_one_exp = TRUE makes sure the points move away
# from the centroid even when their distance to it is between -1 and 1
# To compare multiple exponents, we wrap the
# call in purrr::map_dfr
df_expanded <- purrr::map_dfr(
  .x = c(1, 2.0, 3.0, 4.0),
  .f = function(exponent) {
    expand_distances_each(
      data = df,
      cols = c("x", "y"),
      multipliers = exponent,
      origin_fn = centroid,
      exponentiate = TRUE,
      add_one_exp = TRUE
    )
  }
)
df_expanded

# Plot the expansions of x and y around the overall centroid
if (has_ggplot){
  ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.exponents))) +
    geom_vline(
      xintercept = df_expanded[[".origin"]][[1]][[1]],
      linewidth = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_hline(
      yintercept = df_expanded[[".origin"]][[1]][[2]],
      linewidth = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "Exponent")
}

# Expand x and y around the centroid using multiplication
# To compare multiple multipliers, we wrap the
# call in purrr::map_dfr
df_expanded <- purrr::map_dfr(
  .x = c(1, 2.0, 3.0, 4.0),
  .f = function(multiplier) {
    expand_distances_each(df,
      cols = c("x", "y"),
      multipliers = multiplier,
      origin_fn = centroid,
      exponentiate = FALSE
    )
  }
)
df_expanded

# Plot the expansions of x and y around the overall centroid
if (has_ggplot){
  ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.multipliers))) +
    geom_vline(
      xintercept = df_expanded[[".origin"]][[1]][[1]],
      linewidth = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_hline(
      yintercept = df_expanded[[".origin"]][[1]][[2]],
      linewidth = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "Multiplier")
}

# Expand x and y with different multipliers
# around the centroid using multiplication
df_expanded <- expand_distances_each(
  df,
  cols = c("x", "y"),
  multipliers = c(1.25, 10),
  origin_fn = centroid,
  exponentiate = FALSE
)
df_expanded

# Plot the expansions of x and y around the overall centroid
# Note how the y-axis is expanded much more than the x-axis
if (has_ggplot){
  ggplot(df_expanded, aes(x = x_expanded, y = y_expanded)) +
    geom_vline(
      xintercept = df_expanded[[".origin"]][[1]][[1]],
      linewidth = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_hline(
      yintercept = df_expanded[[".origin"]][[1]][[2]],
      linewidth = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_line(aes(color = "Expanded")) +
    geom_point(aes(color = "Expanded")) +
    geom_line(aes(x = x, y = y, color = "Original")) +
    geom_point(aes(x = x, y = y, color = "Original")) +
    theme_minimal() +
    labs(x = "x", y = "y", color = "Data")
}

#
# Contraction
#

# Group-wise contraction to create clusters
df_contracted <- df %>%
  dplyr::group_by(g) %>%
  expand_distances_each(
    cols = c("x", "y"),
    multipliers = 0.07,
    suffix = "_contracted",
    origin_fn = centroid
  )

# Plot the clustered data points on top of the original data points
if (has_ggplot){
  ggplot(df_contracted, aes(x = x_contracted, y = y_contracted, color = factor(g))) +
    geom_point(aes(x = x, y = y, color = factor(g)), alpha = 0.3, shape = 16) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "g")
}

[Package rearrr version 0.3.4 Index]