expand_distances {rearrr}R Documentation

Expand the distances to an origin

Description

[Experimental]

Moves the data points in n-dimensional space such that their distance to a specified origin is increased/decreased. A `multiplier` greater than 1 leads to expansion, while a positive `multiplier` lower than 1 leads to contraction.

The origin can be supplied as coordinates or as a function that returns coordinates. The latter can be useful when supplying a grouped data.frame and expanding around e.g. the centroid of each group.

The multiplier/exponent can be supplied as a constant or as a function that returns a constant. The latter can be useful when supplying a grouped data.frame and the multiplier/exponent depends on the data in the groups.

For expansion in each dimension separately, use expand_distances_each().

NOTE: When exponentiating, the default is to first add 1 to the distances, to ensure expansion even when the distance is between 0 and 1. If you need the purely exponentiated distances, disable `add_one_exp`.

Usage

expand_distances(
  data,
  cols = NULL,
  multiplier = NULL,
  multiplier_fn = NULL,
  origin = NULL,
  origin_fn = NULL,
  exponentiate = FALSE,
  add_one_exp = TRUE,
  suffix = "_expanded",
  keep_original = TRUE,
  mult_col_name = ifelse(isTRUE(exponentiate), ".exponent", ".multiplier"),
  origin_col_name = ".origin",
  overwrite = FALSE
)

Arguments

data

data.frame or vector.

cols

Names of columns in `data` to expand coordinates of. Each column is considered a dimension.

multiplier

Constant to multiply/exponentiate the distances to the origin by.

N.B. When `exponentiate` is TRUE, the `multiplier` becomes an exponent.

multiplier_fn

Function for finding the `multiplier`.

Input: Each column will be passed as a vector in the order of `cols`.

Output: A numeric scalar.

origin

Coordinates of the origin to expand around. A scalar to use in all dimensions or a vector with one scalar per dimension.

N.B. Ignored when `origin_fn` is not NULL.

origin_fn

Function for finding the origin coordinates.

Input: Each column will be passed as a vector in the order of `cols`.

Output: A vector with one scalar per dimension.

Can be created with create_origin_fn() if you want to apply the same function to each dimension.

E.g. `create_origin_fn(median)` would find the median of each column.

Built-in functions are centroid(), most_centered(), and midrange()

exponentiate

Whether to exponentiate instead of multiplying. (Logical)

add_one_exp

Whether to add 1 to the distances before exponentiating to ensure they don't contract when between 0 and 1. The added value is subtracted after the exponentiation. (Logical)

The distances to the origin (`d`) are exponentiated as such:

d <- d + 1

d <- d ^ multiplier

d <- d - 1

N.B. Ignored when `exponentiate` is FALSE.

suffix

Suffix to add to the names of the generated columns.

Use an empty string (i.e. "") to overwrite the original columns.

keep_original

Whether to keep the original columns. (Logical)

Some columns may have been overwritten, in which case only the newest versions are returned.

mult_col_name

Name of new column with the `multiplier`. If NULL, no column is added.

origin_col_name

Name of new column with the origin coordinates. If NULL, no column is added.

overwrite

Whether to allow overwriting of existing columns. (Logical)

Details

Increases the distance to the origin in n-dimensional space by multiplying or exponentiating it by the multiplier.

We first move the origin to the zero-coordinates (e.g. c(0, 0, 0)) and normalize each vector to unit length. We then multiply this unit vector by the multiplied/exponentiated distance and moves the origin back to its original coordinates.

The distance to the specified origin is calculated with:

d(P1, P2) = sqrt( (x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2 + ... )

Note: By default (when `add_one_exp` is TRUE), we add 1 to the distance before the exponentiation and subtract it afterwards. See `add_one_exp`.

Value

data.frame (tibble) with the expanded columns, along with the applied multiplier/exponent and origin coordinates.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other mutate functions: apply_transformation_matrix(), cluster_groups(), dim_values(), expand_distances_each(), flip_values(), roll_values(), rotate_2d(), rotate_3d(), shear_2d(), shear_3d(), swirl_2d(), swirl_3d()

Other expander functions: expand_distances_each()

Other distance functions: closest_to(), dim_values(), distance(), expand_distances_each(), furthest_from(), swirl_2d(), swirl_3d()

Examples

# Attach packages
library(rearrr)
library(dplyr)
library(purrr)
has_ggplot <- require(ggplot2)  # Attach if installed

# Set seed
set.seed(1)

# Create a data frame
df <- data.frame(
  "x" = runif(20),
  "y" = runif(20),
  "g" = rep(1:4, each = 5)
)

# Expand distances in the two dimensions (x and y)
# With the origin at x=0.5, y=0.5
# We multiply the distances by 2
expand_distances(
  data = df,
  cols = c("x", "y"),
  multiplier = 2,
  origin = c(0.5, 0.5)
)

# Expand distances in the two dimensions (x and y)
# With the origin at x=0.5, y=0.5
# We exponentiate the distances by 2
expand_distances(
  data = df,
  cols = c("x", "y"),
  multiplier = 2,
  exponentiate = TRUE,
  origin = 0.5
)

# Expand values in one dimension (x)
# With the origin at x=0.5
# We exponentiate the distances by 3
expand_distances(
  data = df,
  cols = c("x"),
  multiplier = 3,
  exponentiate = TRUE,
  origin = 0.5
)

# Expand x and y around the centroid
# We use exponentiation for a more drastic effect
# The add_one_exp makes sure it expands
# even when x or y is in the range [0, <1]
# To compare multiple exponents, we wrap the
# call in purrr::map_dfr
df_expanded <- purrr::map_dfr(
  .x = c(1, 3, 5),
  .f = function(exponent) {
    expand_distances(
      data = df,
      cols = c("x", "y"),
      multiplier = exponent,
      origin_fn = centroid,
      exponentiate = TRUE,
      add_one_exp = TRUE
    )
  }
)
df_expanded

# Plot the expansions of x and y around the overall centroid
if (has_ggplot){
  ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.exponent))) +
    geom_vline(
      xintercept = df_expanded[[".origin"]][[1]][[1]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_hline(
      yintercept = df_expanded[[".origin"]][[1]][[2]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_path(size = 0.2) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "Exponent")
}

# Expand x and y around the centroid using multiplication
# To compare multiple multipliers, we wrap the
# call in purrr::map_dfr
df_expanded <- purrr::map_dfr(
  .x = c(1, 3, 5),
  .f = function(multiplier) {
    expand_distances(df,
      cols = c("x", "y"),
      multiplier = multiplier,
      origin_fn = centroid,
      exponentiate = FALSE
    )
  }
)
df_expanded

# Plot the expansions of x and y around the overall centroid
if (has_ggplot){
  ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.multiplier))) +
    geom_vline(
      xintercept = df_expanded[[".origin"]][[1]][[1]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_hline(
      yintercept = df_expanded[[".origin"]][[1]][[2]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_path(size = 0.2, alpha = .8) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "Multiplier")
}

#
# Contraction
#

# Group-wise contraction to create clusters
df_contracted <- df %>%
  dplyr::group_by(g) %>%
  expand_distances(
    cols = c("x", "y"),
    multiplier = 0.07,
    suffix = "_contracted",
    origin_fn = centroid
  )

# Plot the clustered data point on top of the original data points
if (has_ggplot){
  ggplot(df_contracted, aes(x = x_contracted, y = y_contracted, color = factor(g))) +
    geom_point(aes(x = x, y = y, color = factor(g)), alpha = 0.3, shape = 16) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "g")
}

[Package rearrr version 0.3.4 Index]