closest_to {rearrr}    R Documentation

Orders values by shortest distance to an origin

Description

[Experimental]

Values are ordered by how close they are to the origin.

In one dimension (when `cols` has length 1), the origin can be thought of as a target value. In n dimensions, the origin can be thought of as a set of coordinates.

The origin can be supplied as coordinates or as a function that returns coordinates. The latter can be useful when supplying a grouped data.frame and ordering the rows by their distance to the centroid of each group.
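The grouped case can be pictured with a base R sketch. It does not call rearrr and is not the package's implementation; the helper name `order_within_groups` and the toy data are hypothetical, and the centroid is assumed here to be the per-group mean of the column.

```r
# Toy data: two groups (G) with one numeric column (A)
df_toy <- data.frame(G = c(1, 1, 1, 2, 2, 2), A = c(1, 5, 4, 10, 30, 21))

# Hypothetical helper (not part of rearrr): within each group, order the
# rows by their absolute distance to the group mean of A.
order_within_groups <- function(d) {
  parts <- lapply(split(d, d$G), function(g) {
    g[order(abs(g$A - mean(g$A))), , drop = FALSE]
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL  # drop the row names created by split()/rbind()
  out
}

order_within_groups(df_toy)
```

Each group gets its own origin, so a value that is far from the overall mean can still come first within its group.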

The *_vec() version takes and returns a vector.

Example:

The column values:

c(1, 2, 3, 4, 5)

and origin = 2

are ordered as:

c(2, 1, 3, 4, 5)
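
The ordering in this example can be reproduced with a minimal base R sketch. It only illustrates the idea and is not the package's implementation; the helper name `order_by_distance` is hypothetical.

```r
# Hypothetical helper (not part of rearrr): order a numeric vector by its
# absolute distance to a target value.
order_by_distance <- function(x, origin) {
  # order() leaves tied elements in their original order
  x[order(abs(x - origin))]
}

order_by_distance(c(1, 2, 3, 4, 5), origin = 2)
# 2 1 3 4 5
```

The values 1 and 3 are equally far from 2; in this sketch they keep their original relative order.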

Usage

closest_to(
  data,
  cols = NULL,
  origin = NULL,
  origin_fn = NULL,
  shuffle_ties = FALSE,
  origin_col_name = ".origin",
  distance_col_name = ".distance",
  overwrite = FALSE
)

closest_to_vec(data, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE)

Arguments

data

data.frame or vector.

cols

Column(s) to create the sorting factor by. When `NULL` and `data` is a data.frame, the row numbers are used.

origin

Coordinates of the origin to calculate distances to. A scalar to use in all dimensions or a vector with one scalar per dimension.

N.B. Ignored when `origin_fn` is not `NULL`.
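
As a rough illustration of the scalar and per-dimension forms of `origin`, the following base R sketch computes distances in two dimensions. It assumes Euclidean distance, which this page does not state, and the helper name `dist_to_origin` is hypothetical (it is not a rearrr function).

```r
# Hypothetical helper (not part of rearrr): Euclidean distance from each
# row of a numeric matrix to an origin. Euclidean distance is an assumption.
dist_to_origin <- function(m, origin) {
  # A scalar origin is reused for every dimension
  if (length(origin) == 1) origin <- rep(origin, ncol(m))
  stopifnot(length(origin) == ncol(m))
  # Subtract the origin from each column, then take the row-wise norm
  sqrt(rowSums(sweep(m, 2, origin)^2))
}

m <- cbind(A = c(0, 3, 1), B = c(0, 4, 1))
dist_to_origin(m, origin = 0)        # scalar origin, i.e. (0, 0)
dist_to_origin(m, origin = c(3, 4))  # one coordinate per dimension
```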

origin_fn

Function for finding the origin coordinates.

Input: Each column will be passed as a vector in the order of `cols`.

Output: A vector with one scalar per dimension.

Can be created with create_origin_fn() if you want to apply the same function to each dimension.

E.g. `create_origin_fn(median)` would find the median of each column.

Built-in functions are centroid(), most_centered(), and midrange().
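
A hand-written function that follows the input/output contract above might look like the sketch below. The name `median_origin` is hypothetical, and the sketch uses only base R; whether rearrr passes additional arguments to `origin_fn` is not covered by this page.

```r
# Hypothetical origin function: receives one vector per column (via ...)
# and returns one scalar per dimension, here the median of each column.
median_origin <- function(...) {
  vapply(list(...), stats::median, numeric(1))
}

# Two columns in, two coordinates out
median_origin(c(1, 2, 9), c(10, 20, 60))
```

Given the contract described above, such a function could presumably be supplied as `origin_fn = median_origin`.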

shuffle_ties

Whether to shuffle elements with the same distance to the origin. (Logical)
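
One way to picture tie shuffling is a random secondary sort key, sketched here in base R. This is an illustration under that assumption, not a description of how rearrr implements `shuffle_ties`.

```r
set.seed(1)
x <- c(1, 2, 3, 4, 5)
d <- abs(x - 3)  # distances to the origin 3: 2 1 0 1 2

# The distance decides the primary order; a random secondary key decides
# the order among elements that share the same distance.
shuffled <- x[order(d, runif(length(x)))]
shuffled
```

The value 3 always comes first, while the pairs (2, 4) and (1, 5) may appear in either order depending on the seed.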

origin_col_name

Name of new column with the origin coordinates. If `NULL`, no column is added.

distance_col_name

Name of new column with the distances to the origin. If `NULL`, no column is added.

overwrite

Whether to allow overwriting of existing columns. (Logical)

Value

The sorted data.frame (returned as a tibble) or vector.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other rearrange functions: center_max(), center_min(), furthest_from(), pair_extremes(), position_max(), position_min(), rev_windows(), roll_elements(), shuffle_hierarchy(), triplet_extremes()

Other distance functions: dim_values(), distance(), expand_distances(), expand_distances_each(), furthest_from(), swirl_2d(), swirl_3d()

Examples

# Attach packages
library(rearrr)
library(dplyr)

# Set seed
set.seed(1)

# Create a data frame
df <- data.frame(
  "index" = 1:10,
  "A" = sample(1:10),
  "B" = runif(10),
  "G" = c(
    1, 1, 1, 2, 2,
    2, 3, 3, 3, 3
  ),
  stringsAsFactors = FALSE
)

# Closest to 3 in a vector
closest_to_vec(1:10, origin = 3)

# Closest to the third row (index of data.frame)
closest_to(df, origin = 3)$index

# By each of the columns
closest_to(df, cols = "A", origin = 3)$A
closest_to(df, cols = "A", origin_fn = most_centered)$A
closest_to(df, cols = "B", origin = 0.5)$B
closest_to(df, cols = "B", origin_fn = centroid)$B

# Shuffle the elements with the same distance to the origin
closest_to(df,
  cols = "A",
  origin_fn = create_origin_fn(median),
  shuffle_ties = TRUE
)$A

# Grouped by G
df %>%
  dplyr::select(G, A) %>% # For clarity
  dplyr::group_by(G) %>%
  closest_to(
    cols = "A",
    origin_fn = create_origin_fn(median)
  )

# Plot the rearranged values
plot(
  x = 1:10,
  y = closest_to(df,
    cols = "B",
    origin_fn = create_origin_fn(median)
  )$B,
  xlab = "Position",
  ylab = "B"
)
plot(
  x = 1:10,
  y = closest_to(df,
    cols = "A",
    origin_fn = create_origin_fn(median),
    shuffle_ties = TRUE
  )$A,
  xlab = "Position",
  ylab = "A"
)

# In multiple dimensions
df %>%
  closest_to(cols = c("A", "B"), origin_fn = most_centered)

[Package rearrr version 0.3.4]