triplet_extremes {rearrr}    R Documentation

Makes triplets of extreme values and sorts by them

Description

[Experimental]

The values are grouped in three such that the first group is formed by the lowest and highest values and the value closest to the median, the second group is formed by the second lowest and second highest values and the value second closest to the median, and so on. The values are then sorted by these groups and their actual value.

When the number of rows/elements in `data` is not evenly divisible by three, `unequal_method_1` (single excessive element) and `unequal_method_2` (two excessive elements) determine which element(s) should form a smaller group. This group will be the first group in a given grouping (see `num_groupings`) with the identifier 1.

The *_vec() version takes and returns a vector.

Example:

The column values:

c(1, 2, 3, 4, 5, 6)

Are sorted in triplets as:

c(1, 3, 6, 2, 4, 5)

Usage

triplet_extremes(
  data,
  col = NULL,
  middle_is = "middle",
  unequal_method_1 = "middle",
  unequal_method_2 = c("middle", "middle"),
  num_groupings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_triplets = FALSE,
  factor_name = ifelse(num_groupings == 1, ".triplet", ".tripleting"),
  overwrite = FALSE
)

triplet_extremes_vec(
  data,
  middle_is = "middle",
  unequal_method_1 = "middle",
  unequal_method_2 = c("middle", "middle"),
  num_groupings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_triplets = FALSE
)

Arguments

data

data.frame or vector.

col

Column to create sorting factor by. When `NULL` and `data` is a data.frame, the row numbers are used.

middle_is

Whether the middle element in the triplet is the nth closest element to the median value or the nth+1 lowest/highest value.

One of: middle (default), min, or max.

Triplet grouping is performed greedily from the most extreme values to the least extreme values. E.g. c(1, 6, 12) is created before c(2, 7, 11), which is made before c(3, 5, 10).

Examples:

When `middle_is` == 'middle', a 1:12 sequence is grouped into:

c( c(1, 6, 12), c(2, 7, 11), c(3, 5, 10), c(4, 8, 9) )

When `middle_is` == 'min', a 1:12 sequence is grouped into:

c( c(1, 2, 12), c(3, 4, 11), c(5, 6, 10), c(7, 8, 9) )

When `middle_is` == 'max', a 1:12 sequence is grouped into:

c( c(1, 11, 12), c(2, 9, 10), c(3, 7, 8), c(4, 5, 6) )
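The greedy procedure described above can be sketched in base R. This is only an illustration and not the package's implementation: `sketch_triplets()` is a hypothetical helper, it assumes the input length is divisible by three, and it assumes that ties in the distance to the median go to the lower value and that the median is recomputed on the remaining values at each step. Under those assumptions it reproduces the three groupings listed above.

```r
# Hypothetical helper (not part of rearrr): greedy triplet grouping
# for an input whose length is divisible by three.
sketch_triplets <- function(x, middle_is = "middle") {
  stopifnot(length(x) %% 3 == 0)
  remaining <- sort(x)
  triplets <- list()
  while (length(remaining) > 0) {
    n <- length(remaining)
    # Position of the middle member among the sorted remaining values
    mid_idx <- switch(middle_is,
      middle = {
        # Assumption: closest to the median of the remaining values,
        # ties resolved towards the lower value
        inner <- 2:(n - 1)
        inner[which.min(abs(remaining[inner] - median(remaining)))]
      },
      min = 2,     # second lowest remaining value
      max = n - 1, # second highest remaining value
      stop("`middle_is` must be 'middle', 'min' or 'max'")
    )
    idx <- c(1, mid_idx, n)
    triplets[[length(triplets) + 1]] <- remaining[idx]
    remaining <- remaining[-idx]
  }
  triplets
}

sketch_triplets(1:12, middle_is = "middle")
sketch_triplets(1:12, middle_is = "min")
sketch_triplets(1:12, middle_is = "max")
```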

unequal_method_1, unequal_method_2

Method for dealing with either a single excessive element (`unequal_method_1`) or two excessive elements (`unequal_method_2`) when the number of rows/elements in `data` is not evenly divisible by three.

`unequal_method_1`: One of: min, middle or max.

`unequal_method_2`: Vector with two of: min, middle or max. Can be the same value twice.

Note: The excessive element(s) are extracted before triplet grouping. These elements are put in their own group and given group identifier 1.

E.g. When `unequal_method_2` is c("middle", "middle") the two elements closest to the median are extracted.
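As a rough illustration of the extraction step, the base R sketch below removes the excessive element(s) before any triplets are formed. `extract_excess()` is a hypothetical helper, not a rearrr function, and it assumes the methods are applied one after another, with the median recomputed on the remaining values and ties going to the lower value. The package may resolve these details differently.

```r
# Hypothetical helper (not part of rearrr): pull out the excessive
# element(s) named by `methods` before triplet grouping.
extract_excess <- function(x, methods) {
  remaining <- sort(x)
  excess <- c()
  for (m in methods) {
    i <- switch(m,
      min = 1,
      max = length(remaining),
      # Assumption: closest to the median of what is left
      middle = which.min(abs(remaining - median(remaining))),
      stop("each method must be 'min', 'middle' or 'max'")
    )
    excess <- c(excess, remaining[i])
    remaining <- remaining[-i]
  }
  list(excess = sort(excess), remaining = remaining)
}

# Eight elements leave two excessive ones; both are taken from the middle
extract_excess(1:8, c("middle", "middle"))

# Seven elements leave one excessive one; here the lowest value
extract_excess(1:7, "min")
```

The `excess` element corresponds to the smaller group that receives identifier 1, and `remaining` is what the triplet grouping then operates on.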

num_groupings

Number of times to group into triplets (recursively). At least 1.

Based on `balance`, the secondary groupings perform extreme triplet grouping on either the sum, absolute difference, min, or max of the triplet elements.

balance

What to balance triplets for in a given secondary triplet grouping. Either "mean", "spread", "min", or "max". Can be a single string used for all secondary groupings or one for each secondary grouping (`num_groupings` - 1).

The first triplet grouping always groups the actual element values.

mean

Triplets have similar means. The values in the triplets from the previous grouping are aggregated with `sum()` and extreme triplet grouped.

spread

Triplets have similar spread (e.g. standard deviations). The values in the triplets from the previous triplet grouping are aggregated with `sum(abs(diff()))` and extreme triplet grouped.

min / max

Triplets have similar minimum / maximum values. The values in the triplets from the previous triplet grouping are aggregated with `min()` / `max()` and extreme triplet grouped.
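To make the four aggregates concrete, the sketch below applies them to the triplets from the `middle_is` == 'middle' example. It only shows the aggregation step named above. The `triplets` list and the `aggregate_fns` names are illustrative, and the members are assumed to be in ascending order within each triplet.

```r
# Triplets from the 1:12 example with middle_is == 'middle'
triplets <- list(c(1, 6, 12), c(2, 7, 11), c(3, 5, 10), c(4, 8, 9))

# One aggregate function per `balance` option
aggregate_fns <- list(
  mean = sum,
  spread = function(v) sum(abs(diff(v))),
  min = min,
  max = max
)

# Rows are triplets, columns are `balance` options
aggregates <- sapply(aggregate_fns, function(f) sapply(triplets, f))
aggregates
```

A secondary grouping would then run the extreme triplet grouping on one of these columns rather than on the original values. With members in ascending order, the `spread` aggregate equals the maximum minus the minimum of the triplet.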

order_by_aggregates

Whether to order the groups from initial groupings (first `num_groupings` - 1) by their aggregate values instead of their group identifiers.

N.B. Only used when `num_groupings` > 1.

shuffle_members

Whether to shuffle the order of the group members within the groups. (Logical)

shuffle_triplets

Whether to shuffle the order of the triplets. Triplet members remain together. (Logical)
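The difference between the two shuffling options can be sketched in base R on values that have already been grouped. This is an illustration of the described behaviour, not the package's code. The `values` and `triplet_id` vectors are made up for the example.

```r
set.seed(1)

# Already grouped values and their triplet identifiers
values <- c(1, 6, 12, 2, 7, 11, 3, 5, 10, 4, 8, 9)
triplet_id <- rep(1:4, each = 3)
by_triplet <- split(values, triplet_id)

# Like shuffle_members: permute the members inside each triplet,
# the triplets themselves stay in the same order
members_shuffled <- unlist(
  lapply(by_triplet, function(v) v[sample.int(length(v))]),
  use.names = FALSE
)

# Like shuffle_triplets: permute the triplets,
# the members inside each triplet stay in the same order
new_order <- sample(unique(triplet_id))
triplets_shuffled <- unlist(
  by_triplet[as.character(new_order)],
  use.names = FALSE
)

members_shuffled
triplets_shuffled
```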

factor_name

Name of new column with the sorting factor. If `NULL`, no column is added.

overwrite

Whether to allow overwriting of existing columns. (Logical)

Value

The sorted data.frame (tibble) / vector. Optionally with the sorting factor added.

When `data` is a vector and `keep_factors` is `FALSE`, the output will be a vector. Otherwise, a data.frame.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

See Also

Other rearrange functions: center_max(), center_min(), closest_to(), furthest_from(), pair_extremes(), position_max(), position_min(), rev_windows(), roll_elements(), shuffle_hierarchy()

Examples

# Attach packages
library(rearrr)
library(dplyr)

# Set seed
set.seed(1)

# Create a data frame
df <- data.frame(
  "index" = 1:12,
  "A" = sample(1:12),
  "B" = runif(12),
  "C" = LETTERS[1:12],
  "G" = c(
    1, 1, 1, 1, 2, 2,
    2, 2, 3, 3, 3, 3
  ),
  stringsAsFactors = FALSE
)

# Triplet group extreme indices (row numbers)
triplet_extremes(df)

# Triplet group extremes in each of the columns
triplet_extremes(df, col = "A")$A
triplet_extremes(df, col = "B")$B
triplet_extremes(df, col = "C")$C

# Shuffle the members triplet-wise
# The triplets maintain their order
# but the rows within each triplet are shuffled
triplet_extremes(df, col = "A", shuffle_members = TRUE)

# Shuffle the order of the triplets
# The triplets are shuffled but
# the rows within each triplet maintain their order
triplet_extremes(df, col = "A", shuffle_triplets = TRUE)

# Use recursive grouping
# Mostly meaningful with much larger datasets
# Order initial grouping by group identifiers
triplet_extremes(df, col = "A", num_groupings = 2)
# Order initial grouping by aggregate values
triplet_extremes(df, col = "A", num_groupings = 2, order_by_aggregates = TRUE)

# Grouped by G
# Each G group only has 4 elements
# so it only creates 1 triplet and a group
# with the single excessive element
# per G group
df %>%
  dplyr::select(G, A) %>% # For clarity
  dplyr::group_by(G) %>%
  triplet_extremes(col = "A")

# Plot the extreme triplets
plot(
  x = 1:12,
  y = triplet_extremes(df, col = "A")$A,
  col = as.character(rep(1:4, each = 3))
)
# With shuffled triplet members (run a few times)
plot(
  x = 1:12,
  y = triplet_extremes(df, col = "A", shuffle_members = TRUE)$A,
  col = as.character(rep(1:4, each = 3))
)
# With shuffled triplets (run a few times)
plot(
  x = rep(1:4, each = 3), # One x position per triplet
  y = triplet_extremes(df, col = "A", shuffle_triplets = TRUE)$A,
  col = as.character(rep(1:4, each = 3))
)

[Package rearrr version 0.3.4 Index]