R: Makes triplets of extreme values and sorts by them

triplet_extremes {rearrr}

R Documentation

Makes triplets of extreme values and sorts by them

Description

The values are grouped into triplets such that the first triplet is formed by the lowest value, the highest value, and the value closest to the median; the second triplet is formed by the second lowest value, the second highest value, and the value second closest to the median; and so on. The values are then sorted by these groups and, within each group, by their actual value.

When the number of rows/elements in `data` is not evenly divisible by three, `unequal_method_1` (a single excess element) and `unequal_method_2` (two excess elements) determine which element(s) should form a smaller group. This group will be the first group in a given grouping (see `num_groupings`) and gets the identifier 1.
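
The extraction of excess elements can be sketched in base R. This is only an illustration for the `"middle"` setting, not the package's implementation: `extract_middle_excess()` is a hypothetical helper, and picking the `k` values closest to the median in a single pass (with ties going to the lower value) is an assumption.

```r
# Hypothetical helper (not part of rearrr): pull out the excess
# element(s) closest to the median before any triplets are formed.
extract_middle_excess <- function(x) {
  x <- sort(x)
  k <- length(x) %% 3 # 0, 1 or 2 excess elements
  if (k == 0) {
    return(list(excess = x[0], rest = x))
  }
  # order() is stable, so ties are resolved towards the lower value
  idx <- order(abs(x - stats::median(x)))[seq_len(k)]
  list(excess = sort(x[idx]), rest = x[-idx])
}

extract_middle_excess(1:7)$excess
# [1] 4
extract_middle_excess(1:8)$excess
# [1] 4 5
```

The `rest` element then has a length divisible by three and can be grouped into full triplets.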

The *_vec() version takes and returns a vector.

Example:

The column values:

c(1, 2, 3, 4, 5, 6)

Are sorted in triplets as:

c(1, 3, 6, 2, 4, 5)
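
The ordering in the example above can be reproduced with a minimal base R sketch of the greedy grouping. `sketch_triplets()` is a hypothetical helper written for illustration, not the package's code. It assumes the default `middle_is = "middle"`, a length divisible by three, distances measured against the median of the full input, and ties going to the lower value.

```r
# Hypothetical helper (not part of rearrr): greedy extreme triplets
# for the "middle" setting, assuming length(x) is divisible by three.
sketch_triplets <- function(x) {
  stopifnot(length(x) %% 3 == 0)
  remaining <- sort(x)
  med <- stats::median(remaining) # median of the full input
  values <- numeric(0)
  triplet <- integer(0)
  g <- 0L
  while (length(remaining) > 0) {
    g <- g + 1L
    n <- length(remaining)
    lo <- remaining[1]
    hi <- remaining[n]
    inner <- remaining[-c(1, n)]
    # which.min() returns the first minimum, so ties go to the lower value
    mid_idx <- which.min(abs(inner - med))
    values <- c(values, lo, inner[mid_idx], hi)
    triplet <- c(triplet, rep(g, 3))
    remaining <- inner[-mid_idx]
  }
  data.frame(value = values, triplet = triplet)
}

sketch_triplets(c(1, 2, 3, 4, 5, 6))$value
# [1] 1 3 6 2 4 5
```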

Usage

triplet_extremes(
  data,
  col = NULL,
  middle_is = "middle",
  unequal_method_1 = "middle",
  unequal_method_2 = c("middle", "middle"),
  num_groupings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_triplets = FALSE,
  factor_name = ifelse(num_groupings == 1, ".triplet", ".tripleting"),
  overwrite = FALSE
)

triplet_extremes_vec(
  data,
  middle_is = "middle",
  unequal_method_1 = "middle",
  unequal_method_2 = c("middle", "middle"),
  num_groupings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_triplets = FALSE
)

Arguments

`data`	`data.frame` or `vector`.
`col`	Column to create sorting factor by. When `NULL` and `data` is a `data.frame`, the row numbers are used.
`middle_is`	Whether the middle element in the triplet is the nth closest element to the median value or the nth+1 lowest/highest value. One of: `middle` (default), `min`, or `max`.
	Triplet grouping is performed greedily from the most extreme values to the least extreme values. E.g. `c(1, 6, 12)` is created before `c(2, 5, 11)` which is made before `c(3, 7, 10)`.
	Examples:
	When `middle_is` == 'middle', a `1:12` sequence is grouped into: `c( c(1, 6, 12), c(2, 7, 11), c(3, 5, 10), c(4, 8, 9) )`
	When `middle_is` == 'min', a `1:12` sequence is grouped into: `c( c(1, 2, 12), c(3, 4, 11), c(5, 6, 10), c(7, 8, 9) )`
	When `middle_is` == 'max', a `1:12` sequence is grouped into: `c( c(1, 11, 12), c(2, 9, 10), c(3, 7, 8), c(4, 5, 6) )`
`unequal_method_1`, `unequal_method_2`	Method for dealing with either a single excess element (`unequal_method_1`) or two excess elements (`unequal_method_2`) when the number of rows/elements in `data` is not evenly divisible by three.
	`unequal_method_1`: One of: `min`, `middle` or `max`.
	`unequal_method_2`: Vector with two of: `min`, `middle` or `max`. Can be the same value twice.
	Note: The excess element(s) are extracted before triplet grouping. These elements are put in their own group and given group identifier `1`. E.g. when `unequal_method_2` is `c("middle", "middle")`, the two elements closest to the median are extracted.
`num_groupings`	Number of times to group into triplets (recursively). At least `1`. Based on `balance`, the secondary groupings perform extreme triplet grouping on either the sum, absolute difference, min, or max of the triplet elements.
`balance`	What to balance triplets for in a given secondary triplet grouping. Either `"mean"`, `"spread"`, `"min"`, or `"max"`. Can be a single string used for all secondary groupings or one for each secondary grouping (`num_groupings` - 1). The first triplet grouping always groups the actual element values.
	mean: Triplets have similar means. The values in the triplets from the previous grouping are aggregated with `sum()` and extreme triplet grouped.
	spread: Triplets have similar spread (e.g. standard deviations). The values in the triplets from the previous triplet grouping are aggregated with `sum(abs(diff()))` and extreme triplet grouped.
	min / max: Triplets have similar minimum / maximum values. The values in the triplets from the previous triplet grouping are aggregated with `min()` / `max()` and extreme triplet grouped.
`order_by_aggregates`	Whether to order the groups from initial groupings (first `num_groupings` - 1) by their aggregate values instead of their group identifiers. N.B. Only used when `num_groupings` > 1.
`shuffle_members`	Whether to shuffle the order of the group members within the groups. (Logical)
`shuffle_triplets`	Whether to shuffle the order of the triplets. Triplet members remain together. (Logical)
`factor_name`	Name of new column with the sorting factor. If `NULL`, no column is added.
`overwrite`	Whether to allow overwriting of existing columns. (Logical)
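
To make the `balance` aggregates concrete, the following base R sketch computes them for the four `1:12` triplets listed under `middle_is`. `aggregate_triplets()` is a hypothetical helper, not a rearrr function; it only shows which per-triplet values a secondary grouping would then be performed on, and it assumes the aggregation functions are applied to each triplet's values in the order shown.

```r
# Triplets of 1:12 for middle_is = "middle" (taken from the argument text)
values <- c(1, 6, 12, 2, 7, 11, 3, 5, 10, 4, 8, 9)
triplet <- rep(1:4, each = 3)

# Hypothetical helper (not part of rearrr): one aggregate per triplet
aggregate_triplets <- function(values, triplet, balance = "mean") {
  agg_fn <- switch(balance,
    mean = sum, # similar sums imply similar means
    spread = function(v) sum(abs(diff(v))),
    min = min,
    max = max,
    stop("`balance` must be 'mean', 'spread', 'min' or 'max'.")
  )
  as.vector(tapply(values, triplet, agg_fn))
}

aggregate_triplets(values, triplet, "mean")
# [1] 19 20 18 21
aggregate_triplets(values, triplet, "spread")
# [1] 11  9  7  5
aggregate_triplets(values, triplet, "max")
# [1] 12 11 10  9
```

With `num_groupings = 2`, these aggregates (rather than the raw values) would be the input to the second round of extreme triplet grouping.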

Value

The sorted data.frame (tibble) / vector. Optionally with the sorting factor added.

When `data` is a vector and no sorting factor is added (`factor_name` is `NULL`), the output will be a vector. Otherwise, a data.frame.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Examples

# Attach packages
library(rearrr)
library(dplyr)

# Set seed
set.seed(1)

# Create a data frame
df <- data.frame(
  "index" = 1:12,
  "A" = sample(1:12),
  "B" = runif(12),
  "C" = LETTERS[1:12],
  "G" = c(
    1, 1, 1, 1, 2, 2,
    2, 2, 3, 3, 3, 3
  ),
  stringsAsFactors = FALSE
)

# Triplet group extreme indices (row numbers)
triplet_extremes(df)

# Triplet group extremes in each of the columns
triplet_extremes(df, col = "A")$A
triplet_extremes(df, col = "B")$B
triplet_extremes(df, col = "C")$C

# Shuffle the members triplet-wise
# The triplets maintain their order
# but the rows within each triplet are shuffled
triplet_extremes(df, col = "A", shuffle_members = TRUE)

# Shuffle the order of the triplets
# The triplets are shuffled but
# the rows within each triplet maintain their order
triplet_extremes(df, col = "A", shuffle_triplets = TRUE)

# Use recursive grouping
# Mostly meaningful with much larger datasets
# Order initial grouping by group identifiers
triplet_extremes(df, col = "A", num_groupings = 2)
# Order initial grouping by aggregate values
triplet_extremes(df, col = "A", num_groupings = 2, order_by_aggregates = TRUE)

# Grouped by G
# Each G group only has 4 elements
# so it only creates 1 triplet and a group
# with the single excessive element
# per G group
df %>%
  dplyr::select(G, A) %>% # For clarity
  dplyr::group_by(G) %>%
  triplet_extremes(col = "A")

# Plot the extreme triplets
plot(
  x = 1:12,
  y = triplet_extremes(df, col = "A")$A,
  col = as.character(rep(1:4, each = 3))
)
# With shuffled triplet members (run a few times)
plot(
  x = 1:12,
  y = triplet_extremes(df, col = "A", shuffle_members = TRUE)$A,
  col = as.character(rep(1:4, each = 3))
)
# With shuffled triplets (run a few times)
plot(
  x = rep(1:4, each = 3), # One x position per triplet
  y = triplet_extremes(df, col = "A", shuffle_triplets = TRUE)$A,
  col = as.character(rep(1:4, each = 3))
)

[Package rearrr version 0.3.4 Index]
