pair_extremes {rearrr} | R Documentation |
Pair extreme values and sort by the pairs
Description
The values are paired/grouped such that the lowest and highest values form the first group, the second lowest and the second highest values form the second group, and so on. The values are then sorted by these groups/pairs.
When `data` has an uneven number of rows, the `unequal_method` determines which group should have only 1 element.
The *_vec() version takes and returns a vector.
Example:
The column values:
c(1, 2, 3, 4, 5, 6)
Creates the sorting factor:
c(1, 2, 3, 3, 2, 1)
And are ordered as:
c(1, 6, 2, 5, 3, 4)
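The example above can be reproduced with a short base-R sketch. This is not the package's implementation: the helper name `pair_factor` is hypothetical, and the sketch assumes an even number of values.

```r
# Hypothetical helper (not part of rearrr): the i-th lowest and the
# i-th highest value receive the same pair identifier.
# Assumes an even number of elements.
pair_factor <- function(x) {
  r <- rank(x, ties.method = "first")
  pmin(r, length(x) + 1 - r)
}

x <- c(1, 2, 3, 4, 5, 6)
sorting_factor <- pair_factor(x)

# Order by pair identifier first, then by value within each pair
ordered <- x[order(sorting_factor, x)]

print(sorting_factor) # 1 2 3 3 2 1
print(ordered)        # 1 6 2 5 3 4
```

Ranking the values first means the sketch also works when the input is not already sorted.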
Usage
pair_extremes(
  data,
  col = NULL,
  unequal_method = "middle",
  num_pairings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_pairs = FALSE,
  factor_name = ifelse(num_pairings == 1, ".pair", ".pairing"),
  overwrite = FALSE
)

pair_extremes_vec(
  data,
  unequal_method = "middle",
  num_pairings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_pairs = FALSE
)
Arguments
data |
|
col |
Column to create sorting factor by.
When |
unequal_method |
Method for dealing with an unequal number of rows/elements in `data`.
One of: first, middle or last.
first: The first group will have size 1.
middle: The middle group will have size 1.
last: The last group will have size 1.
|
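As a rough illustration of the first and last methods, the base-R sketch below builds a sorting factor for sorted input of odd length. It is not the package's code. The helper `odd_pair_factor` is hypothetical, and it rests on the assumption that "first" leaves the lowest value unpaired and "last" leaves the highest value unpaired. The middle method is left out because the text does not say how its groups are numbered.

```r
# Hypothetical helper (not part of rearrr) for sorted input of odd length.
# Assumption: "first" leaves the lowest value unpaired and
# "last" leaves the highest value unpaired.
odd_pair_factor <- function(n, unequal_method = c("first", "last")) {
  unequal_method <- match.arg(unequal_method)
  stopifnot(n %% 2 == 1, n >= 3)
  half <- (n - 1) / 2
  # Pair identifiers for the elements that do get a partner
  paired <- c(seq_len(half), rev(seq_len(half)))
  if (unequal_method == "first") {
    c(1, paired + 1)
  } else {
    c(paired, half + 1)
  }
}

x <- c(1, 2, 3, 4, 5)
f_first <- odd_pair_factor(length(x), "first")
f_last <- odd_pair_factor(length(x), "last")

# Order by group identifier, then by value within each group
ordered_first <- x[order(f_first, x)]
ordered_last <- x[order(f_last, x)]

print(f_first) # 1 2 3 3 2
print(f_last)  # 1 2 2 1 3
```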
num_pairings |
Number of pairings to perform (recursively). At least Based on |
balance |
What to balance pairs for in a given secondary pairing.
One of: mean, spread, min or max.
The first pairing always pairs the actual element values.
mean: Pairs have similar means. The values in the pairs from the previous pairing are aggregated.
spread: Pairs have similar spread (e.g. standard deviations). The values in the pairs from the previous pairing are aggregated.
min / max: Pairs have similar minimum / maximum values. The values in the pairs from the previous pairing are aggregated. |
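The idea of a secondary pairing can be sketched in base R as follows. This is an illustration only, not the package's implementation. The helper `pair_factor` is hypothetical, the sketch assumes an even number of values and of pairs, and it uses `mean()` as the aggregate, which is an assumption because the text does not preserve the function used for each `balance` setting.

```r
# Hypothetical helper (not part of rearrr): the i-th lowest and the
# i-th highest value receive the same identifier.
pair_factor <- function(v) {
  r <- rank(v, ties.method = "first")
  pmin(r, length(v) + 1 - r)
}

x <- c(1, 2, 4, 7, 11, 16, 22, 29)

# First pairing: pair the actual element values
first_pairing <- pair_factor(x)

# Aggregate each pair (assumption: mean() as the aggregate)
pair_means <- unname(vapply(split(x, first_pairing), mean, numeric(1)))

# Second pairing: pair the extreme aggregates, i.e. make pairs of pairs
second_pairing_of_pairs <- pair_factor(pair_means)

# Map the second-level identifier back to the individual elements
second_pairing <- second_pairing_of_pairs[first_pairing]

print(data.frame(x, first_pairing, second_pairing))
```

Here the pair with the highest mean, (1, 29), is grouped with the pair with the lowest mean, (7, 11), so the two second-level groups end up with similar means (12 and 11).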
order_by_aggregates |
Whether to order the pairs from initial pairings (first N.B. Only used when |
shuffle_members |
Whether to shuffle the order of the group members within the groups. (Logical) |
shuffle_pairs |
Whether to shuffle the order of the pairs. Pair members remain together. (Logical) |
factor_name |
Name of new column with the sorting factor.
If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Value
The sorted data.frame (tibble) / vector.
Optionally with the sorting factor added.
When `data` is a vector and `keep_factors` is FALSE, the output will be a vector. Otherwise, a data.frame.
Author(s)
Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
See Also
Other rearrange functions: center_max(), center_min(), closest_to(), furthest_from(), position_max(), position_min(), rev_windows(), roll_elements(), shuffle_hierarchy(), triplet_extremes()
Examples
# Attach packages
library(rearrr)
library(dplyr)
# Set seed
set.seed(1)
# Create a data frame
df <- data.frame(
  "index" = 1:10,
  "A" = sample(1:10),
  "B" = runif(10),
  "C" = LETTERS[1:10],
  "G" = c(
    1, 1, 1, 2, 2,
    2, 3, 3, 3, 3
  ),
  stringsAsFactors = FALSE
)
# Pair extreme indices (row numbers)
pair_extremes(df)
# Pair extremes in each of the columns
pair_extremes(df, col = "A")$A
pair_extremes(df, col = "B")$B
pair_extremes(df, col = "C")$C
# Shuffle the members pair-wise
# The rows within each pair are shuffled
# while the `.pair` column maintains its order
pair_extremes(df, col = "A", shuffle_members = TRUE)
# Shuffle the order of the pairs
# The rows within each pair maintain their order
# and stay together but the `.pair` column is shuffled
pair_extremes(df, col = "A", shuffle_pairs = TRUE)
# Use recursive pairing
# Mostly meaningful with much larger datasets
# Order initial grouping by pair identifiers
pair_extremes(df, col = "A", num_pairings = 2)
# Order initial grouping by aggregate values
pair_extremes(df, col = "A", num_pairings = 2, order_by_aggregates = TRUE)
# Grouped by G
# The G groups have 3 or 4 elements,
# so a group with 3 elements only creates 1 pair
# and a group with the single excess element
df %>%
  dplyr::select(G, A) %>% # For clarity
  dplyr::group_by(G) %>%
  pair_extremes(col = "A")
# Plot the extreme pairs
plot(
  x = 1:10,
  y = pair_extremes(df, col = "B")$B,
  col = as.character(rep(1:5, each = 2))
)
# With shuffled pair members (run a few times)
plot(
  x = 1:10,
  y = pair_extremes(df, col = "B", shuffle_members = TRUE)$B,
  col = as.character(rep(1:5, each = 2))
)
# With shuffled pairs (run a few times)
plot(
  x = rep(1:5, each = 2),
  y = pair_extremes(df, col = "B", shuffle_pairs = TRUE)$B,
  col = as.character(rep(1:5, each = 2))
)