triplet_extremes {rearrr} | R Documentation |
Makes triplets of extreme values and sorts by them
Description
The values are grouped in threes such that the first group is formed by the lowest and highest values and the value closest to the median, the second group is formed by the second lowest and second highest values and the value second closest to the median, and so on. The values are then sorted by these groups and by their actual values.
When the number of rows/elements in `data` is not evenly divisible by three, `unequal_method_1` (a single excessive element) and `unequal_method_2` (two excessive elements) determine which element(s) should form a smaller group. This group will be the first group in a given grouping (see `num_groupings`) with the identifier 1.
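Which of the two arguments applies follows from the remainder left after dividing the number of elements by three. A minimal base R sketch of that arithmetic (the helper name `n_excessive` is hypothetical and not part of rearrr):

```r
# Hypothetical helper, not part of rearrr:
# number of elements left over after forming full triplets
n_excessive <- function(x) length(x) %% 3

n_excessive(1:12) # 0: only full triplets
n_excessive(1:4)  # 1: `unequal_method_1` applies
n_excessive(1:5)  # 2: `unequal_method_2` applies
```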
The `*_vec()` version takes and returns a vector.
Example:
The column values:
c(1, 2, 3, 4, 5, 6)
Are sorted in triplets as:
c(1, 3, 6, 2, 4, 5)
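The grouping described above can be sketched in base R. This is only an illustration of the idea for the default settings, not the package's implementation: the function name `triplet_sketch` is hypothetical, it assumes the length is divisible by three, and it assumes ties for "closest to the median" go to the lower value, which is what the example above suggests.

```r
# Hypothetical illustration, not rearrr's implementation.
# Greedily forms triplets of (lowest, closest-to-median, highest).
triplet_sketch <- function(x) {
  stopifnot(length(x) %% 3 == 0)
  remaining <- sort(x)
  med <- median(x)
  groups <- list()
  while (length(remaining) > 0) {
    n <- length(remaining)
    lowest <- remaining[1]
    highest <- remaining[n]
    # Candidates for the middle member: everything but the two extremes
    inner <- remaining[-c(1, n)]
    # which.min() returns the first minimum, so ties go to the lower value
    mid_idx <- which.min(abs(inner - med))
    groups[[length(groups) + 1]] <- c(lowest, inner[mid_idx], highest)
    remaining <- inner[-mid_idx]
  }
  # Triplets in order of creation, each sorted by value
  unlist(groups)
}

triplet_sketch(c(1, 2, 3, 4, 5, 6))
# 1 3 6 2 4 5
```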
Usage
triplet_extremes(
  data,
  col = NULL,
  middle_is = "middle",
  unequal_method_1 = "middle",
  unequal_method_2 = c("middle", "middle"),
  num_groupings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_triplets = FALSE,
  factor_name = ifelse(num_groupings == 1, ".triplet", ".tripleting"),
  overwrite = FALSE
)
triplet_extremes_vec(
  data,
  middle_is = "middle",
  unequal_method_1 = "middle",
  unequal_method_2 = c("middle", "middle"),
  num_groupings = 1,
  balance = "mean",
  order_by_aggregates = FALSE,
  shuffle_members = FALSE,
  shuffle_triplets = FALSE
)
Arguments
data |
|
col |
Column to create sorting factor by.
When |
middle_is |
Whether the middle element in the triplet is the nth closest element to the median value or the nth+1 lowest/highest value. One of: Triplet grouping is performed greedily from the most extreme values to the least extreme
values. E.g. Examples: When
When
When
|
unequal_method_1 , unequal_method_2 |
Method for dealing with either
a single excessive element (
Note: The excessive element(s) are extracted before triplet grouping. These elements
are put in their own group and given group identifier E.g. When |
num_groupings |
Number of times to group into triplets (recursively). At least Based on |
balance |
What to balance triplets for in a given secondary triplet grouping.
Either
The first triplet grouping always groups the actual element values.
mean: Triplets have similar means. The values in the triplets from the previous grouping are aggregated with
spread: Triplets have similar spread (e.g. standard deviations). The values in the triplets from the previous triplet grouping are aggregated with
min / max: Triplets have similar minimum / maximum values. The values in the triplets from the previous triplet grouping are aggregated with |
order_by_aggregates |
Whether to order the groups from initial groupings (first N.B. Only used when |
shuffle_members |
Whether to shuffle the order of the group members within the groups. (Logical) |
shuffle_triplets |
Whether to shuffle the order of the triplets. Triplet members remain together. (Logical) |
factor_name |
Name of new column with the sorting factor.
If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
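As a rough illustration of what a secondary grouping with `balance = "mean"` works from, the triplets of the first grouping can be reduced to one aggregate value each. The base R sketch below only shows that aggregation step; the variable names are hypothetical, and how rearrr computes and regroups the aggregates internally is not shown here.

```r
# Values already sorted into triplets (the example from the Description)
values <- c(1, 3, 6, 2, 4, 5)
# Hypothetical first-level triplet identifiers
triplet_id <- rep(1:2, each = 3)

# One mean per triplet; a secondary grouping would then
# group these aggregates rather than the raw values
triplet_means <- tapply(values, triplet_id, mean)
triplet_means
```

Swapping `mean` for another summary function would give the corresponding aggregate for a different `balance` setting; which functions the package uses for those settings is not stated here.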
Value
The sorted data.frame (tibble) / vector. Optionally with the sorting factor added.
When `data` is a vector and `keep_factors` is `FALSE`, the output will be a vector. Otherwise, a data.frame.
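A short check of the vector case, assuming the rearrr package is installed (the call is skipped otherwise). It uses only the `triplet_extremes_vec()` signature shown under Usage, with default arguments:

```r
# Runs only when rearrr is available
if (requireNamespace("rearrr", quietly = TRUE)) {
  x <- c(1, 2, 3, 4, 5, 6)
  out <- rearrr::triplet_extremes_vec(x)
  # A vector goes in and a vector of the same length comes out
  print(is.vector(out))
  print(length(out) == length(x))
}
```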
Author(s)
Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
See Also
Other rearrange functions: center_max(), center_min(), closest_to(), furthest_from(), pair_extremes(), position_max(), position_min(), rev_windows(), roll_elements(), shuffle_hierarchy()
Examples
# Attach packages
library(rearrr)
library(dplyr)
# Set seed
set.seed(1)
# Create a data frame
df <- data.frame(
  "index" = 1:12,
  "A" = sample(1:12),
  "B" = runif(12),
  "C" = LETTERS[1:12],
  "G" = c(
    1, 1, 1, 1, 2, 2,
    2, 2, 3, 3, 3, 3
  ),
  stringsAsFactors = FALSE
)
# Triplet group extreme indices (row numbers)
triplet_extremes(df)
# Triplet group extremes in each of the columns
triplet_extremes(df, col = "A")$A
triplet_extremes(df, col = "B")$B
triplet_extremes(df, col = "C")$C
# Shuffle the members triplet-wise
# The triplets maintain their order
# but the rows within each triplet are shuffled
triplet_extremes(df, col = "A", shuffle_members = TRUE)
# Shuffle the order of the triplets
# The triplets are shuffled but
# the rows within each triplet maintain their order
triplet_extremes(df, col = "A", shuffle_triplets = TRUE)
# Use recursive grouping
# Mostly meaningful with much larger datasets
# Order initial grouping by group identifiers
triplet_extremes(df, col = "A", num_groupings = 2)
# Order initial grouping by aggregate values
triplet_extremes(df, col = "A", num_groupings = 2, order_by_aggregates = TRUE)
# Grouped by G
# Each G group only has 4 elements
# so it only creates 1 triplet and a group
# with the single excessive element
# per G group
df %>%
  dplyr::select(G, A) %>% # For clarity
  dplyr::group_by(G) %>%
  triplet_extremes(col = "A")
# Plot the extreme triplets
plot(
  x = 1:12,
  y = triplet_extremes(df, col = "A")$A,
  col = as.character(rep(1:4, each = 3))
)
# With shuffled triplet members (run a few times)
plot(
  x = 1:12,
  y = triplet_extremes(df, col = "A", shuffle_members = TRUE)$A,
  col = as.character(rep(1:4, each = 3))
)
# With shuffled triplets (run a few times)
plot(
  # One x position per triplet (4 triplets of 3 rows each),
  # matching the colour grouping below
  x = rep(1:4, each = 3),
  y = triplet_extremes(df, col = "A", shuffle_triplets = TRUE)$A,
  col = as.character(rep(1:4, each = 3))
)