stat_apply_group {ggpp}    R Documentation
Apply a function to x or y values
Description
stat_summary_xy() and stat_centroid() are similar to ggplot2::stat_summary() but summarize both x and y values in the same plot layer. Unlike stat_summary(), which summarizes y values at each distinct x value, no grouping based on data values is done; the only grouping respected is the one already present through mappings to aesthetics. This makes it possible to highlight the actual location of the centroid with geom_point(), geom_text(), and similar geometries. In contrast, when used with geom_rug() they are only a convenience, avoiding the need to add two separate layers and to flip one of them using orientation = "y".
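The per-group computation behind a centroid can be sketched in base R. This is a minimal illustration, not ggpp's internal code, and it assumes the mean as the summary function: it only shows the shape of the result, one row per existing group with x and y each summarized. The data frame pts and its columns are made up for the example.

```r
# Base-R sketch (not ggpp internals): centroid per existing group,
# assuming mean() as the summary function for both x and y.
set.seed(123456)
pts <- data.frame(x = rep(1:20, 2),
                  y = runif(40),
                  group = rep(c("A", "B"), each = 20))
# one row per group; x and y are each summarized with mean()
centroids <- aggregate(cbind(x, y) ~ group, data = pts, FUN = mean)
print(centroids)
```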
Usage
stat_apply_group(
mapping = NULL,
data = NULL,
geom = "line",
.fun.x = NULL,
.fun.x.args = list(),
.fun.y = NULL,
.fun.y.args = list(),
position = "identity",
na.rm = FALSE,
show.legend = FALSE,
inherit.aes = TRUE,
...
)
stat_summary_xy(
mapping = NULL,
data = NULL,
geom = "point",
.fun.x = NULL,
.fun.x.args = list(),
.fun.y = NULL,
.fun.y.args = list(),
position = "identity",
na.rm = FALSE,
show.legend = FALSE,
inherit.aes = TRUE,
...
)
stat_centroid(
mapping = NULL,
data = NULL,
geom = "point",
.fun = NULL,
.fun.args = list(),
position = "identity",
na.rm = FALSE,
show.legend = FALSE,
inherit.aes = TRUE,
...
)
Arguments
mapping | The aesthetic mapping, usually constructed with aes().
data | A layer-specific dataset, only needed if you want to override the plot defaults.
geom | The geometric object to use to display the data.
.fun.x, .fun.y, .fun | The function to be applied, or the name of the function to be applied as a character string.
.fun.x.args, .fun.y.args, .fun.args | Additional arguments to be passed to the function, as a named list.
position | The position adjustment to use for overlapping points on this layer.
na.rm | A logical value indicating whether NA values should be stripped before the computation proceeds.
show.legend | logical. Should this layer be included in the legends?
inherit.aes | If FALSE, overrides the default aesthetics, rather than combining with them.
... | Other arguments passed on to layer().
Details
stat_apply_group() applies functions to data. When possible, it is preferable to use transformations through scales or summary functions such as ggplot2::stat_summary(), stat_summary_xy() or stat_centroid(). However, some computations are neither scale transformations nor usual summaries, as the number of data values does not decrease all the way to one row per group. A typical case of such a summary is the computation of quantiles; typical transformations of this kind are cumulative or running ones, e.g., using cumsum(), runmed() and similar functions. Of course, it is always possible to apply such functions to the data before plotting and pass the result to a single layer function. However, it can be useful to apply such functions on-the-fly to ensure that grouping is consistent between computations and aesthetics.
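A cumulative transformation differs from a summary in that it returns one value per input row within each group. The base-R sketch below, using ave() and cummax() on made-up data (pts is hypothetical), illustrates the kind of per-group computation meant here; it is not how ggpp implements it.

```r
# Base-R sketch: per-group cumulative maximum, one output value per row.
pts <- data.frame(x = rep(1:5, 2),
                  y = c(1, 3, 2, 5, 4, 2, 2, 6, 1, 7),
                  group = rep(c("A", "B"), each = 5))
# cumulative functions depend on order, so sort by x within each group
pts <- pts[order(pts$group, pts$x), ]
# ave() applies cummax() within each group and returns a vector as long
# as its input: one value per row, not one per group
pts$y.cummax <- ave(pts$y, pts$group, FUN = cummax)
print(pts)
```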
One particularity of these statistics is that they can simultaneously apply different functions to x values and to y values when needed. In contrast, geom_smooth() applies a function that takes both x and y values as arguments.
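Applying different functions to x and to y within each group can be sketched in base R with split() and lapply(). Here median() is used for x and mean() for y, an arbitrary choice for illustration; the data and the fun.x/fun.y variables are hypothetical and do not reproduce ggpp's internals.

```r
pts <- data.frame(x = c(1, 2, 3, 10, 20, 30),
                  y = c(2, 4, 9, 1, 1, 4),
                  group = rep(c("A", "B"), each = 3))
fun.x <- median  # applied only to x (arbitrary choice)
fun.y <- mean    # applied only to y (arbitrary choice)
# split by the existing grouping, summarize x and y separately, then bind
res <- do.call(rbind, lapply(split(pts, pts$group), function(d) {
  data.frame(group = d$group[1], x = fun.x(d$x), y = fun.y(d$y))
}))
print(res)
```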
The three statistics documented here are similar; they differ in whether they return a single row or multiple rows of data per group.
Value
A data frame with the same variables as the input data, with either a single row or multiple rows per group, in which the values of the x and y variables are replaced by the values returned by the applied functions, or possibly filled with NA if no function was supplied or available by default.
If the applied function returns a named vector, the names are copied into columns x.names and/or y.names. If the applied summary function returns a one-row data frame, it is column-bound to the output keeping its column names, with its y column overwriting column x (for .fun.x) and/or column y (for .fun.y). In the column names returned by .fun.x, the letter "y" is replaced by "x" (e.g., ymin becomes xmin). This allows the use of the same functions as in ggplot2::stat_summary().
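The renaming of columns returned by .fun.x can be illustrated with a hypothetical stand-in for a stat_summary()-style function. mean_se_sketch() below is not ggplot2::mean_se(); it is only a base-R function returning a one-row data frame with the same column names (y, ymin, ymax), and gsub() mimics the described replacement of "y" by "x".

```r
# Hypothetical stand-in for a summary such as ggplot2::mean_se():
# returns a one-row data frame with columns y, ymin and ymax.
mean_se_sketch <- function(v) {
  m <- mean(v)
  se <- sd(v) / sqrt(length(v))
  data.frame(y = m, ymin = m - se, ymax = m + se)
}

x.summary <- mean_se_sketch(c(2, 4, 6, 8))
# when used as .fun.x, "y" in the column names is replaced by "x",
# so y, ymin, ymax become x, xmin, xmax
names(x.summary) <- gsub("y", "x", names(x.summary), fixed = TRUE)
print(x.summary)
```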
- x: x-value as returned by .fun.x, with names removed
- y: y-value as returned by .fun.y, with names removed
- x.names: if the x-value returned by .fun.x is named, these names
- y.names: if the y-value returned by .fun.y is named, these names
- xmin, xmax: values returned by .fun.x under these names, if present
- ymin, ymax: values returned by .fun.y under these names, if present
- <other>: additional values as returned by .fun.y under other names
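How a named vector maps onto the y and y.names columns can be sketched with quantile(), which returns a named numeric vector. This base-R illustration assumes a single group and only mirrors the documented layout of the returned data.

```r
y.vals <- c(3, 1, 4, 1, 5, 9, 2, 6)
q <- quantile(y.vals)  # named vector: "0%", "25%", "50%", "75%", "100%"
# the values go to y with names removed; the names go to y.names
out <- data.frame(y = unname(q), y.names = names(q),
                  stringsAsFactors = FALSE)
print(out)
```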
Note
The applied function(s) must accept as first argument a vector that matches the variables mapped to the x or y aesthetics. For stat_summary_xy() and stat_centroid(), the function(s) to be applied is (are) expected to return a vector of length 1 or a data frame with only one row, as mean_se(), mean_cl_normal(), mean_cl_boot(), mean_sdl() and median_hilow() from 'ggplot2' do.
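Whether a function has the shape expected by stat_summary_xy() or stat_centroid() can be checked in advance. is_single_summary() below is a hypothetical helper, not part of ggpp or ggplot2.

```r
# Hypothetical helper: TRUE when f(v) returns a length-1 vector
# or a data frame with exactly one row.
is_single_summary <- function(f, v) {
  res <- f(v)
  if (is.data.frame(res)) nrow(res) == 1L else length(res) == 1L
}

v <- c(5, 3, 8, 1)
ok.mean <- is_single_summary(mean, v)    # TRUE: a single value
ok.range <- is_single_summary(range, v)  # FALSE: two values (min and max)
ok.df <- is_single_summary(function(z) {
  data.frame(y = mean(z), ymin = min(z), ymax = max(z))
}, v)                                    # TRUE: a one-row data frame
print(c(ok.mean, ok.range, ok.df))
```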
For stat_apply_group(), the vectors returned by the functions applied to x and y must be of exactly the same length. When a function is passed to only one of .fun.x or .fun.y, the other variable in the returned data is filled with NA_real_. If other values are desired, they can be set by means of a user-defined function.
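The length rule and the NA_real_ fill can be made concrete with a minimal sketch for a single group. apply_group_sketch() is hypothetical and reflects an assumption about the behavior described above, not ggpp's implementation.

```r
# Hypothetical helper (not ggpp code): applies fun.x to x and fun.y to y
# for one group and combines the results into a data frame.
apply_group_sketch <- function(x, y, fun.x = NULL, fun.y = NULL) {
  new.x <- if (is.null(fun.x)) NULL else fun.x(x)
  new.y <- if (is.null(fun.y)) NULL else fun.y(y)
  # the length of whichever result exists sets the number of rows
  n <- max(length(new.x), length(new.y))
  # a variable without a function is filled with NA_real_
  if (is.null(new.x)) new.x <- rep(NA_real_, n)
  if (is.null(new.y)) new.y <- rep(NA_real_, n)
  # both results must be of exactly the same length
  if (length(new.x) != length(new.y)) {
    stop("results for x and y differ in length")
  }
  data.frame(x = new.x, y = new.y)
}

x <- 1:5
y <- c(2, 5, 4, 9, 10)
# diff() shortens y by one, so x[-1L] drops the first x value to match
res1 <- apply_group_sketch(x, y, fun.x = function(v) v[-1L], fun.y = diff)
# only fun.y is supplied, so x is filled with NA_real_
res2 <- apply_group_sketch(x, y, fun.y = cumsum)
print(res1)
print(res2)
```

Raising an error on a length mismatch, rather than recycling the shorter vector, keeps a silent misalignment of x and y values from reaching the plot.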
References
Answers to question "R ggplot on-the-fly calculation by grouping variable" at https://stackoverflow.com/questions/51412522.
Examples
library(ggplot2)
library(ggpp)
set.seed(123456)
my.df <- data.frame(X = rep(1:20,2),
Y = runif(40),
category = rep(c("A","B"), each = 20))
# make sure rows are ordered for X as we will use functions that rely on this
my.df <- my.df[order(my.df[["X"]]), ]
# Centroid
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_centroid(shape = "cross", size = 6) +
geom_point()
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_centroid(geom = "rug", linewidth = 1.5, .fun = median) +
geom_point()
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_centroid(geom = "text", aes(label = category)) +
geom_point()
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_summary_xy(geom = "pointrange",
.fun.x = mean, .fun.y = mean_se) +
geom_point()
# quantiles
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
geom_point() +
stat_apply_group(geom = "rug", .fun.y = quantile, .fun.x = quantile)
ggplot(my.df, aes(x = X, y = Y)) +
geom_point() +
stat_apply_group(geom = "rug", sides = "lr", color = "darkred",
.fun.y = quantile) +
stat_apply_group(geom = "text", hjust = "right", color = "darkred",
.fun.y = quantile,
.fun.x = function(x) {rep(22, 5)}, # set x to 22
mapping = aes(label = after_stat(y.names))) +
expand_limits(x = 21)
my.probs <- c(0.25, 0.5, 0.75)
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
geom_point() +
stat_apply_group(geom = "hline",
aes(yintercept = after_stat(y)),
.fun.y = quantile,
.fun.y.args = list(probs = my.probs))
# cumulative summaries
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_apply_group(.fun.x = function(x) {x},
.fun.y = cummax)
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_apply_group(.fun.x = cumsum, .fun.y = cumsum)
# diff() returns a vector one element shorter for each group, so x[-1L] drops the first x value to keep lengths equal
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_apply_group(.fun.x = function(x) {x[-1L]},
.fun.y = diff, na.rm = TRUE)
# Running summaries
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
geom_point() +
stat_apply_group(.fun.x = function(x) {x},
.fun.y = runmed, .fun.y.args = list(k = 5))
# Rescaling per group
ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_apply_group(.fun.x = function(x) {x},
.fun.y = function(x) {(x - min(x)) / (max(x) - min(x))})
# inspecting the returned data
if (requireNamespace("gginnards", quietly = TRUE)) {
library(gginnards)
# plots inside { } are not auto-printed, so each one is printed explicitly
print(ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_centroid(.fun = mean_se, geom = "debug"))
print(ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_summary_xy(.fun.y = mean_se, geom = "debug"))
print(ggplot(my.df, aes(x = X, y = Y, colour = category)) +
stat_apply_group(.fun.y = cumsum, geom = "debug"))
print(ggplot(my.df, aes(x = X, y = Y, colour = category)) +
geom_point() +
stat_apply_group(geom = "debug",
.fun.x = quantile,
.fun.x.args = list(probs = my.probs),
.fun.y = quantile,
.fun.y.args = list(probs = my.probs)))
}