stat_dens2d_labels {ggpp} | R Documentation |
Replace labels in data based on 2D density
Description
stat_dens2d_labels() sets values mapped to the label aesthetic to "" or to a
user-provided character string based on the local density in regions of a
plot panel. Its main use is together with repulsive geoms from package
ggrepel. If there is no mapping to label in data, the mapping is set to
rownames(data), with a message.
Usage
stat_dens2d_labels(
mapping = NULL,
data = NULL,
geom = "text",
position = "identity",
...,
keep.fraction = 0.1,
keep.number = Inf,
keep.sparse = TRUE,
keep.these = FALSE,
exclude.these = FALSE,
these.target = "label",
pool.along = c("xy", "x", "y", "none"),
xintercept = 0,
yintercept = 0,
invert.selection = FALSE,
h = NULL,
n = NULL,
label.fill = "",
return.density = FALSE,
na.rm = TRUE,
show.legend = FALSE,
inherit.aes = TRUE
)
Arguments
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset - only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data. |
position |
The position adjustment to use for overlapping points on this layer |
... |
other arguments passed on to |
keep.fraction |
numeric [0..1]. The fraction of the observations (or
rows) in |
keep.number |
integer Set the maximum number of observations to retain,
effective only if obeying |
keep.sparse |
logical If |
keep.these, exclude.these |
character vector, integer vector, logical
vector or function that takes one or more variables in data selected by
|
these.target |
character, numeric or logical selecting one or more
column(s) of |
pool.along |
character, one of |
xintercept, yintercept |
numeric The split points for the data filtering. |
invert.selection |
logical If |
h |
vector of bandwidths for x and y directions. Defaults to normal reference bandwidth (see bandwidth.nrd). A scalar value will be taken to apply to both directions. |
n |
Number of grid points in each direction. Can be scalar or a length-2 integer vector |
label.fill |
character vector of length 1, a function or |
return.density |
logical vector of length 1. If |
na.rm |
a logical value indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
Details
stat_dens2d_labels() is designed to work together with geometries from
package 'ggrepel'. To avoid text labels being plotted over unlabelled points,
all the rows in data need to be retained but with labels replaced by the
empty character string, "". Function stat_dens2d_filter cannot be used with
the repulsive geoms from 'ggrepel' because it drops observations.
stat_dens2d_labels() can also be useful in other situations, as the
substitution character string can be set by the user by passing an argument
to label.fill. If this argument is NULL, the unselected rows are filtered out
in the same way as by stat_dens2d_filter.
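As a rough illustration of the difference between replacing labels and dropping rows, the base R sketch below applies a fill value to the rows that were not selected. The helper fill_labels() and the column name label are hypothetical and are not part of 'ggpp'; applying a function only to the labels of unselected rows is an assumption based on the examples in this page.

```r
# Hypothetical helper, not part of 'ggpp': mimics the documented roles of
# label.fill on a data frame with a character column named 'label'.
fill_labels <- function(data, selected, label.fill = "") {
  if (is.null(label.fill)) {
    # NULL: behave like a filter and drop the unselected rows
    return(data[selected, , drop = FALSE])
  }
  if (is.function(label.fill)) {
    # function: assumed here to transform the labels of unselected rows
    data$label[!selected] <- label.fill(data$label[!selected])
  } else {
    # character string (or NA): overwrite the labels of unselected rows
    data$label[!selected] <- label.fill
  }
  data
}

d0 <- data.frame(x = 1:4, label = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)
sel <- c(TRUE, FALSE, FALSE, TRUE)

filled   <- fill_labels(d0, sel)          # all 4 rows kept, two labels ""
filtered <- fill_labels(d0, sel, NULL)    # only the 2 selected rows remain
mapped   <- fill_labels(d0, sel, toupper) # unselected labels transformed
```

Keeping all rows matters for repulsive geoms: a row with an empty label still marks a position that labels should avoid.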
The local density of observations in 2D (x and y) is computed with function
kde2d and used to select observations, passing to the geom all the rows in
its data input but with the text of labels replaced in those "not kept". The
default is to select observations in sparse regions of the plot, but the
selection can be inverted so that only observations in the densest regions
are returned.
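The selection step described above can be sketched with MASS::kde2d() outside of any plot. This is only an approximation under stated assumptions: the grid lookup with findInterval() and the rank-based cut-off are choices made for this sketch, the limits are those of the data rather than of the plot scales, and the statistic itself may map observations to density estimates differently.

```r
library(MASS)  # provides kde2d()

set.seed(1001)
x <- rnorm(100)
y <- rnorm(100)
lab <- sprintf("obs%03d", seq_along(x))  # made-up labels for the sketch

# Estimate the 2D density on a 25 x 25 grid with the default bandwidth.
dens <- kde2d(x, y, n = 25)

# Assumption of this sketch: use the density at the grid node just below
# each observation as its local density.
ix <- findInterval(x, dens$x, all.inside = TRUE)
iy <- findInterval(y, dens$y, all.inside = TRUE)
z <- dens$z[cbind(ix, iy)]

# Select the 10% of observations with the lowest estimated density.
keep.fraction <- 0.1
keep <- rank(z, ties.method = "first") <= round(keep.fraction * length(z))

# Replace rather than drop: every row survives, most labels become "".
lab.out <- ifelse(keep, lab, "")
```

Selecting the densest regions instead would amount to ranking -z rather than z.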
Specific observations can be protected from having the label replaced by
passing a suitable argument to keep.these. Logical and integer vectors
function as indexes to rows in data, while a character vector is compared to
values in the variable mapped to the label aesthetic. A function passed as
argument to keep.these will receive as its first argument the values in the
variable mapped to label and should return a character, logical or numeric
vector as described above.
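The rules above for interpreting keep.these can be sketched as a small base R function that turns any of the accepted argument types into a logical vector with one value per row. The name resolve_these() is hypothetical and the recycling of a short logical vector is an assumption of this sketch, not documented behaviour.

```r
# Hypothetical helper, not part of 'ggpp': converts a keep.these-style
# argument into a logical vector flagging the protected rows.
resolve_these <- function(these, labels) {
  if (is.function(these)) {
    # a function receives the labels and returns one of the types below
    these <- these(labels)
  }
  if (is.character(these)) {
    labels %in% these                     # compared to label values
  } else if (is.logical(these)) {
    rep_len(these, length(labels))        # logical index (recycled here)
  } else if (is.numeric(these)) {
    seq_along(labels) %in% these          # positions of rows in data
  } else {
    stop("unsupported type for 'these'")
  }
}

labels <- c("zoujdg", "abcdef", "zetaxy")
by.text  <- resolve_these("zoujdg", labels)
by.index <- resolve_these(c(2L, 3L), labels)
by.fun   <- resolve_these(function(x) grepl("^z", x), labels)
by.none  <- resolve_these(FALSE, labels)  # the default, protects nothing
```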
How many labels are retained intact in addition to those in keep.these is
controlled with arguments passed to keep.number and keep.fraction.
keep.number sets the maximum number of observations selected; whenever
keep.fraction results in fewer observations being selected, keep.fraction is
obeyed instead.
Computation of the density and of the default bandwidth requires at least
two observations with different values. If data do not fulfill this
condition, they are kept only if keep.fraction = 1. This is correct
behavior for a single observation, but can be surprising in the case of
multiple observations.
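Why identical values are a problem can be seen directly with the functions from 'MASS' on which the density estimate relies. The sketch below only demonstrates the degenerate bandwidth; how stat_dens2d_labels() detects and handles this case internally is not shown here.

```r
library(MASS)  # provides bandwidth.nrd() and kde2d()

# Three observations sharing the same position.
x <- c(2, 2, 2)
y <- c(5, 5, 5)

# The normal reference bandwidth collapses to zero without any spread.
bw <- c(bandwidth.nrd(x), bandwidth.nrd(y))

# kde2d() refuses a zero bandwidth, so no density estimate is available;
# the error message is captured instead of stopping the script.
res <- tryCatch(kde2d(x, y), error = function(e) conditionMessage(e))
```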
Parameters keep.these and exclude.these make it possible to force inclusion
or exclusion of observations after the density is computed. In case of
conflict, exclude.these overrides keep.these.
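The precedence described above can be written as a single logical expression. The three vectors below are made-up inputs for illustration, each with one value per row of data:

```r
by.density <- c(TRUE,  FALSE, FALSE, TRUE,  FALSE)  # selected by density
keep       <- c(FALSE, TRUE,  FALSE, TRUE,  FALSE)  # from keep.these
exclude    <- c(FALSE, FALSE, FALSE, TRUE,  TRUE)   # from exclude.these

# Forced inclusion is added first, then forced exclusion removes rows,
# so exclusion wins for row 4, which is flagged by both.
selected <- (by.density | keep) & !exclude
selected
# TRUE TRUE FALSE FALSE FALSE
```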
Value
A plot layer instance. The output data are the input data after value
substitution based on the 2D density filtering criterion.
Note
Which points are kept and which not depends on how dense a grid is used
and how flexible the density surface estimate is. This depends on the
values passed as arguments to parameters n and h. It is also important to be
aware that both geom_text() and geom_text_repel() can avoid overplotting by
discarding labels at the plot rendering stage, i.e., what is plotted may
differ from what is returned by this statistic.
See Also
stat_dens2d_filter, and kde2d, which is used internally. Parameters n and h
in this statistic correspond to the parameters with the same names in this
imported function. Limits are set to the limits of the plot scales.
Other statistics returning a subset of data:
stat_dens1d_filter(), stat_dens1d_labels(), stat_dens2d_filter()
Examples
library(ggplot2)
library(ggpp)
# Helper generating a random lower-case string to use as a label.
random_string <-
  function(len = 6) {
    paste(sample(letters, len, replace = TRUE), collapse = "")
  }
# Make random data.
set.seed(1001)
d <- tibble::tibble(
  x = rnorm(100),
  y = rnorm(100),
  group = rep(c("A", "B"), c(50, 50)),
  lab = replicate(100, { random_string() })
)
# using defaults
ggplot(data = d, aes(x, y, label = lab)) +
  geom_point() +
  stat_dens2d_labels()
# protect a label matched by its text
ggplot(data = d, aes(x, y, label = lab)) +
  geom_point() +
  stat_dens2d_labels(keep.these = "zoujdg")
# protect labels selected by a function
ggplot(data = d, aes(x, y, label = lab)) +
  geom_point() +
  stat_dens2d_labels(keep.these = function(x) {grepl("^z", x)})
# nudge labels away from the centre of the data cloud
ggplot(data = d, aes(x, y, label = lab)) +
  geom_point() +
  stat_dens2d_labels(geom = "text_s",
                     position = position_nudge_center(x = 0.1, y = 0.1,
                                                      center_x = mean,
                                                      center_y = mean),
                     vjust = "outward_mean", hjust = "outward_mean") +
  expand_limits(x = c(-4, 4.5))
ggrepel.installed <- requireNamespace("ggrepel", quietly = TRUE)
if (ggrepel.installed) {
  library(ggrepel)
  ggplot(data = d, aes(x, y, label = lab, colour = group)) +
    geom_point() +
    stat_dens2d_labels(geom = "text_repel")
  ggplot(data = d, aes(x, y, label = lab, colour = group)) +
    geom_point() +
    stat_dens2d_labels(geom = "text_repel", label.fill = NA)
  # We keep labels starting with "a" across the whole plot, and all labels
  # in sparse regions. To achieve this we pass a function, instead of a
  # character string, as the argument to label.fill.
  label.fun <- function(x) {ifelse(grepl("^a", x), x, "")}
  ggplot(data = d, aes(x, y, label = lab, colour = group)) +
    geom_point() +
    stat_dens2d_labels(geom = "text_repel", label.fill = label.fun)
}
# Using geom_debug() we can see that all 100 rows in d are
# returned. But only those labelled in the previous example still contain
# the original labels.
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)
if (gginnards.installed) {
  library(gginnards)
  ggplot(data = d, aes(x, y, label = lab)) +
    geom_point() +
    stat_dens2d_labels(geom = "debug")
  ggplot(data = d, aes(x, y, label = lab)) +
    geom_point() +
    stat_dens2d_labels(geom = "debug", return.density = TRUE)
  ggplot(data = d, aes(x, y, label = lab)) +
    geom_point() +
    stat_dens2d_labels(geom = "debug", label.fill = NULL)
  ggplot(data = d, aes(x, y, label = lab)) +
    geom_point() +
    stat_dens2d_labels(geom = "debug", label.fill = FALSE, return.density = TRUE)
  ggplot(data = d, aes(x, y, label = lab)) +
    geom_point() +
    stat_dens2d_labels(geom = "debug", label.fill = NULL, return.density = TRUE)
  ggplot(data = d, aes(x, y)) +
    geom_point() +
    stat_dens2d_labels(geom = "debug")
}