geom_lv {lvplot} | R Documentation |
Side-by-side LV boxplots with ggplot2.
Description
An extension of standard boxplots which draws k letter statistics.
Conventional boxplots (Tukey 1977) are useful displays for conveying rough
information about the central 50% of the data and the extent of the data.
For moderate-sized data sets (n < 1000), detailed estimates of tail
behavior beyond the quartiles may not be trustworthy, so the information
provided by boxplots is appropriately somewhat vague beyond the quartiles,
and the expected number of “outliers” and “far-out” values for a
Gaussian sample of size n is often less than 10 (Hoaglin, Iglewicz,
and Tukey 1986). Large data sets (n ≈ 10,000-100,000) afford
more precise estimates of quantiles in the tails beyond the quartiles and
also can be expected to present a large number of “outliers” (about
0.4 + 0.007n).
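As a rough worked example of the approximation quoted above, the following R sketch evaluates 0.4 + 0.007n for a few sample sizes. The helper name expected_outliers is hypothetical and not part of lvplot; it only restates the formula from the text.

```r
# Hypothetical helper (not part of lvplot): approximate number of points a
# conventional boxplot flags as "outliers" in a Gaussian sample of size n,
# using the 0.4 + 0.007 n approximation quoted in the description.
expected_outliers <- function(n) 0.4 + 0.007 * n

n <- c(1000, 10000, 100000)
data.frame(n = n, expected = expected_outliers(n))
```

Under this approximation a sample of 1,000 yields roughly 7 flagged points, while a sample of 100,000 yields roughly 700, which is the overplotting problem the letter-value display is meant to reduce.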
The letter-value box plot addresses both these shortcomings: it conveys
more detailed information in the tails using letter values, only out to the
depths where the letter values are reliable estimates of their
corresponding quantiles (corresponding to tail areas of roughly 2^{-i});
“outliers” are defined as a function of the most extreme
letter value shown. All aspects shown on the letter-value boxplot are
actual observations, thus remaining faithful to the principles that
governed Tukey's original boxplot.
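The tail areas of roughly 2^{-i} can be illustrated with a small base-R sketch. It assumes simulated Gaussian data and uses quantile() with type = 1, which returns order statistics, as a stand-in for letter values. This is an illustration of the idea only; lvplot's own depth-based computation may differ in detail.

```r
set.seed(1)
x <- rnorm(10000)        # simulated data, assumed for illustration only

i <- 1:6                 # index of the letter value: 1 = median, 2 = fourths, ...
tail_area <- 2^(-i)      # 0.5, 0.25, 0.125, ...

# type = 1 selects order statistics, so every value is an actual observation
lower <- unname(quantile(x, tail_area, type = 1))
upper <- unname(quantile(x, 1 - tail_area, type = 1))

lv_table <- data.frame(i, tail_area, lower, upper)
lv_table
```

The first row has a tail area of 0.5, so its lower and upper values coincide at the median; each further row halves the tail area and moves one step deeper into both tails.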
Usage
geom_lv(
mapping = NULL,
data = NULL,
stat = "lv",
position = "dodge",
outlier.colour = "black",
outlier.shape = 19,
outlier.size = 1.5,
outlier.stroke = 0.5,
na.rm = TRUE,
varwidth = FALSE,
width.method = "linear",
show.legend = NA,
inherit.aes = TRUE,
...
)
GeomLv
scale_fill_lv(...)
stat_lv(
mapping = NULL,
data = NULL,
geom = "lv",
position = "dodge",
na.rm = TRUE,
conf = 0.95,
percent = NULL,
k = NULL,
show.legend = NA,
inherit.aes = TRUE,
...
)
StatLv
Arguments
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
position |
Position adjustment, either as a string naming the adjustment
(e.g. |
outlier.colour |
Override aesthetics used for the outliers. Defaults
come from |
outlier.shape |
Override aesthetics used for the outliers. Defaults
come from |
outlier.size |
Override aesthetics used for the outliers. Defaults
come from |
outlier.stroke |
Override aesthetics used for the outliers. Defaults
come from |
na.rm |
If |
varwidth |
if |
width.method |
Character, one of 'linear' (default), 'area', or 'height'. Determines whether the width of the box for letter value LV(i) is proportional to i (linear) or to $2^{-i}$ (height), or whether the area of the box is proportional to $2^{-i}$ (area). |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
... |
Other arguments passed on to |
geom, stat |
Use to override the default connection between
|
conf |
confidence level |
percent |
numeric value: percent of data in outliers |
k |
number of letter values shown |
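To make the k and conf arguments concrete, the sketch below requests a fixed number of letter values in one plot and a different confidence level in another. It assumes the ggplot2 and lvplot packages are installed and reuses the mpg data from ggplot2; the values 3 and 0.9 are arbitrary choices for illustration, not recommendations.

```r
library(ggplot2)
library(lvplot)

p <- ggplot(mpg, aes(class, hwy))

# Request three letter values per group via the k argument
p_k <- p + stat_lv(aes(fill = after_stat(LV)), k = 3) + scale_fill_lv()

# Leave k at its default (NULL) and pass a 90% confidence level
# instead of the default 0.95
p_conf <- p + stat_lv(aes(fill = after_stat(LV)), conf = 0.9) + scale_fill_lv()

# Print p_k or p_conf to render the plots
```

Because stat_lv() uses geom = "lv" by default, these layers draw the same kind of display as geom_lv().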
Format
An object of class GeomLv (inherits from Geom, ggproto, gg) of length 6.
An object of class StatLv (inherits from Stat, ggproto, gg) of length 5.
Computed/reported variables
- k
Number of letter values used for the display
- LV
Name of the letter value
- width
Width of the interquartile box
References
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) Variations of box plots. The American Statistician 32, 12-16.
See Also
stat_quantile to view quantiles conditioned on a continuous variable.
Examples
library(ggplot2)
library(lvplot)
p <- ggplot(mpg, aes(class, hwy))
p + geom_lv(aes(fill = after_stat(LV))) + scale_fill_brewer()
p + geom_lv() + geom_jitter(width = 0.2)
p + geom_lv(aes(fill = after_stat(LV))) + scale_fill_lv()
# Variable box widths, fixed colours, and outlier appearance
p + geom_lv(varwidth = TRUE, aes(fill = after_stat(LV))) + scale_fill_lv()
p + geom_lv(fill = "grey80", colour = "black")
p + geom_lv(outlier.colour = "red", outlier.shape = 1)
# Plots are automatically dodged when any aesthetic is a factor
p + geom_lv(aes(fill = drv))
# varwidth adjusts the width of the boxes according to the number of observations
ggplot(ontime, aes(UniqueCarrier, TaxiIn + TaxiOut)) +
  geom_lv(aes(fill = after_stat(LV)), varwidth = TRUE) +
  scale_fill_lv() +
  scale_y_sqrt() +
  theme_bw()
# Day of the week (0 = Sunday) derived from the flight date
ontime$DayOfWeek <- as.POSIXlt(ontime$FlightDate)$wday
ggplot(ontime, aes(factor(DayOfWeek), TaxiIn + TaxiOut)) +
  geom_lv(aes(fill = after_stat(LV))) +
  scale_fill_lv() +
  scale_y_sqrt() +
  theme_bw()