stat_correlation {ggpmisc}    R Documentation
Annotate plot with correlation test
Description
stat_correlation() applies stats::cor.test() respecting grouping, with method = "pearson" as the default, but alternatively using the "kendall" or "spearman" methods. It generates labels for the correlation coefficient and p-value, the coefficient of determination (R^2) for method "pearson", and the number of observations.
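As a rough illustration of the underlying computation, the base-R sketch below calls stats::cor.test() directly with each of the three supported methods on made-up data, and derives R^2 for the Pearson case as the square of r. It uses only package stats and is not the ggpmisc implementation.

```r
# Sketch: the correlation tests that stat_correlation() builds on,
# called directly through stats::cor.test() (base R only).
set.seed(4321)
x <- (1:100) / 10
y <- x + rnorm(length(x))

methods <- c("pearson", "kendall", "spearman")
tests <- lapply(methods, function(m) cor.test(x, y, method = m))
names(tests) <- methods

# The estimate is named "cor", "tau" or "rho" depending on the method
estimates <- vapply(tests, function(t) unname(t$estimate), numeric(1))
p.values <- vapply(tests, function(t) t$p.value, numeric(1))

# Coefficient of determination, only meaningful for Pearson's r
r.squared <- estimates[["pearson"]]^2

print(round(estimates, 3))
print(signif(p.values, 3))
print(round(r.squared, 3))
```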
Usage
stat_correlation(
  mapping = NULL,
  data = NULL,
  geom = "text_npc",
  position = "identity",
  ...,
  method = "pearson",
  n.min = 2L,
  alternative = "two.sided",
  exact = NULL,
  r.conf.level = ifelse(method == "pearson", 0.95, NA),
  continuity = FALSE,
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  small.p = getOption("ggpmisc.small.p", default = FALSE),
  coef.keep.zeros = TRUE,
  r.digits = 2,
  t.digits = 3,
  p.digits = 3,
  CI.brackets = c("[", "]"),
  label.x = "left",
  label.y = "top",
  hstep = 0,
  vstep = NULL,
  output.type = NULL,
  boot.R = ifelse(method == "pearson", 0, 999),
  na.rm = FALSE,
  parse = NULL,
  show.legend = FALSE,
  inherit.aes = TRUE
)
Arguments
mapping
The aesthetic mapping, usually constructed with aes().
data
A layer specific dataset, only needed if you want to override the plot defaults.
geom
The geometric object to use to display the data.
position
The position adjustment to use for overlapping points on this layer.
...
Other arguments passed on to layer().
method
character One of "pearson", "kendall" or "spearman".
n.min
integer Minimum number of distinct values in the variables for fitting to be attempted.
alternative
character One of "two.sided", "less" or "greater".
exact
logical Whether an exact p-value should be computed. Used for Kendall's tau and Spearman's rho.
r.conf.level
numeric Confidence level for the returned confidence interval. If set to NA, computation of the CI is skipped.
continuity
logical If TRUE, a continuity correction is used for Kendall's tau and Spearman's rho when not computed exactly.
small.r, small.p
logical Flags to switch use of lower case r and p for the coefficient of correlation (only for method = "pearson") and the p-value.
coef.keep.zeros
logical Keep or drop trailing zeros when formatting the correlation coefficients and t-value, z-value or S-value (see note below).
r.digits , t.digits , p.digits |
integer Number of digits after the decimal
point to use for R, r.squared, tau or rho and P-value in labels. If
|
CI.brackets
character vector of length 2. The opening and closing brackets used for the CI label.
label.x, label.y
hstep, vstep
numeric in npc units, the horizontal and vertical displacement step-size used between labels for different groups.
output.type
character One of "expression", "LaTeX", "text", "markdown" or "numeric".
boot.R
integer The number of bootstrap resamples. Set to zero for no bootstrap estimates for the CI.
na.rm
a logical indicating whether NA values should be stripped before the computation proceeds.
parse |
logical Passed to the geom. If |
show.legend
logical. Should this layer be included in the legends?
inherit.aes
If FALSE, overrides the default aesthetics, rather than combining with them.
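Several of the arguments above have direct counterparts in stats::cor.test(): method, alternative, exact and continuity share their names, and r.conf.level appears to play the role of conf.level, which cor.test() uses only for Pearson's r. The sketch below shows these cor.test() calls; the correspondence is an assumption for illustration, not a reading of the ggpmisc source. Because cor.test() returns no CI for tau or rho, it also includes a hypothetical helper, boot_cor_ci(), computing a simple percentile bootstrap with boot.R-style resamples; ggpmisc's own bootstrap method may differ.

```r
# Sketch (assumed correspondence, not ggpmisc source): arguments of
# stat_correlation() that mirror those of stats::cor.test().
set.seed(4321)
x <- (1:100) / 10
y <- x + rnorm(length(x))

# Pearson: conf.level controls the CI returned by cor.test()
pearson <- cor.test(x, y, method = "pearson",
                    alternative = "two.sided", conf.level = 0.95)
print(pearson$conf.int)

# Kendall: normal approximation (not exact) with continuity correction
kendall <- cor.test(x, y, method = "kendall",
                    exact = FALSE, continuity = TRUE)
print(kendall$statistic)  # z-value of the normal approximation

# Hypothetical helper: percentile bootstrap CI, resampling (x, y) pairs
boot_cor_ci <- function(x, y, method, R = 999, conf.level = 0.95) {
  n <- length(x)
  estimates <- replicate(R, {
    i <- sample.int(n, n, replace = TRUE)  # keep pairs together
    cor(x[i], y[i], method = method)
  })
  alpha <- 1 - conf.level
  quantile(estimates, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

tau.ci <- boot_cor_ci(x, y, method = "kendall", R = 999)
print(tau.ci)
```

Resampling whole pairs, rather than x and y independently, preserves the dependence whose strength is being estimated.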
Details
This statistic can be used to annotate a plot with the correlation coefficient and the outcome of its test of significance. It supports Pearson, Kendall and Spearman methods to compute correlation. This statistic generates labels as R expressions by default but LaTeX (use TikZ device), markdown (use package 'ggtext') and plain text are also supported, as well as numeric values for user-generated text labels. The character labels include the symbol describing the quantity together with the numeric value. For the confidence interval (CI) the default is to follow the APA recommendation of using square brackets.
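The APA-style square brackets correspond to the default of CI.brackets. The hypothetical helper format_ci() below (not part of ggpmisc) sketches how a plain-text CI label could be assembled from the two limits and a pair of brackets; the exact label text produced by stat_correlation() may differ.

```r
# Hypothetical helper (not part of ggpmisc): format a confidence interval
# as plain text, using an opening and closing bracket as in CI.brackets.
format_ci <- function(low, high, conf.level = 0.95,
                      brackets = c("[", "]"), digits = 2) {
  stopifnot(length(brackets) == 2)
  limits <- formatC(c(low, high), format = "f", digits = digits)
  sprintf("%.0f%% CI %s%s, %s%s", conf.level * 100,
          brackets[1], limits[1], limits[2], brackets[2])
}

format_ci(0.853, 0.931)                          # "95% CI [0.85, 0.93]"
format_ci(0.853, 0.931, brackets = c("(", ")"))  # "95% CI (0.85, 0.93)"
```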
The value of parse is set automatically based on output.type, but if you assemble labels that need parsing from numeric output, the default needs to be overridden. By default, the value of output.type is guessed from the name of the geometry.
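Labels assembled by the user from numeric output are ordinary character strings; if they use plotmath syntax, they must be parsed before a geom can render them as expressions, which is why parse has to be set explicitly in that case. The base-R sketch below builds such a string from a cor.test() estimate and checks that it parses. The label syntax is an illustrative assumption, not the exact format used by ggpmisc.

```r
# Sketch: a plotmath label assembled from a numeric estimate.
set.seed(4321)
x <- (1:100) / 10
y <- x + rnorm(length(x))
r <- unname(cor.test(x, y)$estimate)

# The number is quoted so that it is kept as formatted text
lbl <- sprintf('italic(R)~"="~"%.2f"', r)
print(lbl)

# This is the conversion a geom performs when given parse = TRUE
expr <- parse(text = lbl)
print(class(expr))
```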
A ggplot statistic receives as data a data frame that is not the one passed as argument by the user, but instead a data frame with the variables mapped to aesthetics. cor.test() is always applied to the variables mapped to the x and y aesthetics, so the scales used for x and y should both be continuous scales rather than discrete.
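The base-R sketch below mimics what applying cor.test() per group to the x and y aesthetics amounts to: the data are split by the variable mapped to group, and the test is run within each subset. Column names follow the data used in the Examples section; this is an illustration, not the statistic's internal code.

```r
# Sketch: one correlation test per group, as a grouped stat would compute it.
set.seed(4321)
x <- (1:100) / 10
y <- x + rnorm(length(x))
my.data <- data.frame(x = x, y = y, group = c("A", "B"))

# Within a stat, the data frame holds only mapped aesthetics (x, y, group)
by.group <- lapply(split(my.data, my.data$group), function(d) {
  test <- cor.test(d$x, d$y, method = "pearson")
  data.frame(r = unname(test$estimate),
             p.value = test$p.value,
             n = nrow(d))
})
result <- do.call(rbind, by.group)
print(result)
```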
Aesthetics
stat_correlation() requires x and y. In addition, the aesthetics understood by the geom ("text_npc" is the default) are understood and grouping is respected.
Computed variables
If output.type is "numeric", the returned tibble contains the columns listed below, with variations depending on the method. If the model fit function used does not return a value, the variable is set to NA_real_.
- x, npcx
x position
- y, npcy
y position
- r, and cor, tau or rho
numeric values for correlation coefficient estimates
- t.value and its df, z.value or S.value
numeric values for statistic estimates
- p.value, n
numeric values
- r.conf.level
numeric value, as fraction of one
- r.confint.low
Lower confidence interval limit for r.
- r.confint.high
Upper confidence interval limit for r.
- grp.label
Set according to mapping in aes.
- method.label
Set according to the method used.
- method, test
character values
If output.type is different from "numeric", the returned tibble contains, in addition to the columns listed above, those listed below. If the numeric value is missing, the label is set to character(0L).
- r.label, and cor.label, tau.label or rho.label
Correlation coefficient as a character string.
- t.value.label, z.value.label or S.value.label
t-value and degrees of freedom, z-value or S-value as a character string.
- p.value.label
P-value for test against zero, as a character string.
- r.confint.label, and cor.confint.label, tau.confint.label or rho.confint.label
Confidence interval for r (only with method = "pearson").
- n.label
Number of observations used in the fit, as a character string.
- grp.label
Set according to mapping in aes, as a character string.
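For the Pearson case, most of the numeric variables listed above have an evident source in the "htest" object returned by cor.test(). The sketch below collects them into a one-row data frame using the documented names. The correspondence is assumed for illustration; the real statistic also adds positions, labels and grouping columns.

```r
# Sketch: numeric computed variables derived from a cor.test() result.
set.seed(4321)
x <- (1:100) / 10
y <- x + rnorm(length(x))

test <- cor.test(x, y, method = "pearson", conf.level = 0.95)

computed <- data.frame(
  r = unname(test$estimate),        # named "cor" in the htest object
  t.value = unname(test$statistic),
  df = unname(test$parameter),      # n - 2 for Pearson's r
  p.value = test$p.value,
  n = length(x),
  r.conf.level = attr(test$conf.int, "conf.level"),
  r.confint.low = test$conf.int[1],
  r.confint.high = test$conf.int[2],
  method = "pearson"
)
print(computed)
```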
To explore the computed values returned for a given input, we suggest the use of geom_debug() from package 'gginnards', as shown in the last examples below.
Note
Currently coef.keep.zeros is ignored, with trailing zeros always retained in the labels but not protected from being dropped by R when character strings are parsed into expressions.
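The mechanism behind this note can be seen with R's parser alone: a number written unquoted in a label string becomes a numeric constant when parsed, so its trailing zero is lost, whereas a quoted number stays a character string. This sketch illustrates base R behaviour only; it is not how ggpmisc builds its labels.

```r
# An unquoted number becomes a numeric constant when parsed,
# so its trailing zero does not survive.
unquoted <- parse(text = "italic(R) == 0.50")[[1]]
deparse(unquoted)  # "italic(R) == 0.5"

# A quoted number stays a character string and keeps its formatting.
quoted <- parse(text = 'italic(R) == "0.50"')[[1]]
deparse(quoted)    # "italic(R) == \"0.50\""
```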
See Also
cor.test() for details on the computations.
Examples
library(ggpmisc)

# generate artificial data
set.seed(4321)
x <- (1:100) / 10
y <- x + rnorm(length(x))
my.data <- data.frame(x = x,
                      y = y,
                      y.desc = - y,
                      group = c("A", "B"))

# by default only R is displayed
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation()

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(small.r = TRUE)

ggplot(my.data, aes(x, y.desc)) +
  geom_point() +
  stat_correlation(label.x = "right")

# non-default methods
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(method = "kendall")

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(method = "spearman")

# use_label() can map a user selected label
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R2"))

# use_label() can assemble and map a combined label
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "P", "n", "method"))

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"))

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"),
                   r.conf.level = 0.95)

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"),
                   method = "kendall",
                   r.conf.level = 0.95)

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"),
                   method = "spearman",
                   r.conf.level = 0.95)

# manually assemble and map a specific label using paste() and aes()
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(aes(label = paste(after_stat(r.label),
                                     after_stat(p.value.label),
                                     after_stat(n.label),
                                     sep = "*\", \"*")))

# manually format and map a specific label using sprintf() and aes()
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(aes(label = sprintf("%s*\" with \"*%s*\" for \"*%s",
                                       after_stat(r.label),
                                       after_stat(p.value.label),
                                       after_stat(t.value.label))))

# Inspecting the returned data using geom_debug()
# This provides a quick way of finding out the names of the variables that
# are available for mapping to aesthetics with after_stat().

gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed)
  library(gginnards)

# the whole of computed data
if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug", method = "pearson")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug", method = "kendall")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug", method = "spearman")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug", output.type = "numeric")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug", output.type = "markdown")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug", output.type = "LaTeX")