alluvial-data {ggalluvial} | R Documentation |
Check for alluvial structure and convert between alluvial formats
Description
Alluvial plots consist of multiple horizontally-distributed columns (axes) representing factor variables; vertical divisions (strata) of these axes representing these variables' values; and splines (alluvial flows) connecting vertical subdivisions (lodes) within strata of adjacent axes representing subsets or amounts of observations that take the corresponding values of the corresponding variables. The functions documented here check a data frame for either of two types of alluvial structure, described under Details, and convert between them.
Usage
is_lodes_form(
data,
key,
value,
id,
weight = NULL,
site = NULL,
logical = TRUE,
silent = FALSE
)
is_alluvia_form(
data,
...,
axes = NULL,
weight = NULL,
logical = TRUE,
silent = FALSE
)
to_lodes_form(
data,
...,
axes = NULL,
key = "x",
value = "stratum",
id = "alluvium",
diffuse = FALSE,
discern = FALSE
)
to_alluvia_form(data, key, value, id, distill = FALSE)
Arguments
data |
A data frame. |
key , value , id |
In |
weight |
Optional field of |
site |
Optional vector of fields of |
logical |
Defunct. Whether to return a logical value or a character string indicating the type of alluvial structure ("none", "lodes", or "alluvia"). |
silent |
Whether to print messages. |
... |
Used in |
axes |
In |
diffuse |
Fields of |
discern |
Logical value indicating whether to suffix values of the
variables used as axes that appear at more than one variable in order to
distinguish their factor levels. This forces the levels of the combined
factor variable |
distill |
A logical value indicating whether to include variables, other
than those passed to |
Details
One row per lode, wherein each row encodes a subset or amount of observations having a specific profile of axis values, a key field encodes the axis, a value field encodes the value within each axis, and an id column identifies multiple lodes corresponding to the same subset or amount of observations. is_lodes_form tests for this structure.

One row per alluvium, wherein each row encodes a subset or amount of observations having a specific profile of axis values and a set axes of fields encodes its values at each axis variable. is_alluvia_form tests for this structure.
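As a minimal sketch of the two structures, the toy data frames below encode the same three cohorts once per alluvium and once per lode. The names `toy_alluvia`, `toy_lodes`, `toy_broken` and their columns are hypothetical and not part of the package; the sketch assumes ggalluvial is installed and uses only the signatures listed under Usage.

```r
library(ggalluvial)

# Alluvia form: one row per alluvium, one column per axis.
toy_alluvia <- data.frame(
  before = c("A", "A", "B"),
  after  = c("A", "B", "B"),
  n      = c(5, 2, 3)
)
alluvia_ok <- is_alluvia_form(toy_alluvia, axes = 1:2, weight = "n",
                              silent = TRUE)

# Lodes form: one row per lode; `cohort` ties together the lodes
# that belong to the same subset of observations.
toy_lodes <- data.frame(
  cohort = rep(1:3, times = 2),
  axis   = rep(c("before", "after"), each = 3),
  value  = c("A", "A", "B", "A", "B", "B"),
  n      = rep(c(5, 2, 3), times = 2)
)
lodes_ok <- is_lodes_form(toy_lodes,
                          key = "axis", value = "value", id = "cohort",
                          weight = "n", silent = TRUE)

# Repeating an id-axis pairing breaks the lodes structure.
toy_broken <- rbind(toy_lodes, toy_lodes[1, ])
broken_ok <- is_lodes_form(toy_broken,
                           key = "axis", value = "value", id = "cohort",
                           weight = "n", silent = TRUE)

print(c(alluvia = alluvia_ok, lodes = lodes_ok, broken = broken_ok))
```

Setting `silent = TRUE` only suppresses the diagnostic messages; drop it to see why a data frame fails a check.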
to_lodes_form takes a data frame with several designated variables to be used as axes in an alluvial plot, and reshapes the data frame so that the axis variable names constitute a new factor variable and their values comprise another. Other variables' values will be repeated, and a row-grouping variable can be introduced. This function invokes tidyr::gather().
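The reshaping described above can be sketched on a small data frame. The data and the name `toy_alluvia` are hypothetical; the call uses the default `key`, `value`, and `id` names from Usage and assumes ggalluvial is installed.

```r
library(ggalluvial)

# Hypothetical alluvia-form data: two axes and a weight column.
toy_alluvia <- data.frame(
  before = c("A", "A", "B"),
  after  = c("A", "B", "B"),
  n      = c(5, 2, 3)
)

# Stack the two axis columns into key ("x") and value ("stratum")
# columns; "alluvium" records which original row each lode came from.
toy_long <- to_lodes_form(toy_alluvia, axes = 1:2,
                          key = "x", value = "stratum", id = "alluvium")
print(toy_long)

# Each of the 3 alluvia contributes one lode per axis,
# and the non-axis column `n` is repeated across those lodes.
print(nrow(toy_long))
print(levels(toy_long$x))
```

Because `n` is repeated at every axis, summing it over the whole reshaped data frame counts each alluvium once per axis; sum within a single axis to recover the original total.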
to_alluvia_form takes a data frame with axis and axis value variables to be used in an alluvial plot, and reshapes the data frame so that the axes constitute separate variables whose values are given by the value variable. This function invokes tidyr::spread().
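The reverse direction can be sketched the same way. The data frame `toy_lodes` and its column names are hypothetical; the axis column is stored as a factor here only so that the spread columns keep the order "before", "after". The sketch assumes ggalluvial is installed.

```r
library(ggalluvial)

# Hypothetical lodes-form data: one row per cohort per axis.
toy_lodes <- data.frame(
  cohort = rep(1:3, times = 2),
  axis   = factor(rep(c("before", "after"), each = 3),
                  levels = c("before", "after")),
  value  = c("A", "A", "B", "A", "B", "B"),
  n      = rep(c(5, 2, 3), times = 2)
)

# Spread the axis column into one column per axis, keyed by cohort.
# `n` is constant within each cohort, so nothing needs to be distilled.
toy_wide <- to_alluvia_form(toy_lodes,
                            key = "axis", value = "value", id = "cohort")
toy_wide <- toy_wide[order(toy_wide$cohort), ]
print(toy_wide)
```

Variables that do vary within an `id` value are the case that the `distill` argument addresses, as in the `hypo_grade` example under Examples.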
See Also
Other alluvial data manipulation:
self-adjoin
Examples
# Titanic data in alluvia format
titanic_alluvia <- as.data.frame(Titanic)
head(titanic_alluvia)
is_alluvia_form(titanic_alluvia,
                weight = "Freq")

# Titanic data in lodes format
titanic_lodes <- to_lodes_form(titanic_alluvia,
                               key = "x", value = "stratum", id = "alluvium",
                               axes = 1:4)
head(titanic_lodes)
is_lodes_form(titanic_lodes,
              key = "x", value = "stratum", id = "alluvium",
              weight = "Freq")

# again in lodes format, this time diffusing the `Class` variable
titanic_lodes2 <- to_lodes_form(titanic_alluvia,
                                key = variable, value = value,
                                id = cohort,
                                1:3, diffuse = Class)
head(titanic_lodes2)
is_lodes_form(titanic_lodes2,
              key = variable, value = value, id = cohort,
              weight = Freq)

# use `site` to separate data before lode testing
is_lodes_form(titanic_lodes2,
              key = variable, value = value, id = Class,
              weight = Freq)
is_lodes_form(titanic_lodes2,
              key = variable, value = value, id = Class,
              weight = Freq, site = cohort)

# curriculum data in lodes format
data(majors)
head(majors)
is_lodes_form(majors,
              key = "semester", value = "curriculum", id = "student")

# curriculum data in alluvia format
majors_alluvia <- to_alluvia_form(majors,
                                  key = "semester", value = "curriculum",
                                  id = "student")
head(majors_alluvia)
is_alluvia_form(majors_alluvia, tidyselect::starts_with("CURR"))

# distill variables that vary within `id` values
set.seed(1)
majors$hypo_grade <- LETTERS[sample(5, size = nrow(majors), replace = TRUE)]
majors_alluvia2 <- to_alluvia_form(majors,
                                   key = "semester", value = "curriculum",
                                   id = "student",
                                   distill = "most")
head(majors_alluvia2)

# options to distinguish strata at different axes
gg <- ggplot(majors_alluvia,
             aes(axis1 = CURR1, axis2 = CURR7, axis3 = CURR13))
gg +
  geom_alluvium(aes(fill = as.factor(student)), width = 2/5, discern = TRUE) +
  geom_stratum(width = 2/5, discern = TRUE) +
  geom_text(stat = "stratum", discern = TRUE, aes(label = after_stat(stratum)))
gg +
  geom_alluvium(aes(fill = as.factor(student)), width = 2/5, discern = FALSE) +
  geom_stratum(width = 2/5, discern = FALSE) +
  geom_text(stat = "stratum", discern = FALSE, aes(label = after_stat(stratum)))

# warning when inappropriate
ggplot(majors[majors$semester %in% paste0("CURR", c(1, 7, 13)), ],
       aes(x = semester, stratum = curriculum, alluvium = student,
           label = curriculum)) +
  geom_alluvium(aes(fill = as.factor(student)), width = 2/5, discern = TRUE) +
  geom_stratum(width = 2/5, discern = TRUE) +
  geom_text(stat = "stratum", discern = TRUE)