order_multiway {midfieldr}    R Documentation
Order categorical variables of multiway data
Description
Transform a data frame such that two independent categorical variables are factors with levels ordered for display in a multiway dot plot. Multiway data comprise a single quantitative value (or response) for every combination of levels of two categorical variables. The ordering of the rows and panels is crucial to the perception of effects (Cleveland, 1993).
Usage
order_multiway(
  dframe,
  quantity,
  categories,
  ...,
  method = NULL,
  ratio_of = NULL
)
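Because `method` and `ratio_of` follow `...` in the signature, R cannot match them by position; they must be named in the call. The base-R sketch below shows that behavior with `show_method()`, a hypothetical stand-in that has the same argument layout. It is not part of midfieldr.

```r
# Hypothetical stand-in with the same argument layout as order_multiway();
# it only reports the value that `method` received.
show_method <- function(dframe, quantity, categories, ..., method = NULL) {
  if (is.null(method)) "NULL (default)" else method
}

# Passed by position: "percent" is absorbed by `...`, so method stays NULL
show_method(NULL, "stickiness", c("program", "people"), "percent")

# Passed by name: method receives the value
show_method(NULL, "stickiness", c("program", "people"), method = "percent")
```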
Arguments
dframe
Data frame containing a single quantitative value (or response) for every combination of levels of two categorical variables. Categories may be class character or factor. Two additional numeric columns are required when using the "percent" ordering method.
quantity
Character, name (in quotes) of the single multiway quantitative variable.
categories
Character, vector of names (in quotes) of the two multiway categorical variables.
...
Not used for passing values; forces subsequent arguments to be referable only by name.
method
Character, "median" (default, used when method is NULL) or "percent", the method of ordering the levels of the categories. The median method computes the medians of the quantitative column grouped by category. The percent method computes percentages from the same ratio that produced the quantitative percentage variable (the numerator and denominator columns named in ratio_of), but grouped by category.
ratio_of
Character vector with the names (in quotes) of the numerator and denominator columns that produced the quantitative variable, required when method = "percent".
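As a rough illustration of the median method, the base-R sketch below reorders the levels of each category by the median of the quantity within each level, using `stats::reorder()`. It is not the midfieldr implementation: the data are hypothetical, and the increasing sort direction and tie handling are assumptions that may differ from what `order_multiway()` does.

```r
# Hypothetical multiway data: 2 programs x 3 groups, one value per combination
dframe <- data.frame(
  program    = rep(c("A", "B"), each = 3),
  people     = rep(c("x", "y", "z"), times = 2),
  stickiness = c(60, 40, 55, 70, 50, 65)
)

# Median method (sketch): order each category's levels by the median of the
# quantity within that level; reorder() sorts levels by increasing FUN value
dframe$program <- reorder(factor(dframe$program), dframe$stickiness, FUN = median)
dframe$people  <- reorder(factor(dframe$people),  dframe$stickiness, FUN = median)

levels(dframe$program)  # "A" "B"      (medians 55, 65)
levels(dframe$people)   # "y" "z" "x"  (medians 45, 60, 65)
```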
Details
In our context, "multiway" refers to the data structure and graph design defined by Cleveland (1993), not to the methods of analysis described by Kroonenberg (2008).
Multiway data comprise three variables: a categorical variable of m levels; a second independent categorical variable of n levels; and a quantitative variable (or response) of length mn that cross-classifies the categories, that is, there is a value of the response for each combination of levels of the two categorical variables.
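The cross-classified structure can be checked directly: with m levels of one category and n levels of the other, a multiway data set has mn rows and each combination of levels occurs exactly once. The base-R sketch below uses hypothetical category labels and placeholder response values.

```r
# Hypothetical categories: m = 2 levels and n = 3 levels
cells <- expand.grid(
  program = c("A", "B"),
  people  = c("x", "y", "z"),
  stringsAsFactors = FALSE
)
# Placeholder response: one value per combination of levels
cells$response <- seq_len(nrow(cells))

# A multiway layout has m * n rows, one per combination
m <- length(unique(cells$program))
n <- length(unique(cells$people))
nrow(cells) == m * n                          # TRUE
all(table(cells$program, cells$people) == 1)  # TRUE
```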
In a multiway dot plot, one category is encoded by the panels, the second category is encoded by the rows of each panel, and the quantitative variable is encoded along identical horizontal scales.
Value
A data frame in data.table format with the following properties: rows are preserved; columns specified by categories are converted to factors and ordered; the column specified by quantity is converted to type double; other columns are preserved with the exception that columns added by the function overwrite existing columns of the same name (if any); grouping structures are not preserved.
The added columns are:
CATEGORY_median columns (when ordering method is "median")
Numeric. Two columns of medians of the quantitative variable grouped by the categorical variables. The CATEGORY placeholder in the column name is replaced by a category name from the categories argument. For example, suppose categories = c("program", "people") and method = "median". The two new column names would be program_median and people_median.
CATEGORY_QUANTITY columns (when ordering method is "percent")
Numeric. Two columns of percentages based on the same ratio that produces the quantitative variable, except grouped by the categorical variables. The CATEGORY placeholder in the column name is replaced by a category name from the categories argument; the QUANTITY placeholder is replaced by the quantitative variable name in the quantity argument. For example, suppose categories = c("program", "people"), quantity = "grad_rate", and method = "percent". The two new column names would be program_grad_rate and people_grad_rate.
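The base-R sketch below shows how the two kinds of added columns could be computed and named for a hypothetical data set. Two assumptions are worth flagging. First, the percent columns pool the numerator and denominator within each level (100 * sum(numerator) / sum(denominator)), which is one reading of "the same ratio ... grouped by the categorical variables". Second, both column pairs are built here only for comparison, whereas order_multiway() adds only the pair for the chosen method.

```r
# Hypothetical multiway data: one row per program-people combination
dframe <- data.frame(
  program       = rep(c("A", "B"), each = 2),
  people        = rep(c("x", "y"), times = 2),
  graduates     = c(30, 10, 45, 20),
  ever_enrolled = c(50, 40, 60, 40)
)
quantity   <- "stickiness"
categories <- c("program", "people")
ratio_of   <- c("graduates", "ever_enrolled")

# The quantitative variable is itself a percentage built from ratio_of
dframe[[quantity]] <- 100 * dframe$graduates / dframe$ever_enrolled

for (category in categories) {
  level <- dframe[[category]]

  # Median-method column, named CATEGORY_median
  dframe[[paste0(category, "_median")]] <-
    ave(dframe[[quantity]], level, FUN = median)

  # Percent-method column, named CATEGORY_QUANTITY
  # (assumption: numerator and denominator are summed within each level)
  num <- ave(dframe[[ratio_of[1]]], level, FUN = sum)
  den <- ave(dframe[[ratio_of[2]]], level, FUN = sum)
  dframe[[paste0(category, "_", quantity)]] <- 100 * num / den
}

names(dframe)
```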
References
Cleveland WS (1993). Visualizing Data. Hobart Press, Summit, NJ.
Kroonenberg PM (2008). Applied Multiway Data Analysis. Wiley, Hoboken, NJ.
Examples
# Load the package that provides order_multiway() and study_results
library(midfieldr)

# Subset of built-in data set
dframe <- study_results[program == "EE" | program == "ME"]
dframe[, people := paste(race, sex)]
dframe[, c("race", "sex") := NULL]
data.table::setcolorder(dframe, c("program", "people"))
# Class before ordering
class(dframe$program)
class(dframe$people)
# Class and levels after ordering
mw1 <- order_multiway(dframe,
  quantity = "stickiness",
  categories = c("program", "people"))
class(mw1$program)
levels(mw1$program)
class(mw1$people)
levels(mw1$people)
# Display category medians
mw1
# Existing factors (if any) are re-ordered
# Work on a copy so that dframe is not modified by reference
mw2 <- data.table::copy(dframe)
mw2[, program := factor(program, levels = c("ME", "EE"))]

# Levels before ordering
levels(mw2$program)

# Levels after ordering
mw2 <- order_multiway(mw2,
  quantity = "stickiness",
  categories = c("program", "people"))
levels(mw2$program)
# Ordering using percent method
order_multiway(dframe,
  quantity = "stickiness",
  categories = c("program", "people"),
  method = "percent",
  ratio_of = c("graduates", "ever_enrolled"))