order_multiway {midfieldr}    R Documentation

Order categorical variables of multiway data

Description

Transform a data frame such that two independent categorical variables are factors with levels ordered for display in a multiway dot plot. Multiway data comprise a single quantitative value (or response) for every combination of levels of two categorical variables. The ordering of the rows and panels is crucial to the perception of effects (Cleveland, 1993).

Usage

order_multiway(
  dframe,
  quantity,
  categories,
  ...,
  method = NULL,
  ratio_of = NULL
)

Arguments

dframe

Data frame containing a single quantitative value (or response) for every combination of levels of two categorical variables. Categories may be class character or factor. Two additional numeric columns are required when using the "percent" ordering method.

quantity

Character, name (in quotes) of the single multiway quantitative variable.

categories

Character, vector of names (in quotes) of the two multiway categorical variables.

...

Not used for passing values; forces subsequent arguments to be referable only by name.

method

Character, "median" (default) or "percent", the method of ordering the levels of the categories. Leaving method at its usage default of NULL selects "median". The median method computes the medians of the quantitative column grouped by category. The percent method recomputes, for each level of a category, the same ratio that underlies the quantitative percentage variable, using the columns named in ratio_of.

ratio_of

Character vector with the names (in quotes) of the numerator and denominator columns that produced the quantitative variable. Required when method is "percent". Names can be in any order; the algorithm assumes that the column with the larger column sum is the denominator of the ratio.

Details

In our context, "multiway" refers to the data structure and graph design defined by Cleveland (1993), not to the methods of analysis described by Kroonenberg (2008).

Multiway data comprise three variables: a categorical variable of m levels; a second independent categorical variable of n levels; and a quantitative variable (or response) of length mn that cross-classifies the categories, that is, there is a value of the response for each combination of levels of the two categorical variables.

In a multiway dot plot, one category is encoded by the panels, the second category is encoded by the rows of each panel, and the quantitative variable is encoded along identical horizontal scales.

Value

A data frame in data.table format with the following properties: rows are preserved; columns specified by categories are converted to factors and ordered; the column specified by quantity is converted to type double; other columns are preserved with the exception that columns added by the function overwrite existing columns of the same name (if any); grouping structures are not preserved. The added columns are:

CATEGORY_median columns (when ordering method is "median")

Numeric. Two columns of medians of the quantitative variable grouped by the categorical variables. The CATEGORY placeholder in the column name is replaced by a category name from the categories argument. For example, suppose categories = c("program", "people") and method = "median". The two new column names would be program_median and people_median.

CATEGORY_QUANTITY columns (when ordering method is "percent")

Numeric. Two columns of percentages based on the same ratio that produces the quantitative variable, except grouped by the categorical variables. The CATEGORY placeholder in the column name is replaced by a category name from the categories argument; the QUANTITY placeholder is replaced by the quantitative variable name in the quantity argument. For example, suppose categories = c("program", "people"), quantity = "grad_rate", and method = "percent". The two new column names would be program_grad_rate and people_grad_rate.

References

Cleveland WS (1993). Visualizing Data. Hobart Press, Summit, NJ.

Kroonenberg PM (2008). Applied Multiway Data Analysis. Wiley, Hoboken, NJ.

Examples

# Subset of built-in data set
dframe <- study_results[program == "EE" | program == "ME"]
dframe[, people := paste(race, sex)]
dframe[, c("race", "sex") := NULL]
data.table::setcolorder(dframe, c("program", "people"))

# Class before ordering
class(dframe$program)
class(dframe$people)

# Class and levels after ordering
mw1 <- order_multiway(dframe, 
                      quantity = "stickiness", 
                      categories = c("program", "people"))
class(mw1$program)
levels(mw1$program)
class(mw1$people)
levels(mw1$people)

# Display category medians 
mw1
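
# Illustrative sketch only (base R, hypothetical toy data, not the package
# internals): the idea behind the "median" method. The levels of each
# category are ordered by the median of the quantity within that category.
# Increasing order is an assumption made for this sketch.
toy_med <- data.frame(
  program = rep(c("EE", "ME"), each = 3),
  people  = rep(c("A", "B", "C"), times = 2),
  value   = c(0.2, 0.6, 0.4, 0.5, 0.9, 0.7)
)

# Median of the quantity for each level of each category
program_median <- tapply(toy_med$value, toy_med$program, median)
people_median  <- tapply(toy_med$value, toy_med$people, median)

# Convert the categories to factors with levels ordered by those medians
toy_med$program <- factor(
  toy_med$program,
  levels = names(program_median)[order(program_median)]
)
toy_med$people <- factor(
  toy_med$people,
  levels = names(people_median)[order(people_median)]
)
levels(toy_med$program)
levels(toy_med$people)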

# Existing factors (if any) are re-ordered
# Work on a copy so that dframe itself is left unchanged
mw2 <- data.table::copy(dframe)
mw2$program <- factor(mw2$program, levels = c("ME", "EE"))

# Levels before ordering
levels(mw2$program)

# Levels after ordering (pass mw2, the data frame holding the factor)
mw2 <- order_multiway(mw2,
                      quantity = "stickiness",
                      categories = c("program", "people"))
levels(mw2$program)

# Ordering using percent method
order_multiway(dframe, 
               quantity = "stickiness", 
               categories = c("program", "people"), 
               method = "percent", 
               ratio_of = c("graduates", "ever_enrolled"))
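
# Illustrative sketch only (base R, hypothetical counts, not the package
# internals): the idea behind the "percent" method. The column with the
# larger sum is taken as the denominator, and the ratio is recomputed from
# sums pooled within each level of a category. Pooling by sums is an
# assumption made for this sketch.
toy_pct <- data.frame(
  program       = rep(c("EE", "ME"), each = 2),
  people        = rep(c("A", "B"), times = 2),
  graduates     = c(30, 10, 45, 20),
  ever_enrolled = c(50, 40, 60, 50)
)
ratio_of <- c("ever_enrolled", "graduates")  # names may be in any order

# Identify the denominator as the column with the larger column sum
col_sums <- colSums(toy_pct[ratio_of])
denom <- names(which.max(col_sums))
numer <- setdiff(ratio_of, denom)

# Percentage for each level of a category, from pooled sums
pct_by <- function(category) {
  100 * tapply(toy_pct[[numer]], toy_pct[[category]], sum) /
    tapply(toy_pct[[denom]], toy_pct[[category]], sum)
}
program_pct <- pct_by("program")
people_pct  <- pct_by("people")
round(program_pct, 1)
round(people_pct, 1)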

[Package midfieldr version 1.0.2 Index]