muir {muir} | R Documentation |
Explore Datasets with Trees
Description
This function allows users to easily and dynamically explore or document a data.frame using a tree data structure. Columns of interest in the data.frame can be provided to the function, as well as criteria for how they should be represented in discrete nodes, to generate a data tree representing those columns and filters.
Usage
muir(data, node.levels, node.limit = 3, level.criteria = NULL,
     label.vals = NULL, tree.dir = "LR", show.percent = TRUE,
     num.precision = 2, show.empty.child = FALSE, tree.height = -1,
     tree.width = -1)
Arguments
data |
A data.frame to be explored using trees |
node.levels |
A character vector of columns from data. For each column, the user can add a suffix to the column name to indicate whether to generate
nodes for all distinct values of the column in the data.frame or for a specific number of values
(i.e., the "Top (n)" values), whether or not to aggregate remaining values into a separate
"Other" node, or whether to use user-provided filter criteria for the column as provided in
level.criteria. Values can be provided as "colname", "colname:*", "colname:3", "colname:+", or "colname:*+". The separator character ":" and the special characters in the suffix that follow indicate which approach to take for each column.
|
node.limit |
Numeric value. When providing a column in |
level.criteria |
A data.frame consisting of 4 character columns containing
column names (matching, without suffixes, the columns in node.levels), an operator, a value, and a node title. E.g., "wt", ">=", "4000", "Heavy Cars" |
label.vals |
Character vector of additional values to include in the node.
The values must take the form of dplyr summarise functions (e.g., "min(wt)") |
tree.dir |
Character. The direction the tree graph should be rendered. Defaults to "LR"
|
show.percent |
Logical. Should nodes show the percent of records represented by
that node compared to the total number of records in data |
num.precision |
Number of digits to print numeric label values out to |
show.empty.child |
Logical. If TRUE, show a balanced tree, including child nodes that are all empty. If FALSE, stop expanding the tree once a parent node is empty. Defaults to FALSE (don't show empty child nodes) |
tree.height |
Numeric. Control tree height to zoom in/out on nodes. Passed to DiagrammeR
as |
tree.width |
Numeric. Control tree width to zoom in/out on nodes. Passed to DiagrammeR
as |
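The "Top (n)" and "Other" behaviour described for node.levels can be previewed with base R before building a tree. The following is a minimal sketch, not muir's internal implementation: it assumes that a suffix such as "carb:2+" means "the 2 most frequent values plus one aggregated Other node", and that show.percent compares each node count with the total number of rows. The variable names (n_top, node_counts, node_percent) are illustrative only.

```r
# Count occurrences of each distinct 'carb' value, most frequent first
carb_counts <- sort(table(mtcars$carb), decreasing = TRUE)

# Assumed to correspond to the "2" in "carb:2+"
n_top <- 2
top_nodes <- carb_counts[seq_len(n_top)]

# Everything outside the top n is aggregated, as the "+" suffix is described to do
other_count <- sum(carb_counts) - sum(top_nodes)

# Named integer vector: one entry per node that would be drawn for this level
node_counts <- c(
  setNames(as.integer(top_nodes), names(top_nodes)),
  Other = as.integer(other_count)
)

# Percent of all records per node, rounded like num.precision = 2
node_percent <- round(100 * node_counts / nrow(mtcars), 2)

print(node_counts)
print(node_percent)
```

A preview like this helps decide on a sensible node.limit or per-column suffix before rendering, since a column with many distinct values can make the tree hard to read.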
Value
An object of class htmlwidget
(via DiagrammeR) that will
intelligently print itself into HTML in a variety of contexts
including the R console, within R Markdown documents,
and within Shiny output bindings.
Examples
## Not run:
# Load in the 'mtcars' dataset
data(mtcars)
# Basic exploration - show all values
mtTree <- muir(data = mtcars, node.levels = c("cyl:*", "carb:*"))
mtTree
# Basic exploration - show all values overriding default node.limit
mtTree <- muir(data = mtcars, node.levels = c("cyl:*", "carb:*"), node.limit = 5)
mtTree
# Override the default node.limit differently for each column
# (top 2 'cyl' values, top 5 'carb' values)
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:5"))
mtTree
# Override the default node.limit for each column and aggregate the
# remaining values into a separate "Other" node:
# the top 2 occurring 'carb' values will be returned in their own nodes,
# and the remaining values/counts will be aggregated into an "Other" node
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"))
mtTree
# Add additional calculations to each node output (dplyr::summarise functions)
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"),
               label.vals = c("min(wt)", "max(wt)"))
mtTree
# Make new label values more reader-friendly
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"),
               label.vals = c("min(wt):Min Weight", "max(wt):Max Weight"))
mtTree
# Instead of just returning top counts for columns provided in node.levels,
# provide custom filter criteria and custom node titles in level.criteria
# (criteria could also be read in from a csv file as a data.frame)
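# level.criteria is documented as 4 character columns, so keep 'val' as character
criteria <- data.frame(col = c("cyl", "cyl", "carb"),
                       oper = c("<", ">=", "=="),
                       val = c("4", "4", "2"),
                       title = c("Less Than 4 Cylinders", "4 or More Cylinders", "2 Carburetors"),
                       stringsAsFactors = FALSE)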
mtTree <- muir(data = mtcars, node.levels = c("cyl", "carb"),
               level.criteria = criteria,
               label.vals = c("min(wt):Min Weight", "max(wt):Max Weight"))
mtTree
# Use the same criteria, but collect all rows that match none of the filters
# provided for a column (e.g., for cyl, rows where !(cyl < 4 | cyl >= 4))
# in an "Other" node
mtTree <- muir(data = mtcars, node.levels = c("cyl:+", "carb:+"),
               level.criteria = criteria,
               label.vals = c("min(wt):Min Weight", "max(wt):Max Weight"))
mtTree
# Show empty child nodes (balanced tree)
mtTree <- muir(data = mtcars, node.levels = c("cyl:+", "carb:+"),
               level.criteria = criteria,
               label.vals = c("min(wt):Min Weight", "max(wt):Max Weight"),
               show.empty.child = TRUE)
mtTree
# Save tree to an HTML file in the working directory with the htmlwidgets package
mtTree <- muir(data = mtcars, node.levels = c("cyl:2", "carb:2+"))
htmlwidgets::saveWidget(mtTree, "mtTree.html")
## End(Not run)
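The examples note that filter criteria could also be read in from a csv file. The sketch below shows one way this might be done with base R, and then previews how many rows each criterion would match. It is an illustration under stated assumptions rather than muir's own logic: the file name criteria.csv and the helper preview_count are hypothetical, and the helper assumes each operator is a binary comparison applied to a numeric column.

```r
# Hypothetical file location; any readable path would do
criteria_file <- file.path(tempdir(), "criteria.csv")

# Same criteria as in the examples, written out as 4 character columns
criteria <- data.frame(
  col   = c("cyl", "cyl", "carb"),
  oper  = c("<", ">=", "=="),
  val   = c("4", "4", "2"),
  title = c("Less Than 4 Cylinders", "4 or More Cylinders", "2 Carburetors"),
  stringsAsFactors = FALSE
)
write.csv(criteria, criteria_file, row.names = FALSE)

# colClasses = "character" keeps every column as character on the way back in
criteria_in <- read.csv(criteria_file, colClasses = "character",
                        stringsAsFactors = FALSE)

# Hypothetical helper: count rows of df matching one (col, oper, val) criterion.
# Assumes a binary comparison operator and a numeric column.
preview_count <- function(df, col, oper, val) {
  compare <- match.fun(oper)
  sum(compare(df[[col]], as.numeric(val)))
}

counts <- mapply(preview_count,
                 col = criteria_in$col,
                 oper = criteria_in$oper,
                 val = criteria_in$val,
                 MoreArgs = list(df = mtcars))

print(data.frame(title = criteria_in$title, rows = unname(counts)))
```

The data.frame read back in can then be passed as level.criteria in the calls shown above. Previewing the counts first makes it easy to spot criteria that match no rows, such as "Less Than 4 Cylinders" in mtcars.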