BarChart {lessR} | R Documentation |
Bar Chart for One or Two Variables
Description
Abbreviation: bc
The function plots a bar chart of one categorical variable, x, against one numeric variable, y, optionally with a second categorical variable, by. The bar chart is constructed from a table, usually brief, that pairs each level of the categorical variable(s) with the corresponding numerical value of y. Usually this table is a summary (pivot) table computed by aggregating the original data table of measurements, such as the average salary of the employees in each department.
The calculation of this foundational summary table from which the bar chart is created can occur outside of the function. Or, probably the more usual situation, the table is implicitly calculated by the function in one of two ways. Accordingly, obtain the summary table from one of three possibilities.
1. Enter the summary table obtained from an external source directly as the value of the data parameter, indicated by specifying categorical variable x, and possibly by, with the numerical variable y.
2. Have the function implicitly summarize the entire data table by counting. If only categorical variable x, and possibly categorical variable by, are specified without a numerical y, the entire data table must be input as the value of data. The function then computes the numeric variable y as the frequency of values in each category or level of the specified categorical variables.
3. Have the function implicitly summarize the entire data table, entered as the value of data, by specifying a y variable. Obtain the summary table from which the bar chart is constructed by summarizing (aggregating) the values of y at each level of x, and possibly by, with the statistic specified by the stat parameter. The function assesses whether the input data is a summary table or the entire data table. If the entire data table is entered and the stat parameter is not specified, stat defaults to the mean.
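As a rough illustration of these three possibilities, the following base R sketch builds each kind of summary table outside of lessR. The data frame emp and its values are invented for the example, and table and aggregate stand in for whatever BarChart does internally, which is not shown here.

```r
# Hypothetical raw data table: one row per employee (values invented)
emp <- data.frame(
  Dept   = c("ACCT", "ACCT", "SALE", "SALE", "SALE"),
  Salary = c(50000, 60000, 70000, 80000, 90000)
)

# Possibility 1: a summary table supplied directly, one row per level of x
supplied <- data.frame(Dept = c("ACCT", "SALE"), Salary = c(55000, 80000))

# Possibility 2: only x is given, so y is the count of rows at each level of x
counts <- as.data.frame(table(Dept = emp$Dept), responseName = "count")

# Possibility 3: x and y are given, so y is aggregated at each level of x
# (the mean here, matching the stated default of stat)
means <- aggregate(Salary ~ Dept, data = emp, FUN = mean)

print(counts)
print(means)
```

In each case the result has one row per level of Dept, which is the form from which the bars are drawn.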
The function also displays the foundational summary table, such as the frequency table for one or two variables. For a frequency table, Cramer's V and the corresponding chi-square inferential analysis are also displayed. For two variables, the frequencies include the joint and marginal frequencies. To activate Trellis graphics or facets, a multi-panel display, specify a by1 variable in place of by for the second categorical variable. If the provided object to analyze is a set of multiple variables, including the name of an entire data frame, then a bar chart is produced for each non-numeric variable in the data frame.
Usage
BarChart(
# ------------------------------------------
# Data from which to construct the bar chart
x=NULL, y=NULL, by=NULL, data=d, filter=NULL,
# ------------------------------
# Bar chart from aggregated data
stat=c("mean", "sum", "sd", "deviation", "min", "median", "max"),
stat_x=c("count", "proportion"),
# --------------------------------------------------
# Trellis (facet) plot, stratify on different panels
by1=NULL, n_row=NULL, n_col=NULL, aspect="fill",
# -------------------------------
# Layout and ordering of the bars
horiz=FALSE, sort=c("0", "-", "+"),
beside=FALSE, stack100=FALSE,
gap=NULL, scale_y=NULL, one_plot=NULL,
# ----------------------------------------------------------------
# Analogy of physical Marks on paper to create the bars and labels
theme=getOption("theme"),
fill=NULL,
color=getOption("bar_color_discrete"),
transparency=getOption("trans_bar_fill"),
fill_split=NULL,
labels=c("%", "input", "off"),
labels_color=getOption("labels_color"),
labels_size=getOption("labels_size"),
labels_decimals=getOption("labels_decimals"),
labels_position=getOption("labels_position"),
labels_cut=NULL,
# ------------------------------------------------------------------
# Labels for axes, values, and legend if x and by variables, margins
xlab=NULL, ylab=NULL, main=NULL, sub=NULL,
lab_adjust=c(0,0), margin_adjust=c(0,0,0,0),
pad_y_min=0, pad_y_max=0,
rotate_x=getOption("rotate_x"), break_x=NULL,
offset=getOption("offset"),
label_max=100,
legend_title=NULL, legend_position="right_margin",
legend_labels=NULL, legend_horiz=FALSE,
legend_size=NULL, legend_abbrev=NULL, legend_adjust=0,
# ----------------------------------------------------
# Draw one or more objects, text, or geometric figures
add=NULL, x1=NULL, y1=NULL, x2=NULL, y2=NULL,
# --------------------------------------------------------------------
# Output: text or chart turned off, to PDF file, number decimal digits
quiet=getOption("quiet"), do_plot=TRUE,
pdf_file=NULL, width=6.5, height=6,
digits_d=NULL, out_size=80,
# --------------------------------------
# Deprecated, removed in future versions
n_cat=getOption("n_cat"), value_labels=NULL, rows=NULL,
# -------------
# Miscellaneous
eval_df=NULL, ...)
bc(...)
Arguments
x |
Categorical variable(s) to analyze. Can be a single
variable, either
within a data frame or as a vector in the user's workspace,
or multiple variables in a data frame such as designated with the
|
y |
Numeric variable with a value for each level of the categorical
variable with the value plotted proportional to the height of the
corresponding bar. If specified for the original data table, then
the corresponding
|
by |
A second categorical variable to create a two-variable bar chart for
each level of the numeric primary variable
|
data |
Optional data frame that contains the variables of interest.
Can contain data from which frequencies or other statistics for a
|
filter |
A logical expression that specifies a subset of rows of the data frame to analyze. |
stat |
Statistical transformation of the data for the y-axis across
groups defined by the categorical variable(s), the data aggregation.
Applicable values: |
stat_x |
When no |
by1 |
A categorical variable called a conditioning variable that
activates Trellis graphics (facets), from the |
n_row |
Optional specification for the number of rows in the layout
of a multi-panel display with Trellis graphics (facets). Need not specify
|
n_col |
Optional specification for the number of columns in the
layout of a multi-panel display with
Trellis graphics (facets). Need not specify |
aspect |
Lattice parameter for the aspect ratio of the panels in
a Trellis plot (multi-panel display or facets), defined as height divided by
width. The default value is |
horiz |
Bar orientation. By default the value is
|
sort |
Sort the categories by their frequency for one variable and by
the column sums if a |
beside |
For a two variable plot, set to |
stack100 |
100% stacked bar chart when a |
gap |
Gap between bars. Provides the value of the |
scale_y |
If specified, a vector of three values that define the numerical values of the y-axis, the numerical axis, within the bounds of plot region: starting value, ending value, and number of intervals. |
one_plot |
For bar charts of multiple x-variables, indicates
if a bar plot is produced for each x-variable, or all are combined
into a single plot, such as for items that all share common responses
such as survey data with a common Likert scale across variables.
Default is if variables share a common response scale
set to |
theme |
Theme for the colors for this analysis. Make persistent
across analyses with |
fill |
Fill color of the bars. Default is the qualitative palette
|
color |
Border color of the bars, can be a vector
to customize the color for each bar. Default is
|
transparency |
Transparency factor of the area of each bar from 0, no
transparency to 1, full transparency. Default is
|
fill_split |
The value of the numeric variable |
labels |
If not |
labels_color |
Color of the plotted text. Could be a vector to specify a unique color for each value. If fewer colors are specified than the number of categories, the colors are recycled. |
labels_size |
Character expansion factor, the size, of the plotted text,
for which the default value is 0.95, or 0.9 of value if |
labels_decimals |
Number of decimal digits for which to display the values.
Default is 0, round to the nearest integer for |
labels_position |
Position of the plotted text. Default is |
labels_cut |
Threshold for displaying the value. If |
xlab |
Axis label for |
ylab |
Label for |
main |
Label for the title of the graph.
Can set size with |
sub |
Sub-title of graph, below |
lab_adjust |
Two-element vector – x-axis label, y-axis label – adjusts the position of the axis labels in approximate inches. + values move the labels away from plot edge. Not applicable to Trellis graphics. |
margin_adjust |
Four-element vector – top, right, bottom and left – adjusts the margins of the plotted figure in approximate inches. + values move the corresponding margin away from plot edge. Not applicable to Trellis graphics. |
pad_y_min |
Proportion of padding added to the left side of
the |
pad_y_max |
Proportion of padding added to the right side of
the |
rotate_x |
Degrees that the axis values for the category values
axis are rotated, usually to accommodate longer values,
typically used in conjunction with |
break_x |
Replace spaces in the category values with a new line
and replace tildes with a blank so that there is no separation of words
joined by a tilde. By default, |
offset |
The amount of spacing between the axis values and the axis. Default is 0.5. Larger values such as 1.0 create space for the label when longer axis value names are rotated. |
label_max |
To improve readability of text output, the maximum size of the value labels before the labels are abbreviated for text output only. Not a literal maximum as preserving unique values may require a larger number of characters than specified. |
legend_title |
Title of the legend, which is usually set by default except when raw counts are entered as a matrix. Then a title must be specified to generate a legend. |
legend_position |
When plotting two variables, location of the legend, with the
default in the right margin. Additional options from standard R are
"topleft", "top", "topright" and others as shown in the help for the
|
legend_labels |
When plotting two variables, labels for the legend, which by
default are the levels for the second or |
legend_horiz |
By default the legend is vertical, but can be changed to horizontal. |
legend_size |
Size of legend text. |
legend_abbrev |
If specified, abbreviate the legend title and legend labels to the specified maximum number of characters. |
legend_adjust |
Shift legend for a two-categorical bar chart. A positive number shifts the legend to the right from its default placement. |
add |
Draw one or more objects, text, or geometric figures,
on the plot.
Possible values are any text to be written, the first argument, which is
|
x1 |
First x coordinate to be considered for each object. All coordinates vary from -1 to 1. |
y1 |
First y coordinate to be considered for each object. |
x2 |
Second x coordinate to be considered for each object.
Only used for |
y2 |
Second y coordinate to be considered for each object.
Only used for |
quiet |
If set to |
do_plot |
If |
pdf_file |
Indicate to direct pdf graphics to the specified name of the pdf file. |
width |
Width of the plot window in inches, defaults to 6.5. |
height |
Height of the plot window in inches, defaults to 6. |
digits_d |
Provides the number of decimal digits, set by default to at least 2 or the largest number of digits in the values of the response variable plus 1. |
out_size |
To improve the readability of the frequency distribution of a single variable displayed at the console, the maximum number of characters on a line of output at the console for one variable before the frequency distribution is written vertically. |
n_cat |
When analyzing all the variables in a data frame, specifies the largest number of unique values of a variable of a numeric data type for which the variable is analyzed as categorical. Default is 0. [deprecated]: Better to convert a categorical integer variable to a factor. |
value_labels |
For factors, default is the factor labels, and for
character variables, default is the character values.
Or, provide labels for the |
rows |
Deprecated old parameter name that is now called |
eval_df |
Determines whether to check for an existing data frame and
specified variables. By default is |
... |
Other parameter values for graphics as defined
by Base R |
Details
OVERVIEW
Plot a bar chart with default colors for one or two categorical variables, that is, with a relatively small number of labels for each variable. By default, colors are selected for the bars, background and grid lines, all of which can be customized. The basic computations of the chart are provided by the standard R functions barplot, chisq.test and, for two variables, legend. Horizontal bar charts, specified by horiz=TRUE, list the value labels horizontally and automatically extend the left margin to accommodate both the value labels and the variable label.
DATA
Ultimately the bar chart is constructed from a simple summary table in which each row consists of a level of the categorical variable x paired with the corresponding value of the numerical variable, y, with as many rows as the number of levels of x. Provide these values of x and y directly; or provide only x from the original data of measurements to compute the count of each category; or provide x and y with a value of stat to define the statistic by which to aggregate the values of y over the levels of x. A second categorical variable, by, can also be specified.
The data may either be vectors from the global environment, the user's workspace, as illustrated in the examples below, or variables in a data frame. The default input data frame is d. Specify a different data frame name with the data option. Regardless of its name, the variables in the data frame are referenced directly by their names.
If a vector in the global environment and a variable in the input data frame have the same name, the vector from the global environment is analyzed, unless the data frame name is explicitly provided rather than relying upon the default d. If two variables are specified, both variables should be in the data frame, or one of the variables is in the data frame and the other in the global environment.
To obtain a bar chart of each categorical variable in the d data frame, invoke BarChart(). Or, for a data frame with a different name, insert the data frame name between the parentheses as the first listed parameter value. To analyze a subset of the variables in a data frame, specify the variable list with either a : or the c function, such as m01:m03 or c(m01,m02,m03).
The filter parameter (formerly rows, now deprecated) subsets rows (cases) of the input data frame according to a logical expression. Use the standard R operators for logical statements as described in Logic, such as & for and, | for or and ! for not. Use the standard R relational operators as described in Comparison. Examples include == for logical equality, != for not equals, and > for greater than. See the Examples.
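The following base R sketch shows how such logical expressions evaluate to a TRUE or FALSE value for each row. The data frame emp and its values are invented for illustration. Within BarChart the variables are referenced by name alone, whereas plain base R needs the emp$ prefix used here.

```r
# Hypothetical data frame (values invented for illustration)
emp <- data.frame(
  Gender = c("M", "W", "M", "W", "M"),
  Salary = c(90000, 95000, 60000, 70000, 88000)
)

# & requires both conditions: men with a salary above 85,000
keep_and <- emp$Gender == "M" & emp$Salary > 85000

# | accepts either condition, ! negates a condition
keep_or  <- emp$Gender == "W" | emp$Salary > 85000
keep_not <- !(emp$Gender == "M")

# A logical vector selects the rows (cases) for which it is TRUE
emp[keep_and, ]
```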
The form of the entered data, the first variable, categorical x, and optionally a second variable, numerical y, is flexible. The data may be entered as factors, numeric values, characters, or a matrix. The data may be entered and the resulting frequencies computed, or the frequencies can be entered directly. The most natural type of data to enter, when entering the variables, is factors.
STATISTICAL TRANSFORMATIONS
Ultimately the bar plot is constructed from a small table of data values, with each row a level of the categorical variable x paired with the corresponding value of the numerical variable y, with as many rows as values of x. It is also possible to plot transformations of the values of y for each level of categorical variable x from a full data table with many replications of each value of x and corresponding y. In that case, reduce the larger data table to the summary table with one of the following transformations.
Transformation | Meaning
-------------- | ------------------
"sum"          | sum
"mean"         | mean
"sd"           | standard deviation
"dev"          | mean deviation
"min"          | minimum
"median"       | median
"max"          | maximum

The other statistical transformation is simply counting the number of occurrences of each level of x, which does not involve a value of y read from the data. Instead the value of y for each level of x is tabulated.
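To make the reduction from the full data table to the summary table concrete, here is a minimal base R sketch that applies several of the listed statistics with tapply. The vectors x and y are invented, and tapply is only a stand-in for the aggregation that BarChart performs. The "dev" statistic is left out because its exact computation is not spelled out above.

```r
# Hypothetical raw data: several y values at each level of x
x <- c("A", "A", "A", "B", "B", "B")
y <- c(1, 2, 3, 10, 20, 30)

# Each statistic reduces y to one value per level of x
stats <- list(sum = sum, mean = mean, sd = sd,
              min = min, median = median, max = max)
summary_tables <- lapply(stats, function(f) tapply(y, x, f))

summary_tables$mean   # one element per level of x
table(x)              # the count transformation uses x alone
```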
COLORS
For a one-variable plot, the default color of the bars is set by the current color theme according to the bar_fill_discrete argument of the function style, which includes the default color theme "hues" that defines a qualitative HCL color scale. Or, set the bar color with the fill parameter, which references a specified vector of color specifications, such as generated by the lessR getColors function.
Set fill to a single color or a color palette, of which there are many possibilities. Pre-defined sequential and divergent color ranges are available as implicit calls to getColors. Define the default qualitative color palette with "hues", which provides HCL colors of the same chroma (saturation) and luminance (brightness). The full list of pre-defined color ranges, defined in 30 degree increments around the HCL color wheel: "reds", "rusts", "browns", "olives", "greens", "emeralds", "turquoises", "aquas", "blues", "purples", "violets", "magentas", and "grays".
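The idea of hues spaced 30 degrees apart at constant chroma and luminance can be sketched with the base R hcl function. The chroma and luminance values below are arbitrary choices for illustration, not the values that getColors uses, and no mapping between the range names above and specific hue angles is implied.

```r
# Twelve hues spaced 30 degrees apart around the HCL color wheel,
# all with the same chroma and luminance (c and l chosen arbitrarily)
hues <- seq(0, 330, by = 30)
palette_hcl <- hcl(h = hues, c = 65, l = 60)

# A sequential range for one hue: hold h fixed and vary the luminance
# (hue 240 is an arbitrary choice for this sketch)
one_hue_range <- hcl(h = 240, c = 50, l = seq(85, 30, length.out = 5))

palette_hcl
one_hue_range
```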
Define a divergent color scale with a value of fill that consists of a vector of two such pre-defined ranges, such as c("purples", "rusts"). Divergent color palettes are applicable in particular for plotting multiple bar charts on the same plot, such as for a set of Likert response items, all on a common response scale. Or, manually specify colors. For example, for a two-level by variable, set fill to c("coral3","seagreen3"), where the specified colors are not pre-defined color ranges.
For the pre-defined color scales, obtain more control over the resulting color palettes with an explicit call to getColors as the argument to fill. Here the values of chroma (c) and luminance (l) can be explicitly manipulated in conjunction with the specification of a pre-defined color range. Or, create a custom color range for any value of hue (h). See getColors for more information.
The values of another variable can be mapped into the fill color of the bars. To do so, set fill to the name of the variable, which would usually be the name of the y variable if explicitly given. Or, if y is tabulated, refer to the variable name as (count). The larger the count for a level of x, the darker the bar.
Also available are the pre-specified R color palettes "rainbow", "terrain", and "heat". The pre-defined palette "distinct" maximally separates colors by hue. The color-blind-friendly family of viridis palettes is available as "viridis", "cividis", "magma", "inferno", and "plasma", as well as the "Okabe-Ito" palette. Pre-defined color palettes are available from many of Wes Anderson's movies, such as "Moonrise1", "Royal1", "GrandBudapest1", "Darjeeling1" and "BottleRocket1". Can substitute a 2 for a 1 in the preceding references, and sometimes a 3.
LEGEND
When two variables are plotted, a legend is produced, with values for each level of the second or by variable. By default, the legend is placed in the right margin of the plot. This position can be changed with the legend_position option, which, in addition to the lessR option of right_margin, accepts any valid value consistent with the standard R legend function, used to generate the legend.
The legend title can be abbreviated with the legend_abbrev parameter. Specify the maximum number of characters of the title. The legend is displayed vertically by default, but can be changed to horizontal with the legend_horiz option.
LONG CATEGORY NAMES
For many plots, the names of the categories are too long. To adjust the plot for these long names, they can be rotated using the rotate_x and rotate_y parameters, in conjunction with offset. The offset parameter moves the category name out from the axis to compensate for the rotation. The changes can also be specified from style to persist until further changes. To reset to the default after obtaining an analysis, use style().
Also, the following codes are used to adjust line spacing:
1. Any space in a category name is converted to a new line.
2. If the space should not be converted to a new line, then replace with a tilde, ~, which will display as a space without a line break.
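The two rules above amount to a pair of string substitutions, sketched here in base R with gsub. The category names are invented, and this is only an illustration of the stated rules, not the code lessR runs.

```r
# Hypothetical category names
labels_in <- c("Human~Resources Dept", "Sales")

# Rule 1: a space becomes a line break
step1 <- gsub(" ", "\n", labels_in, fixed = TRUE)

# Rule 2: a tilde becomes an ordinary space, so the joined words stay on one line
labels_out <- gsub("~", " ", step1, fixed = TRUE)

cat(labels_out, sep = "\n---\n")
```

The order matters: converting tildes first would create spaces that the first rule would then turn into line breaks.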
For the text output at the console, specify the maximum number of characters in a label with label_max. Longer value names are abbreviated to the specified length. This facilitates reading cross-tab tables. Also, a provided table pairs the abbreviated names with the actual names. For one-variable frequency distributions, out_size provides the maximum number of characters for the text output before the horizontal display of the frequency distribution is shifted to a vertical presentation.
MULTIPLE BAR CHARTS ON THE SAME PANEL (PLOT)
For multiple x-variables, set the parameter one_plot to TRUE to produce all of the bar charts on the same panel. This is most meaningful when all items have the same set of responses, such as a common Likert scale found in survey data. By default the one-panel plot is produced when a common response scale is detected.
The algorithm to detect if the response scale is common first identifies the first variable with the largest set of responses, then checks the responses of all other variables. If all responses to all other variables are contained within the set of responses to the reference variable, then the response scales are the same. This means that on a Likert scale, for example, some items may not contain all possible responses, such as no one selects Strongly Disagree for an item. However, for the response scales to be deemed the same, at least one item (variable) must contain all possible responses.
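The detection rule described above can be sketched in base R as follows. The function has_common_scale is a hypothetical helper written for this illustration and is not part of lessR. It follows the description only: pick the first variable with the most distinct responses as the reference, then check that every other variable's responses are contained in it.

```r
# Hypothetical helper, not part of lessR
has_common_scale <- function(df) {
  # distinct non-missing responses of each variable
  responses <- lapply(df, function(v) unique(v[!is.na(v)]))
  n_unique <- vapply(responses, length, integer(1))
  # which.max returns the first variable with the largest response set
  reference <- responses[[which.max(n_unique)]]
  # common scale if every variable's responses lie within the reference set
  all(vapply(responses, function(r) all(r %in% reference), logical(1)))
}

items <- data.frame(
  m01 = c(1, 2, 3, 4, 5),   # contains every response
  m02 = c(2, 2, 3, 4, 4)    # no one chose 1 or 5, still contained in m01
)
common <- has_common_scale(items)

# A response of 9 is not in the reference set, so the scales differ
not_common <- has_common_scale(cbind(items, m03 = c(1, 2, 3, 4, 9)))
```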
The one_plot parameter can be set to either TRUE or FALSE regardless of the commonality of responses. Setting this parameter explicitly saves some CPU time, as the algorithm to evaluate the commonality of responses need not be activated.
ENTER NUMERIC VARIABLE DIRECTLY
Instead of calculating the counts from the data, the values of any numerical variable, including the counts, can be entered directly as the y-variable, in addition to the categorical x-variable, and perhaps a categorical by-variable. See the examples below.
Or, include the already tabulated counts as the data which is read into R, either as a matrix or a data frame.
STATISTICS
In addition to the bar chart, descriptive and optional inferential statistics are also presented. First, the frequency table for one variable or the joint frequency table for two variables is displayed. Second, the corresponding Cramer's V and chi-square test are also displayed by default.
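For reference, the two statistics can be computed from a joint frequency table in base R as sketched below. The frequencies are invented, and the choice to turn off the continuity correction is an assumption of this sketch; whether BarChart applies one is not stated here. Cramer's V is computed from its standard definition, the square root of chi-square divided by n times one less than the smaller table dimension.

```r
# Hypothetical joint frequency table for two categorical variables
freq <- matrix(c(20, 10,
                 10, 20), nrow = 2, byrow = TRUE,
               dimnames = list(Dept = c("ACCT", "SALE"), Gender = c("M", "W")))

# Chi-square test of independence (continuity correction off: an assumption)
test <- chisq.test(freq, correct = FALSE)

# Cramer's V from the chi-square statistic
n <- sum(freq)
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(freq)) - 1)))

addmargins(freq)   # joint frequencies with the marginal frequencies
cramers_v
test$p.value
```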
VARIABLE LABELS
If variable labels exist, then the corresponding variable label is listed as the label for the horizontal axis unless xlab is specified in the function call. If there are two variables to plot, the title of the resulting plot is based on the two variable labels, unless a specific title is listed with the main option. The variable label is also listed in the text output, next to the variable name. If the analysis is for two variables, then labels for both variables are included.
PDF OUTPUT
To obtain pdf output, use the pdf_file option, perhaps with the optional width and height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.
ONLY VARIABLES ARE REFERENCED
The referenced variable in a lessR function can only be a variable name (or list of variable names). This referenced variable must exist in either the referenced data frame, such as the default d, or in the user's workspace, more formally called the global environment. That is, expressions cannot be directly evaluated. For example:
> BarChart(cut(rnorm(50), breaks=seq(-5,5))) # does NOT work
Instead, do the following:
> Y <- cut(rnorm(50), breaks=seq(-5,5)) # create vector Y in user workspace
> BarChart(Y) # directly reference Y
Value
The output can optionally be saved into an R object, otherwise it only appears in the console (unless quiet is set to TRUE). Two different types of components are provided: the pieces of readable output, and a variety of statistics. The readable output consists of character strings, such as tables, amenable for display. The statistics are numerical values amenable for further analysis. The motivation of these types of output is to facilitate R Markdown documents, as the name of each piece, preceded by the name of the saved object and a $, can be inserted into the R Markdown document (see examples), interspersed with explanation and interpretation.
Tabulated numerical variable y
------------------------------
READABLE OUTPUT
out_title: Title
out_lbl: Variable label
out_counts: Two-way frequency distribution
out_chi: Chi-square test
One variable: out_miss: Number of missing values
Two variables: out_prop: Cell proportions
Two variables: out_row: Cell proportions within each row
Two variables: out_col: Cell proportions within each column
STATISTICS
n_dim: Number of dimensions, 1 or 2
p_value: p-value for null of equal proportions or independence
freq: Data frame of the frequency distribution
One variable: freq: Frequency distribution
One variable: values: y-values read directly
One variable: prop: Frequency distribution of proportions
One variable: n_miss: Number of missing values
Numerical variable y read from data
-----------------------------------
out_y: Values of y
n_dim: Number of dimensions, 1 or 2
Author(s)
David W. Gerbing (Portland State University; gerbing@pdx.edu)
References
Gerbing, D. W. (2023). R Data Analysis without Programming: Explanation and Interpretation, 2nd edition, Chapter 4, NY: Routledge.
Gerbing, D. W. (2020). R Visualizations: Derive Meaning from Data, Chapter 3, NY: CRC Press.
Gerbing, D. W. (2021). Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond, Journal of Statistics and Data Science Education, 29(3), 251-266, https://www.tandfonline.com/doi/abs/10.1080/26939169.2021.1999871.
See Also
getColors, barplot, table, legend.
Examples
# get the data
d <- rd("Employee")
# --------------------------------------------------------
# bar chart from tabulating the data for a single variable
# --------------------------------------------------------
# for each level of Dept, display the frequencies
BarChart(Dept)
# short name
# bc(Dept)
# save the values output by BarChart into the myOutput list
myOutput <- BarChart(Dept)
# display the saved output
myOutput
# just males with salaries larger than 85,000 USD
BarChart(Dept, filter=(Gender=="M" & Salary > 85000))
# rotate and offset the axis labels, sort categories by frequencies
BarChart(Dept, rotate_x=45, offset=1, sort="-")
# set bars to a single color of blue with some transparency
BarChart(Dept, fill="blue", transparency=0.3)
# progressive (sequential) color scale of blues
BarChart(Dept, fill="blues")
# viridis palette
BarChart(Dept, fill="viridis")
# change the theme just for this analysis, as opposed to style()
BarChart(Dept, theme="darkgreen")
# set bar color to hcl custom hues with chroma and luminance
# at the values provided by the default hcl colors from
# the getColors function, which defaults to h=240 and h=60
# for the first two colors on the qualitative scale
bc(Gender, fill=c(hcl(h=180,c=100,l=55), hcl(h=0,c=100,l=55)))
# or set to unique colors via color names
BarChart(Gender, fill=c("palegreen3","tan"))
# darken the colors with an explicit call to getColors,
# do a lower value of luminance, set to l=25
BarChart(Dept, fill=getColors(l=25), transparency=0.4)
# column proportions instead of frequencies
BarChart(Gender, stat_x="proportion")
# map value of tabulated count to bar fill
BarChart(Dept, fill=(count))
# data with many values of categorical variable Make and large labels
myd <- Read("Cars93")
# perpendicular labels
bc(Make, rotate_x=90, data=myd)
# manage size of horizontal value labels
bc(Make, horiz=TRUE, label_max=4, data=myd)
# read y variable, Salary
# display bars for values of the mean deviation <= 0 in a different color
# than values above
BarChart(Dept, Salary, stat="dev", sort="+", fill_split=0)
# ----------------------------------------------------
# bar chart from tabulating the data for two variables
# ----------------------------------------------------
# at each level of Dept, show the frequencies of the Gender levels
BarChart(Dept, by=Gender)
# Trellis (facet) plot
BarChart(Dept, by1=Gender)
# at each level of Dept, show the row proportions of the Gender levels
# i.e., 100% stacked bar graph
BarChart(Dept, by=Gender, stack100=TRUE)
# at each level of Gender, show the frequencies of the Dept levels
# do not display percentages directly on the bars
BarChart(Gender, by=JobSat, fill="reds", labels="off")
# specify two fill colors for Gender
BarChart(Dept, by=Gender, fill=c("deepskyblue", "black"))
# display bars beside each other instead of stacked, Female and Male
# the levels of Dept are included within each respective bar
# plot horizontally, display the value for each bar at the
# top of each bar
BarChart(Gender, by=Dept, beside=TRUE, horiz=TRUE, labels_position="out")
# horizontal bar chart of two variables, put legend on the top
BarChart(Gender, by=Dept, horiz=TRUE, legend_position="top")
# for more info on base R graphic options, enter: help(par)
# for lessR options, enter: style(show=TRUE)
# here fill is set in the style function instead of BarChart
# along with the others
style(fill=c("coral3","seagreen3"), lab_color="wheat4", lab_cex=1.2,
panel_fill="wheat1", main_color="wheat4")
BarChart(Dept, by=Gender,
legend_position="topleft", legend_labels=c("Girls", "Boys"),
xlab="Dept Level", main="Gender for Different Dept Levels",
value_labels=c("None", "Some", "Much", "Ouch!"))
style()
# -----------------------------------------------------------------
# multiple bar charts tabulated from data across multiple variables
# -----------------------------------------------------------------
# bar charts for all non-numeric variables in the data frame called d
# and all numeric variables with a small number of values, < n_cat
# BarChart(one_plot=FALSE)
d <- rd("Mach4", quiet=TRUE)
# all on the same plot, bar charts for 20 6-pt Likert scale items
# default scale is divergent from "browns" to "blues"
BarChart(m01:m20, horiz=TRUE, labels="off", sort="+")
# custom scale with explicit call to getColors, HCL chroma at 50
clrs <- getColors("greens", "purples", c=50)
BarChart(m01:m20, horiz=TRUE, labels="off", sort="+", fill=clrs)
# custom divergent scale with pre-defined color palettes
# with implicit call to getColors
BarChart(m01:m20, horiz=TRUE, labels="off", fill=c("aquas", "rusts"))
# ----------------------------
# can enter many types of data
# ----------------------------
# generate and enter integer data
X1 <- sample(1:4, size=100, replace=TRUE)
X2 <- sample(1:4, size=100, replace=TRUE)
BarChart(X1)
BarChart(X1, by=X2)
# generate and enter type double data
X1 <- sample(c(1,2,3,4), size=100, replace=TRUE)
X2 <- sample(c(1,2,3,4), size=100, replace=TRUE)
BarChart(X1)
BarChart(X1, by=X2)
# generate and enter character string data
# that is, without first converting to a factor
Travel <- sample(c("Bike", "Bus", "Car", "Motorcycle"), size=25, replace=TRUE)
BarChart(Travel, horiz=TRUE)
# ----------------------------
# bar chart directly from data
# ----------------------------
# include a y-variable, here Salary, in the data table to read directly
d <- read.csv(text="
Dept, Salary
ACCT,51792.78
ADMN,71277.12
FINC,59010.68
MKTG,60257.13
SALE,68830.06", header=TRUE)
BarChart(Dept, Salary)
# specify two variables for a two variable bar chart
# also specify a y-variable to provide the counts directly
# when reading y values directly, must be a summary table,
# one row of data for each combination of levels with
# a numerical value of y
# use lessR pivot function to get summary table, cannot process missing data
# so set na_group_show to FALSE
d <- Read("Employee")
a <- pivot(d, mean, Salary, c(Dept,Gender), na_group_show=FALSE)
BarChart(Dept, Salary_mean, by=Gender, data=a)
# do so just with BarChart, display bars in grayscale
# How does average salary vary by gender across the various departments?
BarChart(Dept, Salary, by=Gender, stat="mean", data=d, fill="grays")
# -----------
# annotations
# -----------
d <- rd("Employee")
# Place a message in the center of the plot
# \n indicates a new line
BarChart(Dept, add="Employees by\nDepartment", x1=3, y1=10)
# Use style to change some parameter values
style(add_trans=.8, add_fill="gold", add_color="gold4", add_lwd=0.5)
# Add a rectangle around the message centered at <3,10>
BarChart(Dept, add=c("rect", "Employees by\nDepartment"),
x1=c(2,3), y1=c(11, 10), x2=4, y2=9)