BarChart {lessR}    R Documentation

Bar Chart for One or Two Variables

Description

Abbreviation: bc

The function plots a bar chart of one categorical variable, x, against one numeric variable, y, optionally with a second categorical variable, by. The bar chart is constructed from a table, usually relatively brief, that pairs each level of the categorical variables with the corresponding numerical value of y. Usually, this table is a summary (pivot) table calculated as a data aggregation from the original data table of measurements, such as the average salary of the employees in each department.

The calculation of this foundational summary table from which the bar chart is created can occur outside of the function. Or, probably the more usual situation, the table is implicitly calculated by the function in one of two ways. Accordingly, obtain the summary table from one of three possibilities.

  1. Enter the summary table obtained from an external source directly as the value of the data parameter, indicated by specifying categorical variables x and possibly by with the numerical variable y.

  2. Have the function implicitly summarize the entire data table. If only categorical variable x, and possibly categorical variable by, are specified without a value of numerical y, the entire data table must be input as the value of data. The function then computes numeric variable y as the frequency of values in each category or level of the specified categorical variables.

  3. Have the function implicitly summarize the entire data table entered as the value of data by specifying a y variable. Obtain the summary table from which the bar chart is computed by summarizing (aggregating) the value of y at each level of x, and possibly by, with the chosen statistic specified by the stat parameter. The function will assess if the input data is a summary table or the entire data table. If the entire data table is entered, and the stat parameter is not entered, the value of stat defaults to the mean.
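
The three routes to the summary table can be sketched in base R, independent of lessR. The data frame emp and its values are made up for illustration, and the BarChart call shown in the comment is only indicative of how such a table might be passed with the parameters documented here.

```r
# Hypothetical full data table: one row per employee
emp <- data.frame(
  Dept   = c("ACCT", "ACCT", "SALE", "SALE", "SALE", "MKTG"),
  Salary = c(50000, 60000, 70000, 80000, 90000, 65000)
)

# Possibility 2: no y given, so y becomes the count of each level of x
counts <- as.data.frame(table(Dept = emp$Dept), responseName = "Count")

# Possibility 3: y given, aggregated at each level of x (here, the mean)
means <- aggregate(Salary ~ Dept, data = emp, FUN = mean)

# Possibility 1: a summary table such as `means`, computed elsewhere,
# is passed as data with x and y named explicitly, for example
#   BarChart(Dept, Salary, data = means)
print(counts)
print(means)
```

In each case the result has one row per level of Dept, which is the form from which the bars are drawn.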

The function also displays the foundational summary table, such as a frequency table for one or two variables. For a frequency table, Cramer's V measure of association and the corresponding chi-square inferential analysis are also displayed. For two variables, the frequencies include the joint and marginal frequencies. To activate Trellis graphics or facets, a multi-panel display, specify a by1 variable in place of by for the second categorical variable. If the provided object to analyze is a set of multiple variables, including the name of an entire data frame, then a bar chart is calculated for each non-numeric variable in the data frame.

Usage

BarChart(

        # ------------------------------------------
        # Data from which to construct the bar chart
        x=NULL, y=NULL, by=NULL, data=d, filter=NULL,

        # ------------------------------
        # Bar chart from aggregated data
        stat=c("mean", "sum", "sd", "deviation", "min", "median", "max"),
        stat_x=c("count", "proportion"),

        # --------------------------------------------------
        # Trellis (facet) plot, stratify on different panels
        by1=NULL, n_row=NULL, n_col=NULL, aspect="fill",
        
        # -------------------------------
        # Layout and ordering of the bars
        horiz=FALSE, sort=c("0", "-", "+"),
        beside=FALSE, stack100=FALSE,
        gap=NULL, scale_y=NULL, one_plot=NULL,

        # ----------------------------------------------------------------
        # Analogy of physical Marks on paper to create the bars and labels
        theme=getOption("theme"),
        fill=NULL,
        color=getOption("bar_color_discrete"),
        transparency=getOption("trans_bar_fill"),
        fill_split=NULL,

        labels=c("%", "input", "off"),
        labels_color=getOption("labels_color"),
        labels_size=getOption("labels_size"),
        labels_decimals=getOption("labels_decimals"),
        labels_position=getOption("labels_position"),
        labels_cut=NULL,

        # ------------------------------------------------------------------
        # Labels for axes, values, and legend if x and by variables, margins
        xlab=NULL, ylab=NULL, main=NULL, sub=NULL,
        lab_adjust=c(0,0), margin_adjust=c(0,0,0,0),
        pad_y_min=0, pad_y_max=0,

        rotate_x=getOption("rotate_x"), break_x=NULL,    
        offset=getOption("offset"),
        label_max=100,

        legend_title=NULL, legend_position="right_margin",
        legend_labels=NULL, legend_horiz=FALSE,
        legend_size=NULL, legend_abbrev=NULL, legend_adjust=0,

        # ----------------------------------------------------
        # Draw one or more objects, text, or geometric figures
        add=NULL, x1=NULL, y1=NULL, x2=NULL, y2=NULL,

        # --------------------------------------------------------------------
        # Output: text or chart turned off, to PDF file, number decimal digits
        quiet=getOption("quiet"), do_plot=TRUE,
        pdf_file=NULL, width=6.5, height=6, 
        digits_d=NULL, out_size=80, 

        # --------------------------------------
        # Deprecated, removed in future versions
        n_cat=getOption("n_cat"), value_labels=NULL, rows=NULL, 

        # -------------
        # Miscellaneous
        eval_df=NULL, ...)

bc(...)

Arguments

x

Categorical variable(s) to analyze. Can be a single variable, either within a data frame or as a vector in the user's workspace, or multiple variables in a data frame such as designated with the c function, or an entire data frame. If not specified, then defaults to all non-numerical variables in the specified data frame, d by default.
To manage large category values, unless break_x is FALSE, any space in each category value is converted to new line for the corresponding axis label in the plot. To keep two (small) words on the same line, replace the space that separates them with a tilde, which displays as a blank for the corresponding axis label.

y

Numeric variable with a value for each level of the categorical variable with the value plotted proportional to the height of the corresponding bar. If specified for the original data table, then the corresponding stat parameter also must be set. If not specified, then its value is by default tabulated as the frequency of each category or joint category.

by

A second categorical variable that creates a two-variable bar chart, with the bar for each level of x subdivided by the levels of by on the same plot. A similar concept applies to the panels of a Trellis (facet) plot if by1 is specified.

data

Optional data frame that contains the variables of interest. Can contain data from which frequencies or other statistics for a y-variable are computed, or can be a summary table that consists of two columns: the level of a categorical variable paired with the numeric value that determines the height of the corresponding bar.

filter

A logical expression that specifies a subset of rows of the data frame to analyze.

stat

Statistical transformation of the data for the y-axis across groups defined by the categorical variable(s), the data aggregation. Applicable values: "sum", "mean", "sd", "dev" for mean deviations, "min", "median", and "max".

stat_x

When no y variable is specified, either do the default count of each group or the proportion.


by1

A categorical variable called a conditioning variable that activates Trellis graphics (facets), from the lattice package, to create a bar chart on a separate panel for each level of the variable. Contrast to the by parameter that plots on the same panel.

n_row

Optional specification for the number of rows in the layout of a multi-panel display with Trellis graphics (facets). Need not specify n_col.

n_col

Optional specification for the number of columns in the layout of a multi-panel display with Trellis graphics (facets). Need not specify n_row. If set to 1, then the strip that labels each group locates to the left of each plot instead of the top.

aspect

Lattice parameter for the aspect ratio of the panels in a Trellis plot (multi-panel display or facets), defined as height divided by width. The default value is "fill" to have the panels expand to occupy as much space as possible. Set to 1 for square panels. Set to "xy" to specify a ratio calculated to "bank" to 45 degrees, that is, with the line slope approximately 45 degrees.


horiz

Bar orientation. By default the value is FALSE so bars are vertical, unless one_plot is TRUE.

sort

Sort the categories by their frequency for one variable and by the column sums if a by variable. Not applicable to Trellis plots. By default "0" for no sort, or sort descending "-" or ascending "+", unless one_plot is TRUE, then is set to "+".

beside

For a two variable plot, set to TRUE for the levels of the first variable to be plotted as adjacent bars instead of stacked on each other.

stack100

100% stacked bar chart when a by variable is present, also activated by setting stat_x to "proportion" with a by variable.
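
As a rough illustration of what a 100% stacked chart displays, the following base R sketch rescales a two-way table so that each bar sums to 100. The vectors x and by hold made-up values, and the computation is an assumption about the displayed quantities, not the lessR internals.

```r
# Hypothetical data: x on the axis, by stacked within each bar
x  <- c("A", "A", "A", "B", "B", "C", "C", "C", "C")
by <- c("F", "M", "M", "F", "F", "M", "F", "M", "M")

tab <- table(by, x)                       # rows = by levels, columns = bars
pct <- 100 * prop.table(tab, margin = 2)  # rescale each bar (column) to 100

print(round(pct, 1))
# barplot(pct, legend.text = TRUE) would draw the stacked bars in base R
```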

gap

Gap between bars. Provides the value of the space option from the standard R barplot function with a default of 0.2 unless two variables are plotted and beside=TRUE, in which case the default is c(.1,1).

scale_y

If specified, a vector of three values that define the numerical values of the y-axis, the numerical axis, within the bounds of plot region: starting value, ending value, and number of intervals.
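
A minimal sketch of how the three values of scale_y translate into axis values, assuming equally spaced intervals between the starting and ending values. The numbers are illustrative only.

```r
# Hypothetical specification: start, end, number of intervals
scale_y <- c(0, 100, 5)

# Five intervals need six axis values, including both end points
ticks <- seq(scale_y[1], scale_y[2], length.out = scale_y[3] + 1)
print(ticks)
```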

one_plot

For bar charts of multiple x-variables, indicates if a bar plot is produced for each x-variable, or all are combined into a single plot, such as for items that all share common responses such as survey data with a common Likert scale across variables. Default is TRUE if the variables share a common response scale, otherwise FALSE.


theme

Theme for the colors for this analysis. Make persistent across analyses with style.

fill

Fill color of the bars. Default is the qualitative palette "hues" from default theme "colors", unless the categorical variable(s) is(are) ordinal where the default is the "blues" sequential gradient. For any other color theme the default is the corresponding color gradient, such as "reds" for theme "darkred". Can also specify any vector of colors to fill the bars, such as generated by getColors, or access more pre-defined gradients such as palettes that address color-blindness such as "viridis". Or set to the name of y to map the values of y into the fill colors. Specify the name of y as (count) if tabulated from the data. Not applicable if fill_split is activated.

color

Border color of the bars, can be a vector to customize the color for each bar. Default is bar_color_discrete from the lessR style function.

transparency

Transparency factor of the area of each bar from 0, no transparency, to 1, full transparency. Default is trans_bar_fill from the lessR style function.

fill_split

The value of the numeric variable y for which bars that correspond to values of y <= fill_split are displayed in the first fill color and other values displayed in the second fill color, or as specified by a vector of exactly two fill colors.
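
The rule for fill_split can be sketched as a simple comparison. The values of y and the split point are hypothetical, and the two colors are borrowed from the color examples elsewhere in this documentation.

```r
# Hypothetical values of y, one per level of x
y <- c(ACCT = -2.5, MKTG = 1.0, SALE = 0.0, FINC = 3.2)
fill_split <- 0
two_fills <- c("coral3", "seagreen3")   # first color applies when y <= fill_split

# Bars at or below the split get the first color, all others the second
bar_fill <- ifelse(y <= fill_split, two_fills[1], two_fills[2])
print(bar_fill)
```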


labels

If not "off", adds the numerical results to the plot. The default is "%" for tabulated counts and "input" for an explicitly provided y-variable, unless there are more than 15 levels, or y is present and non-integer, in which case the default is "off". For tabulated counts, "prop" is also available to display proportions, as well as "input" to display the computed values such as counts.

labels_color

Color of the plotted text. Could be a vector to specify a unique color for each value. If fewer colors are specified than the number of categories, the colors are recycled.

labels_size

Character expansion factor, the size, of the plotted text, for which the default value is 0.95, or 0.9 of that value if beside is TRUE and labels_position is "in" because the bars are narrower.

labels_decimals

Number of decimal digits for which to display the values. Default is 0, round to the nearest integer for "%", 2 for "prop", and if "input" and y is entered directly, display the literal value unless > 9999, in which case set to 0.

labels_position

Position of the plotted text. Default is "in" for inside the bar, or, if "out", the label for each value is placed outside of the bar, on top.

labels_cut

Threshold for displaying the value. If labels_position equals "out", then default is 0.028 unless there is a by variable or multiple x-variables on the same plot, in which case the default is 0.040.


xlab

Axis label for x-axis. If xlab is not specified, then the label becomes the name of the corresponding variable label if it exists, or, if not, the variable name. If xy_ticks is FALSE, then no label is displayed. If no y variable is specified, then xlab is set to Index unless xlab has been specified.

ylab

Label for y-axis. If ylab is not specified, then the label becomes the name of the corresponding variable label if it exists, or, if not, the variable name. If xy_ticks is FALSE, then no label is displayed.

main

Label for the title of the graph. Can set size with main_cex and color with main_color from the lessR style function.

sub

Sub-title of graph, below xlab. Not yet implemented.

lab_adjust

Two-element vector – x-axis label, y-axis label – adjusts the position of the axis labels in approximate inches. + values move the labels away from plot edge. Not applicable to Trellis graphics.

margin_adjust

Four-element vector – top, right, bottom and left – adjusts the margins of the plotted figure in approximate inches. + values move the corresponding margin away from plot edge. Not applicable to Trellis graphics.

pad_y_min

Proportion of padding added to the minimum end of the y-axis. Value from 0 to 1.

pad_y_max

Proportion of padding added to the maximum end of the y-axis. Value from 0 to 1.


rotate_x

Degrees that the axis values for the category values axis are rotated, usually to accommodate longer values, typically used in conjunction with offset. When equal to 90, the value labels are perpendicular to the x-axis and a different algorithm places the labels so that offset is not needed.

break_x

Replace spaces in the category values with a new line and replace tildes with a blank so that there is no separation of words joined by a tilde. By default, TRUE for vertical bar charts with rotate_x set to 0, and FALSE otherwise.

offset

The amount of spacing between the axis values and the axis. Default is 0.5. Larger values such as 1.0 create space for the label when longer axis value names are rotated.

label_max

To improve readability of text output, the maximum size of the value labels before the labels are abbreviated for text output only. Not a literal maximum as preserving unique values may require a larger number of characters than specified.


legend_title

Title of the legend, which is usually set by default except when raw counts are entered as a matrix. Then a title must be specified to generate a legend.

legend_position

When plotting two variables, location of the legend, with the default in the right margin. Additional options from standard R are "topleft", "top", "topright" and others as shown in the help for the legend function.

legend_labels

When plotting two variables, labels for the legend, which by default are the levels for the second or by variable.

legend_horiz

By default the legend is vertical, but can be changed to horizontal.

legend_size

Size of legend text.

legend_abbrev

If specified, abbreviate the legend title and legend labels to the specified maximum number of characters.

legend_adjust

Shift legend for a two-categorical bar chart. A positive number shifts the legend to the right from its default placement.


add

Draw one or more objects, text, or geometric figures, on the plot. Possible values are any text to be written, the first argument, which is "text", or, to indicate a figure, "rect" (rectangle), "line", "arrow", "v_line" (vertical line), and "h_line" (horizontal line). The value "means" is short-hand for vertical and horizontal lines at the respective means. Does not apply to Trellis graphics. Customize with parameters such as add_fill and add_color from the style function.

x1

First x coordinate to be considered for each object. All coordinates vary from -1 to 1.

y1

First y coordinate to be considered for each object.

x2

Second x coordinate to be considered for each object. Only used for "rect", "line" and "arrow".

y2

Second y coordinate to be considered for each object. Only used for "rect", "line" and "arrow".


quiet

If set to TRUE, no text output. Can change system default with style function.

do_plot

If TRUE, the default, then generate the plot.

pdf_file

Name of the PDF file to which the graphics are directed.

width

Width of the plot window in inches, defaults to 6.5.

height

Height of the plot window in inches, defaults to 6.

digits_d

Provides the number of decimal digits, set by default to at least 2 or the largest number of digits in the values of the response variable plus 1.

out_size

To improve the readability of the frequency distribution of a single variable displayed at the console, the maximum number of characters on a line of output at the console for one variable before the frequency distribution is written vertically.


n_cat

When analyzing all the variables in a data frame, specifies the largest number of unique values of a variable of a numeric data type for which the variable will be analyzed as categorical. Default is 0. [deprecated]: Better to convert a categorical integer variable to a factor.

value_labels

For factors, default is the factor labels, and for character variables, default is the character values. Or, provide labels for the x-axis on the graph to override these values. If the variable is a factor and value_labels is not specified (is NULL), then the value_labels are set to the factor levels with each space replaced by a new line character. If x and y-axes have the same scale, they also apply to the y-axis. Control the plotted size with axis_cex and axis_x_cex from the lessR style function. [deprecated]: Better to convert a categorical integer variable to a factor.

rows

Deprecated old parameter name that is now called filter.


eval_df

Determines whether to check for an existing data frame and the specified variables. By default is TRUE unless the shiny package is loaded, in which case it is set to FALSE so that Shiny will run. Needs to be set to FALSE if using the pipe %>% notation.

...

Other parameter values for graphics as defined by Base R barplot, legend, and par, including:
xlim and ylim for setting the range of the x and y-axes
cex.main for the size of the title
col.main for the color of the title
"dotted", "dotdash"
sub and col.sub for a subtitle and its color
las=3 to reorient vertical axis labels
space for one variable only

Details

OVERVIEW
Plot a bar chart with default colors for one or two categorical variables, that is, with a relatively small number of labels for each variable. By default, colors are selected for the bars, background and grid lines, all of which can be customized. The basic computations of the chart are provided with the standard R functions barplot, chisq.test and, for two variables, legend. Horizontal bar charts, specified by horiz=TRUE, list the value labels horizontally and automatically extend the left margin to accommodate both the value labels and the variable label.

DATA
Ultimately the bar chart is constructed from a simple summary table in which each row consists of a level of the categorical variable x paired with the corresponding value of the numerical variable, y, with as many rows as the number of levels of x. Provide these values of x and y directly, or just provide x for the original data of measurements to compute the counts of each category or provide x and y with a value of stat to define the statistic for which to aggregate the values of y over the levels of x. Also can have a second categorical variable, by.

The data may either be vectors from the global environment, the user's workspace, as illustrated in the examples below, or a variable in a data frame. The default input data frame is d. Specify a different data frame name with the data option. Regardless of its name, the variables in the data frame are referenced directly by their names.

If a vector in the global environment and a variable in the input data frame have the same name, the vector from the global environment is analyzed, unless the data frame name is explicitly provided, not relying upon the default d. If two variables are specified, either both variables should be in the data frame, or one of the variables is in the data frame and the other in the global environment.

To obtain a bar chart of each categorical variable in the d data frame, invoke BarChart(). Or, for a data frame with a different name, insert the data frame name between the parentheses as the first listed parameter value. To analyze a subset of the variables in a data frame, specify the variable list with either a : or the c function, such as m01:m03 or c(m01,m02,m03).

The filter parameter (formerly rows) subsets rows (cases) of the input data frame according to a logical expression. Use the standard R operators for logical statements as described in Logic such as & for and, | for or and ! for not. Use the standard R relational operators as described in Comparison. Examples include == for logical equality, != for not equals, and > for greater than. See the Examples.
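
The row selection described above amounts to ordinary logical subsetting. The following base R sketch, with a made-up data frame emp, shows the kind of expression involved and the rows it retains before tabulation.

```r
# Hypothetical data table
emp <- data.frame(
  Dept   = c("ACCT", "SALE", "SALE", "MKTG", "SALE"),
  Gender = c("M", "M", "W", "M", "M"),
  Salary = c(90000, 95000, 99000, 60000, 70000)
)

# Logical expression combining a test of equality and a relational test
keep <- with(emp, Gender == "M" & Salary > 85000)
sub  <- emp[keep, ]

# Counts of Dept computed only from the retained rows
print(table(sub$Dept))
```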

The form of the entered data, the first variable, categorical x, and optionally a second variable, numerical y, is flexible. The data may be entered as factors, numeric values, characters, or a matrix. The data may be entered and the resulting frequencies computed, or the frequencies can be entered directly. The most natural type of data to enter, when entering the variables, is to enter factors.

STATISTICAL TRANSFORMATIONS
Ultimately the bar plot is constructed from a small table of data values with each row a level of the categorical variable x paired with the corresponding value of the numerical variable y, with as many rows as values of x. It is also possible to plot transformations of the values of y for each level of categorical variable x from a full data table with many replications of each value of x and corresponding y. Then reduce the larger data table down to the summary table with one of the following transformations.

Transformation  Meaning
--------------  -------------------
"sum"           sum
"mean"          mean
"sd"            standard deviation
"dev"           mean deviation
"min"           minimum
"median"        median
"max"           maximum
--------------  -------------------
The other statistical transformation is simply counting the number of occurrences of each level of x, which does not involve a value of y read from the data. Instead the value of y for each level of x is tabulated.
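
The reduction from the full data table to the summary table can be sketched in base R as follows. The data are hypothetical, and the treatment of "dev" as each group mean minus the grand mean is an assumption about its meaning, not a statement of how lessR computes it.

```r
# Hypothetical full data table, salaries in thousands
emp <- data.frame(
  Dept   = c("ACCT", "ACCT", "SALE", "SALE", "SALE", "MKTG"),
  Salary = c(50, 60, 70, 80, 90, 65)
)

# Named list of the aggregation functions
stats <- c(sum = sum, mean = mean, sd = sd,
           min = min, median = median, max = max)

# One column per statistic, one row per level of Dept
summary_tbl <- sapply(stats, function(f) tapply(emp$Salary, emp$Dept, f))

# Assumption: "dev" is each group mean minus the grand mean
dev <- summary_tbl[, "mean"] - mean(emp$Salary)

print(round(cbind(summary_tbl, dev = dev), 2))
```

A level with a single observation, such as MKTG here, has an undefined standard deviation, reported as NA.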

COLORS
For a one variable plot, set the default color of the bars by the current color theme according to bar_fill_discrete argument of the function style, which includes the default color theme "hues" that defines a qualitative HCL color scale, or set the bar color with the fill parameter, which references a specified vector of color specifications, such as generated by the lessR getColors function.

Set fill to a single color or a color palette, of which there are many possibilities. Pre-defined sequential and divergent color ranges are available as implicit calls to getColors. Define the default qualitative color palette with "hues" that provides HCL colors of the same chroma (saturation) and luminance (brightness). The full list of pre-defined color ranges defined in 30 degree increments around the HCL color wheel: "reds", "rusts", "browns", "olives", "greens", "emeralds", "turquoises", "aquas", "blues", "purples", "violets", "magentas", and "grays".

Define a divergent color scale with value of fill that consists of a vector of two such pre-defined ranges, such as c("purples", "rusts"). Divergent color palettes are applicable in particular for plotting multiple bar charts on the same plot such as for a set of Likert response items, all on a common response scale. Or, manually specify colors. For example, for a two-level by variable, could set fill to c("coral3","seagreen3"), where the specified colors are not pre-defined color ranges.

For the pre-defined color scales can obtain more control over the obtained color palettes with an explicit call to getColors for the argument to fill. Here the value of chroma (c) and luminance (l) can be explicitly manipulated in conjunction with the specification of a pre-defined color range. Or, create a custom color range for any value of hue (h). See getColors for more information.

The values of another variable can be mapped into the fill color of the bars. To do so, set fill to the value of the variable, which would usually be the name of the y variable if explicitly given. Or, if y is tabulated, refer to the variable name as (count). The larger the count for a level of x, the darker the bar.

Also available are the pre-specified R color palettes "rainbow", "terrain", and "heat". The pre-defined palette "distinct" maximally separates colors by hue. The color-blind-friendly family of viridis palettes is available as "viridis", "cividis", "magma", "inferno", and "plasma", as well as the "Okabe-Ito" palette. Pre-defined color palettes are available from many of Wes Anderson's movies such as "Moonrise1", "Royal1", "GrandBudapest1", "Darjeeling1" and "BottleRocket1". Can substitute a 2 for a 1 in the preceding references, and sometimes a 3.

LEGEND
When two variables are plotted, a legend is produced, with values for each level of the second or by variable. By default, the location is placed in the right margin of the plot. This position can be changed with the legend_position option, which, in addition to the lessR option of right_margin, accepts any valid value consistent with the standard R legend function, used to generate the legend.

The legend title can be abbreviated with the legend_abbrev parameter. Specify the maximum number of characters of the title. The legend is displayed vertically by default, but can be changed to horizontal with the legend_horiz option.

LONG CATEGORY NAMES
For many plots, the names of the categories are too long. To adjust the plot for these long names, they can be rotated using the rotate_x and rotate_y parameters, in conjunction with offset. The offset parameter moves the category name out from the axis to compensate for the rotation. The changes can also be specified from style to persist until further changes. To reset to the default after obtaining an analysis, use style().

Also, the following codes are used to adjust line spacing:
1. Any space in a category name is converted to a new line.
2. If the space should not be converted to a new line, then replace with a tilde, ~, which will display as a space without a line break.
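
The two codes can be mimicked with base R string substitution. This is only a sketch of the described behavior with made-up category names, not the lessR implementation.

```r
# Hypothetical category names; the tilde keeps "New York" on one line
labels <- c("Strongly Disagree", "New~York Office", "Sales")

# Step 1: every space becomes a line break
broken <- gsub(" ", "\n", labels, fixed = TRUE)
# Step 2: every tilde becomes a plain space, keeping those words together
broken <- gsub("~", " ", broken, fixed = TRUE)

cat(broken, sep = "\n---\n")
```

The order of the two steps matters: converting tildes first would turn them into spaces that the second step then breaks.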

For the text output at the console, can specify the maximum number of characters in a label with label_max. Longer value names are abbreviated to the specified length. This facilitates reading cross-tab tables. Also, a provided table pairs the abbreviated names with the actual names. For one variable frequency distributions, out_size provides the maximum number of characters for the text output before the horizontal display of the frequency distribution is shifted to a vertical presentation.

MULTIPLE BAR CHARTS ON THE SAME PANEL (PLOT)
For multiple x-variables, set the parameter one_plot to TRUE to specify that each bar chart should be produced on the same panel as all other bars. This is most meaningful when all items have the same set of responses, such as a common Likert scale found in survey data. By default the one panel plot is produced when a common response scale is detected.

The algorithm to detect if the response scale is common first identifies the first variable with the largest set of responses, then checks the responses of all other variables. If all responses to all other variables are contained within the set of responses to the reference variable, then the response scales are the same. This means that on a Likert scale, for example, some items may not contain all possible responses, such as no one selects Strongly Disagree for an item. However, for the response scales to be deemed the same, at least one item (variable) must contain all possible responses.
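
The detection rule described above can be sketched as a short base R function. The function name common_scale and the survey items are hypothetical, and the code follows the verbal description only, not the lessR source.

```r
# Hypothetical survey items on a 4-point scale; m02 and m03 lack "SD"
items <- data.frame(
  m01 = c("SD", "D", "A", "SA", "A"),
  m02 = c("D", "A", "A", "SA", "D"),
  m03 = c("A", "A", "SA", "SA", "D")
)

common_scale <- function(df) {
  # Set of observed responses for each variable, ignoring missing values
  resp <- lapply(df, function(v) unique(v[!is.na(v)]))
  # Reference: the first variable with the largest set of responses
  ref <- resp[[which.max(lengths(resp))]]
  # Common if the responses of every variable fall within the reference set
  all(vapply(resp, function(r) all(r %in% ref), logical(1)))
}

print(common_scale(items))

# Adding an item with a different response set breaks the commonality
items$m04 <- c("Yes", "No", "Yes", "No", "Yes")
print(common_scale(items))
```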

The one_plot parameter can be set to either TRUE or FALSE regardless of the commonality of responses. Setting this parameter explicitly saves some CPU time as the algorithm to evaluate the commonality of responses need not be activated.

ENTER NUMERIC VARIABLE DIRECTLY
Instead of calculating the counts from the data, the values of any numerical variable, including the counts, can be entered directly as the y-variable, in addition to the categorical x-variable, and perhaps a categorical by-variable. See the examples below.

Or, include the already tabulated counts as the data which is read into R, either as a matrix or a data frame.

STATISTICS
In addition to the bar chart, descriptive and optional inferential statistics are also presented. First, the frequency table for one variable or the joint frequency table for two variables is displayed. Second, the corresponding Cramer's V and chi-square test are also displayed by default.
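
For two variables, these statistics can be reproduced approximately in base R. The sketch below uses made-up data and the standard definition of Cramer's V, the square root of chi-square divided by n times one less than the smaller table dimension. Whether lessR applies any correction is not assumed here.

```r
# Hypothetical data for two categorical variables
dept   <- c("ACCT", "ACCT", "SALE", "SALE", "SALE",
            "SALE", "MKTG", "MKTG", "MKTG", "ACCT")
gender <- c("M", "W", "M", "M", "W", "M", "W", "W", "M", "W")

tab <- table(dept, gender)

# Small counts trigger an approximation warning, suppressed for the sketch
test <- suppressWarnings(chisq.test(tab, correct = FALSE))

n <- sum(tab)
k <- min(dim(tab)) - 1
cramers_v <- sqrt(unname(test$statistic) / (n * k))

print(addmargins(tab))   # joint and marginal frequencies
print(test$p.value)
print(cramers_v)
```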

VARIABLE LABELS
If variable labels exist, then the corresponding variable label is listed as the label for the horizontal axis unless xlab is specified in the function call. If there are two variables to plot, the title of the resulting plot is based on the two variable labels, unless a specific title is listed with the main option. The variable label is also listed in the text output, next to the variable name. If the analysis is for two variables, then labels for both variables are included.

PDF OUTPUT
To obtain pdf output, use the pdf_file option, perhaps with the optional width and height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.

ONLY VARIABLES ARE REFERENCED
The referenced variable in a lessR function can only be a variable name (or list of variable names). This referenced variable must exist in either the referenced data frame, such as the default d, or in the user's workspace, more formally called the global environment. That is, expressions cannot be directly evaluated. For example:

    > BarChart(cut(rnorm(50), breaks=seq(-5,5)))   # does NOT work

Instead, do the following:

    > Y <- cut(rnorm(50), breaks=seq(-5,5))   # create vector Y in user workspace
    > BarChart(Y)     # directly reference Y

Value

The output can optionally be saved into an R object, otherwise it only appears in the console (unless quiet is set to TRUE). Two different types of components are provided: the pieces of readable output, and a variety of statistics. The readable output consists of character strings such as tables amenable for display. The statistics are numerical values amenable for further analysis. The motivation of these types of output is to facilitate R Markdown documents, as the name of each piece, preceded by the name of the saved object and a $, can be inserted into the R Markdown document (see examples), interspersed with explanation and interpretation.

Tabulated numerical variable y
------------------------------
READABLE OUTPUT
out_title: Title
out_lbl: Variable label
out_counts: Two-way frequency distribution
out_chi: Chi-square test
One variable: out_miss: Number of missing values
Two variables: out_prop: Cell proportions
Two variables: out_row: Cell proportions within each row
Two variables: out_col: Cell proportions within each col

STATISTICS
n_dim: Number of dimensions, 1 or 2
p_value: p-value for null of equal proportions or independence
freq: Data frame of the frequency distribution
One variable: freq: Frequency distribution
One variable: values: y-values read directly
One variable: prop: Frequency distribution of proportions
One variable: n_miss: Number of missing values

Numerical variable y read from data
-----------------------------------
out_y: Values of y
n_dim: Number of dimensions, 1 or 2
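
As a minimal sketch of how these components can be retrieved, assuming the Employee data set from the Examples section and the component names listed above, save the output and then refer to each piece with the $ operator. Which components are present depends on the analysis, so a name that does not apply returns NULL.

```r
library(lessR)

d <- rd("Employee")        # example data used in the Examples section

# save the output of the one-variable analysis
out <- BarChart(Dept)

# list the names of the components that were returned
names(out)

# refer to individual pieces by name
out$n_dim                  # number of dimensions
out$freq                   # frequency distribution
```

In an R Markdown document the same references, such as out$freq, would appear in a code chunk or inline expression.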

Author(s)

David W. Gerbing (Portland State University; gerbing@pdx.edu)

References

Gerbing, D. W. (2023). R Data Analysis without Programming: Explanation and Interpretation, 2nd edition, Chapter 4, NY: Routledge.

Gerbing, D. W. (2020). R Visualizations: Derive Meaning from Data, Chapter 3, NY: CRC Press.

Gerbing, D. W. (2021). Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond, Journal of Statistics and Data Science Education, 29(3), 251-266, https://www.tandfonline.com/doi/abs/10.1080/26939169.2021.1999871.

See Also

getColors, barplot, table, legend.

Examples


# get the data
d <- rd("Employee")

# --------------------------------------------------------
# bar chart from tabulating the data for a single variable
# --------------------------------------------------------

# for each level of Dept, display the frequencies
BarChart(Dept)
# short name
# bc(Dept)

# save the values output by BarChart into the myOutput list
myOutput <- BarChart(Dept)
# display the saved output
myOutput

# just males with salaries larger than 85,000 USD
BarChart(Dept, rows=(Gender=="M" & Salary > 85000))

# rotate and offset the axis labels, sort categories by frequencies
BarChart(Dept, rotate_x=45, offset=1, sort="-")

# set bars to a single color of blue with some transparency
BarChart(Dept, fill="blue", transparency=0.3)
# progressive (sequential) color scale of blues
BarChart(Dept, fill="blues")

# viridis palette
BarChart(Dept, fill="viridis")

# change the theme just for this analysis, as opposed to style()
BarChart(Dept, theme="darkgreen")

# set bar color to hcl custom hues with chroma and luminance
#   at the values provided by the default hcl colors from
#   the getColors function, which defaults to h=240 and h=60
#   for the first two colors on the qualitative scale
bc(Gender, fill=c(hcl(h=180,c=100,l=55), hcl(h=0,c=100,l=55)))

# or set to unique colors via color names
BarChart(Gender, fill=c("palegreen3","tan"))

# darken the colors with an explicit call to getColors,
#   do a lower value of luminance, set to l=25
BarChart(Dept, fill=getColors(l=25), transparency=0.4)

# column proportions instead of frequencies
BarChart(Gender, stat_x="proportion")

# map value of tabulated count to bar fill
BarChart(Dept, fill=(count))

# data with many values of categorical variable Make and large labels
myd <- Read("Cars93")
# perpendicular labels
bc(Make, rotate_x=90, data=myd)
# manage size of horizontal value labels
bc(Make, horiz=TRUE, label_max=4, data=myd)

# read y variable, Salary
# display bars for deviation values <= 0 in a different color
#  than values above
BarChart(Dept, Salary, stat="dev", sort="+", fill_split=0)


# ----------------------------------------------------
# bar chart from tabulating the data for two variables
# ----------------------------------------------------

# at each level of Dept, show the frequencies of the Gender levels
BarChart(Dept, by=Gender)

# Trellis (facet) plot
BarChart(Dept, by1=Gender)

# at each level of Dept, show the row proportions of the Gender levels
#   i.e., 100% stacked bar graph
BarChart(Dept, by=Gender, stack100=TRUE)

# at each level of Gender, show the frequencies of the JobSat levels
# do not display percentages directly on the bars
BarChart(Gender, by=JobSat, fill="reds", labels="off")

# specify two fill colors for Gender
BarChart(Dept, by=Gender, fill=c("deepskyblue", "black"))

# display bars beside each other instead of stacked, Female and Male
# the levels of Dept are included within each respective group of bars
# plot horizontally, display the value for each bar at the
#   top of each bar
BarChart(Gender, by=Dept, beside=TRUE, horiz=TRUE, labels_position="out")

# horizontal bar chart of two variables, put legend on the top
BarChart(Gender, by=Dept, horiz=TRUE, legend_position="top")

# for more info on base R graphic options, enter:  help(par)
# for lessR options, enter:  style(show=TRUE)
# here fill is set in the style function instead of BarChart
#   along with the others
style(fill=c("coral3","seagreen3"), lab_color="wheat4", lab_cex=1.2,
      panel_fill="wheat1", main_color="wheat4")
BarChart(Dept, by=Gender,
         legend_position="topleft", legend_labels=c("Girls", "Boys"),
         xlab="Dept Level", main="Gender for Different Dept Levels",
         value_labels=c("None", "Some", "Much", "Ouch!"))
style()


# -----------------------------------------------------------------
# multiple bar charts tabulated from data across multiple variables
# -----------------------------------------------------------------

# bar charts for all non-numeric variables in the data frame called d
#   and all numeric variables with a small number of values, < n_cat
# BarChart(one_plot=FALSE)

d <- rd("Mach4", quiet=TRUE)

# all on the same plot, bar charts for 20 6-pt Likert scale items
# default scale is divergent from "browns" to "blues"
BarChart(m01:m20, horiz=TRUE, labels="off", sort="+")




# custom scale with explicit call to getColors, HCL chroma at 50
clrs <- getColors("greens", "purples", c=50)
BarChart(m01:m20, horiz=TRUE, labels="off", sort="+", fill=clrs)

# custom divergent scale with pre-defined color palettes
#  with implicit call to getColors
BarChart(m01:m20, horiz=TRUE, labels="off", fill=c("aquas", "rusts"))


# ----------------------------
# can enter many types of data
# ----------------------------

# generate and enter integer data
X1 <- sample(1:4, size=100, replace=TRUE)
X2 <- sample(1:4, size=100, replace=TRUE)
BarChart(X1)
BarChart(X1, by=X2)

# generate and enter type double data
X1 <- sample(c(1,2,3,4), size=100, replace=TRUE)
X2 <- sample(c(1,2,3,4), size=100, replace=TRUE)
BarChart(X1)
BarChart(X1, by=X2)

# generate and enter character string data
# that is, without first converting to a factor
Travel <- sample(c("Bike", "Bus", "Car", "Motorcycle"), size=25, replace=TRUE)
BarChart(Travel, horiz=TRUE)


# ----------------------------
# bar chart directly from data
# ----------------------------

# include a y-variable, here Salary, in the data table to read directly
d <- read.csv(text="
Dept, Salary
ACCT,51792.78
ADMN,71277.12
FINC,59010.68
MKTG,60257.13
SALE,68830.06", header=TRUE)
BarChart(Dept, Salary)

# specify two variables for a two variable bar chart
# also specify a y-variable to provide the values directly
# when reading y values directly, must be a summary table,
#   one row of data for each combination of levels with
#   a numerical value of y
# use lessR pivot function to get summary table, cannot process missing data
#   so set na_group_show to FALSE
d <- Read("Employee")
a <- pivot(d, mean, Salary, c(Dept,Gender), na_group_show=FALSE)
BarChart(Dept, Salary_mean, by=Gender, data=a)
# do so just with BarChart, display bars in grayscale
# How does average salary vary by gender across the various departments?
BarChart(Dept, Salary, by=Gender, stat="mean", data=d, fill="grays")


# -----------
# annotations
# -----------

d <- rd("Employee")

# Place a message in the center of the plot
# \n indicates a new line
BarChart(Dept, add="Employees by\nDepartment", x1=3, y1=10)

# Use style to change some parameter values
style(add_trans=.8, add_fill="gold", add_color="gold4", add_lwd=0.5)
# Add a rectangle around the message centered at <3,10>
BarChart(Dept, add=c("rect", "Employees by\nDepartment"),
                     x1=c(2,3), y1=c(11, 10), x2=4, y2=9)


[Package lessR version 4.3.6 Index]