summaryS {Hmisc}  R Documentation 
Summarize Multiple Response Variables and Make Multipanel Scatter or Dot Plot
Description
Multiple left-hand formula variables along with right-hand side
conditioning variables are reshaped into a "tall and thin" data frame if
fun is not specified. The resulting raw data can be plotted with the
plot method using user-specified panel functions for lattice graphics,
typically to make a scatterplot or loess smooths, or both. The Hmisc
panel.plsmo function is handy in this context. Instead, if fun is
specified, this function takes individual response variables (which may
be matrices, as in Surv objects) and creates one or more summary
statistics that will be computed while the resulting data frame is being
collapsed to one row per condition. The plot method in this case plots a
multipanel dot chart using the lattice dotplot function if panel is not
specified to plot. There is an option to print selected statistics as
text on the panels. summaryS pays special attention to Hmisc variable
annotations: label, units.
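As a rough illustration of what a "tall and thin" layout means, the base R sketch below stacks two response columns into one row per observation per response. The column names yvar and y mirror those used in the xyplot example in the Examples section; the helper name stack_responses and the toy data are hypothetical, and this is not a description of how summaryS works internally.

```r
# Hypothetical helper (not part of Hmisc): stack several response columns
# into a tall data frame with one row per observation per response.
stack_responses <- function(data, responses, conditioning) {
  pieces <- lapply(responses, function(v) {
    cbind(data[, conditioning, drop = FALSE],
          yvar = v,          # name of the response variable
          y    = data[[v]])  # value of the response variable
  })
  tall <- do.call(rbind, pieces)
  tall$yvar <- factor(tall$yvar, levels = responses)
  rownames(tall) <- NULL
  tall
}

# Toy data, invented for illustration only
wide <- data.frame(days  = c(1, 2, 3),
                   treat = c('A', 'B', 'A'),
                   sbp   = c(120, 130, 125),
                   dbp   = c(80, 85, 82))
tall <- stack_responses(wide, c('sbp', 'dbp'), c('days', 'treat'))
print(tall)
```

Each original row appears once per response, so a single y column can be plotted against the conditioning variables with yvar as a paneling factor.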
When panel is specified in addition to fun, a special x-y plot is made
that assumes that the x-axis variable (typically time) is discrete.
This is used for example to plot multiple quantile intervals as vertical
lines next to the main point. A special panel function mbarclPanel is
provided for this purpose.
The plotp method produces corresponding plotly graphics.
When fun is given and panel is omitted, and the result of fun is a
vector of more than one statistic, the first statistic is taken as the
main one. Any columns with names not in textonly will figure into the
calculation of axis limits. Those in textonly will be printed right
under the dot lines in the dot chart. Statistics with names in textplot
will figure into limits, be plotted, and printed. pch.stats can be used
to specify symbols for statistics after the first column. When fun
computes three columns that are plotted, columns two and three are taken
as confidence limits for which horizontal "error bars" are drawn. Two
levels with different thicknesses are drawn if there are four plotted
summary statistics beyond the first.
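A summarization function of the kind described above might look like the following base R sketch. The name mean_cl and the statistic names Mean, Lower and Upper are hypothetical choices; the only name with a documented meaning here is n, which matches the default textonly='n' shown under Usage. The t-based limits are one possible choice of interval, not a requirement.

```r
# Hypothetical summarization function: the first element is the main
# statistic, the next two act as confidence limits, and n is a sample size.
mean_cl <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                  # drop missing values
  n <- length(x)
  m <- mean(x)
  half <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(Mean = m, Lower = m - half, Upper = m + half, n = n)
}

# Toy input, invented for illustration only
stats <- mean_cl(c(118, 122, NA, 130, 125))
print(stats)
```

Under these assumptions such a function could be supplied as the fun argument, much as the Examples section does with a function built on smean.cl.normal.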
mbarclPanel is used to draw multiple vertical lines around the main
points, such as a series of quantile intervals stratified by x and
paneling variables. If mbarclPanel finds a column of an argument yother
that is named "se", and if there are exactly two levels to a
superpositioning variable, the half-height of the approximate 0.95
confidence interval for the difference between two point estimates is
shown, positioned at the midpoint of the two point estimates at an x
value. This assumes normality of point estimates, and the standard
error of the difference is the square root of the sum of squares of the
two standard errors. By positioning the intervals in this fashion, a
failure of the two point estimates to touch the half-confidence interval
is consistent with rejecting the null hypothesis of no difference at the
0.05 level.
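The arithmetic behind this display can be sketched in base R. The function name half_interval and the numbers are hypothetical, and the sketch assumes the drawn segment is centered at the midpoint with total length equal to half the width of the confidence interval for the difference; it illustrates the calculation only and is not taken from the mbarclPanel source.

```r
# Hypothetical helper: half-width interval for a difference of two
# estimates, centered at their midpoint (normal approximation).
half_interval <- function(est1, se1, est2, se2, z = qnorm(0.975)) {
  se_diff <- sqrt(se1^2 + se2^2)     # s.e. of the difference
  mid     <- (est1 + est2) / 2       # midpoint of the two estimates
  half    <- z * se_diff / 2         # half of the CI half-width
  list(lower  = mid - half,
       upper  = mid + half,
       # Estimates fail to touch the segment exactly when this is TRUE
       reject = abs(est1 - est2) > z * se_diff)
}

# Toy numbers, invented for illustration only
res <- half_interval(est1 = 120, se1 = 1.5, est2 = 126, se2 = 2.0)
print(res)
```

With these numbers the segment lies strictly between the two estimates, which corresponds to a difference larger than the critical value times its standard error.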
mbarclpl is the sfun function corresponding to mbarclPanel for plotp,
and medvpl is the sfun replacement for medvPanel.
medvPanel takes raw data and plots median y vs. x, along with
confidence intervals and half-interval for the difference in medians as
with mbarclPanel. Quantile intervals are optional. Very transparent
vertical violin plots are added by default. Unlike panel.violin, only
half of the violin is plotted, and when there are two superpose groups
they are side-by-side in different colors.
For plotp, the function corresponding to medvPanel is medvpl, which
draws back-to-back spike histograms, optional Gini mean difference,
optional SD, quantiles (thin line version of box plot with 0.05, 0.25,
0.5, 0.75, 0.95 quantiles), and half-width confidence interval for
differences in medians. For quantiles, the Harrell-Davis estimator is
used.
Usage
summaryS(formula, fun = NULL, data = NULL, subset = NULL,
         na.action = na.retain, continuous=10, ...)
## S3 method for class 'summaryS'
plot(x, formula=NULL, groups=NULL, panel=NULL,
     paneldoesgroups=FALSE, datadensity=NULL, ylab='',
     funlabel=NULL, textonly='n', textplot=NULL,
     digits=3, custom=NULL,
     xlim=NULL, ylim=NULL, cex.strip=1, cex.values=0.5, pch.stats=NULL,
     key=list(columns=length(groupslevels),
              x=.75, y=.04, cex=.9,
              col=lattice::trellis.par.get('superpose.symbol')$col,
              corner=c(0,1)),
     outerlabels=TRUE, autoarrange=TRUE, scat1d.opts=NULL, ...)
## S3 method for class 'summaryS'
plotp(data, formula=NULL, groups=NULL, sfun=NULL,
      fitter=NULL, showpts=! length(fitter), funlabel=NULL,
      digits=5, xlim=NULL, ylim=NULL,
      shareX=TRUE, shareY=FALSE, autoarrange=TRUE, ...)
mbarclPanel(x, y, subscripts, groups=NULL, yother, ...)
medvPanel(x, y, subscripts, groups=NULL, violin=TRUE, quantiles=FALSE, ...)
mbarclpl(x, y, groups=NULL, yother, yvar=NULL, maintracename='y',
         xlim=NULL, ylim=NULL, xname='x', alphaSegments=0.45, ...)
medvpl(x, y, groups=NULL, yvar=NULL, maintracename='y',
       xlim=NULL, ylim=NULL, xlab=xname, ylab=NULL, xname='x',
       zeroline=FALSE, yother=NULL, alphaSegments=0.45,
       dhistboxp.opts=NULL, ...)
Arguments
formula 
a formula with possibly multiple left- and right-side
variables separated by 
fun 
an optional summarization function, e.g., 
data 
optional input data frame. For 
subset 
optional subsetting criteria 
na.action 
function for dealing with 
continuous 
minimum number of unique values a numeric variable must have to be considered continuous 
... 
ignored for 
x 
an object created by 
groups 
a character string or factor specifying that one of the conditioning variables is used for superpositioning and not paneling 
panel 
optional 
paneldoesgroups 
set to 
datadensity 
set to 
ylab 
optional 
funlabel 
optional axis label for when 
textonly 
names of statistics to print and not plot. By
default, any statistic named 
textplot 
names of statistics to print and plot 
digits 
used if any statistics are printed as text (including

custom 
a function that customizes formatting of statistics that are printed as text. This is useful for generating plotmath notation. See the example in the tests directory. 
xlim 
optional 
ylim 
optional 
cex.strip 
size of strip labels 
cex.values 
size of statistics printed as text 
pch.stats 
symbols to use for statistics (not including the
one in column one) that are plotted. This is a named
vector, with names exactly matching those created by

key 

outerlabels 
set to 
autoarrange 
set to 
scat1d.opts 
a list of options to specify to 
y , subscripts 
provided by 
yother 
passed to the panel function from the 
violin 
controls whether violin plots are included 
quantiles 
controls whether quantile intervals are included 
sfun 
a function called by 
fitter 
a fitting function such as 
showpts 
set to 
shareX 

shareY 

yvar 
a character or factor variable used to stratify the analysis into multiple y-variables 
maintracename 
a default trace name when it can't be inferred 
xname 
x-axis variable name for hover text when it can't be inferred 
xlab 
x-axis label when it can't be inferred 
alphaSegments 
alpha saturation to draw line segments for

dhistboxp.opts 

zeroline 
set to 
Value
a data frame with added attributes for summaryS, or a lattice object
ready to render for plot
Author(s)
Frank Harrell
See Also
Examples
# See tests directory file summaryS.r for more examples, and summarySp.r
# for plotp examples
require(survival)
n <- 100
set.seed(1)
d <- data.frame(sbp=rnorm(n, 120, 10),
                dbp=rnorm(n, 80, 10),
                age=rnorm(n, 50, 10),
                days=sample(1:n, n, TRUE),
                S1=Surv(2*runif(n)), S2=Surv(runif(n)),
                race=sample(c('Asian', 'Black/AA', 'White'), n, TRUE),
                sex=sample(c('Female', 'Male'), n, TRUE),
                treat=sample(c('A', 'B'), n, TRUE),
                region=sample(c('North America','Europe'), n, TRUE),
                meda=sample(0:1, n, TRUE), medb=sample(0:1, n, TRUE))
d <- upData(d, labels=c(sbp='Systolic BP', dbp='Diastolic BP',
                        race='Race', sex='Sex', treat='Treatment',
                        days='Time Since Randomization',
                        S1='Hospitalization', S2='ReOperation',
                        meda='Medication A', medb='Medication B'),
            units=c(sbp='mmHg', dbp='mmHg', age='Year', days='Days'))
s <- summaryS(age + sbp + dbp ~ days + region + treat, data=d)
# plot(s) # 3 pages
plot(s, groups='treat', datadensity=TRUE,
     scat1d.opts=list(lwd=.5, nhistSpike=0))
plot(s, groups='treat', panel=lattice::panel.loess,
     key=list(space='bottom', columns=2),
     datadensity=TRUE, scat1d.opts=list(lwd=.5))
# To make a plotly graph when the stratification variable region is not
# present, run the following (showpts adds raw data points):
# plotp(s, groups='treat', fitter=loess, showpts=TRUE)
# Make your own plot using data frame created by summaryS
# xyplot(y ~ days | yvar * region, groups=treat, data=s,
#        scales=list(y='free', rot=0))
# Use loess to estimate the probability of two different types of events as
# a function of time
s <- summaryS(meda + medb ~ days + treat + region, data=d)
pan <- function(...)
  panel.plsmo(..., type='l', label.curves=max(which.packet()) == 1,
              datadensity=TRUE)
plot(s, groups='treat', panel=pan, paneldoesgroups=TRUE,
     scat1d.opts=list(lwd=.7), cex.strip=.8)
# Repeat using intervals instead of nonparametric smoother
pan <- function(...)   # really need mobs > 96 to est. proportion
  panel.plsmo(..., type='l', label.curves=max(which.packet()) == 1,
              method='intervals', mobs=5)
plot(s, groups='treat', panel=pan, paneldoesgroups=TRUE, xlim=c(0, 150))
# Demonstrate dot charts of summary statistics
s <- summaryS(age + sbp + dbp ~ region + treat, data=d, fun=mean)
plot(s)
plot(s, groups='treat', funlabel=expression(bar(X)))
# Compute parametric confidence limits for mean, and include sample
# sizes by naming a column "n"
f <- function(x) {
  x <- x[! is.na(x)]
  c(smean.cl.normal(x, na.rm=FALSE), n=length(x))
}
s <- summaryS(age + sbp + dbp ~ region + treat, data=d, fun=f)
plot(s, funlabel=expression(bar(X) %+-% t[0.975] %*% s))
plot(s, groups='treat', cex.values=.65,
     key=list(space='bottom', columns=2,
              text=c('Treatment A:','Treatment B:')))
# For discrete time, plot Harrell-Davis quantiles of y variables across
# time using different line characteristics to distinguish quantiles
d <- upData(d, days=round(days / 30) * 30)
g <- function(y) {
  probs <- c(0.05, 0.125, 0.25, 0.375)
  probs <- sort(c(probs, 1 - probs))
  y <- y[! is.na(y)]
  w <- hdquantile(y, probs)
  m <- hdquantile(y, 0.5, se=TRUE)
  se <- as.numeric(attr(m, 'se'))
  c(Median=as.numeric(m), w, se=se, n=length(y))
}
s <- summaryS(sbp + dbp ~ days + region, fun=g, data=d)
plot(s, panel=mbarclPanel)
plot(s, groups='region', panel=mbarclPanel, paneldoesgroups=TRUE)
# For discrete time, plot median y vs x along with CL for difference,
# using Harrell-Davis median estimator and its s.e., and use violin
# plots
s <- summaryS(sbp + dbp ~ days + region, data=d)
plot(s, groups='region', panel=medvPanel, paneldoesgroups=TRUE)
# Proportions and Wilson confidence limits, plus approximate
# Gaussian-based half-width confidence limits for difference in
# probabilities
g <- function(y) {
  y <- y[!is.na(y)]
  n <- length(y)
  p <- mean(y)
  se <- sqrt(p * (1. - p) / n)
  structure(c(binconf(sum(y), n), se=se, n=n),
            names=c('Proportion', 'Lower', 'Upper', 'se', 'n'))
}
s <- summaryS(meda + medb ~ days + region, fun=g, data=d)
plot(s, groups='region', panel=mbarclPanel, paneldoesgroups=TRUE)