Graphical Summarization of Continuous Variables Against a Response
Description
summaryRc is a continuous version of summary.formula with method='response'. It uses the plsmo function to compute the possibly stratified lowess nonparametric regression estimates, and plots them along with the data density, with selected quantiles of the overall distribution (over strata) of each x shown as arrows on top of the graph. All the x variables must be numeric and continuous or nearly continuous.
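As a rough illustration of the idea described above, the following base-R sketch fits one lowess curve per stratum and overlays a rug as a crude stand-in for the data density display. It is not how summaryRc is implemented (that work is delegated to plsmo); the simulated data and the names x, y, g, and fits are assumptions made only for this illustration.

```r
# Base-R sketch only; summaryRc itself relies on Hmisc's plsmo.
set.seed(1)
n <- 200
x <- rnorm(n, 50, 5)                                    # continuous predictor
g <- factor(sample(c("a", "b"), n, replace = TRUE))     # hypothetical stratifier
y <- rbinom(n, 1, plogis(0.1 * (x - 50) + 0.5 * (g == "a")))

# One lowess fit per stratum; each fit is a list with components x and y
fits <- lapply(split(data.frame(x, y), g), function(d) lowess(d$x, d$y))

# Draw the stratified smooths on a common set of axes
plot(range(x), c(0, 1), type = "n", xlab = "x", ylab = "Smoothed y")
for (i in seq_along(fits)) lines(fits[[i]], lty = i)
rug(x)   # crude stand-in for a data-density display
```

Each element of fits holds the smoothed values for one stratum, sorted by x, so the curves can be compared directly on the same plot.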
Usage
summaryRc(formula, data=NULL, subset=NULL,
          na.action=NULL, fun = function(x) x,
          na.rm = TRUE, ylab=NULL, ylim=NULL, xlim=NULL,
          nloc=NULL, datadensity=NULL,
          quant = c(0.05, 0.1, 0.25, 0.5, 0.75,
                    0.90, 0.95), quantloc=c('top','bottom'),
          cex.quant=.6, srt.quant=0,
          bpplot = c('none', 'top', 'top outside', 'top inside', 'bottom'),
          height.bpplot=0.08,
          trim=NULL, test = FALSE, vnames = c('labels', 'names'), ...)
Arguments
formula 
An R formula with additive effects. The 
data 
name or number of a data frame. Default is the current frame. 
subset 
a logical vector or integer vector of subscripts used to specify the subset of data to use in the analysis. The default is to use all observations in the data frame. 
na.action 
function for handling missing data in the input data. The default is
a function defined here called 
fun 
function for transforming 
na.rm 

ylab 

ylim 

xlim 
a list with elements named as the variable names appearing
on the 
nloc 
location for sample size. Specify 
datadensity 
see 
quant 
vector of quantiles to use for summarizing the marginal distribution
of each 
quantloc 
specify 
cex.quant 
character size for writing which quantiles are
represented. Set to 
srt.quant 
angle for text for quantile labels 
bpplot 
if not 
height.bpplot 
height in inches of the horizontal extended box plot 
trim 
The default is to plot from the 10th smallest to the 10th
largest 
test 
Set to 
vnames 
By default, plots are usually labeled with variable labels
(see the 
... 
arguments passed to 
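To make the quant and trim descriptions concrete, the base-R sketch below computes the default quantiles listed in Usage for one simulated variable, together with its 10th smallest and 10th largest values, which the trim entry gives as the default plotting range. This only mirrors the arithmetic described above; the variable names x, probs, q, xs, and lims are assumptions for illustration, and summaryRc's internal computation may differ in detail.

```r
# Illustration with simulated data; not taken from the summaryRc source.
set.seed(2)
x <- rnorm(500, 50, 5)

# Default value of `quant` as shown in Usage
probs <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.90, 0.95)
q <- quantile(x, probs = probs)     # marginal quantiles of x

# 10th smallest and 10th largest observations
xs   <- sort(x)
lims <- xs[c(10, length(xs) - 9)]

print(q)
print(lims)
```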
Value
no value is returned
Author(s)
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
Examples
options(digits=3)
set.seed(177)
sex <- factor(sample(c("m","f"), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
bp <- rnorm(500, 120, 7)
units(age) <- 'Years'; units(bp) <- 'mmHg'
label(bp) <- 'Systolic Blood Pressure'
L <- .5*(sex == 'm') + 0.1 * (age - 50)
y <- rbinom(500, 1, plogis(L))
par(mfrow=c(1,2))
summaryRc(y ~ age + bp)
# For x limits use 1st and 99th percentiles to frame extended box plots
summaryRc(y ~ age + bp, bpplot='top', datadensity=FALSE, trim=.01)
summaryRc(y ~ age + bp + stratify(sex),
          label.curves=list(keys='lines'), nloc=list(x=.1, y=.05))
y2 <- rbinom(500, 1, plogis(L + .5))
Y <- cbind(y, y2)
summaryRc(Y ~ age + bp + stratify(sex),
          label.curves=list(keys='lines'), nloc=list(x=.1, y=.05))