anova.rms {rms}  R Documentation 
The anova
function automatically tests most meaningful hypotheses
in a design. For example, suppose that age and cholesterol are
predictors, and that a general interaction is modeled using a restricted
spline surface. anova
prints Wald statistics (F statistics
for an ols
fit) for testing linearity of age, linearity of
cholesterol, age effect (age + age by cholesterol interaction),
cholesterol effect (cholesterol + age by cholesterol interaction),
linearity of the age by cholesterol interaction (i.e., adequacy of the
simple age * cholesterol 1 d.f. product), linearity of the interaction
in age alone, and linearity of the interaction in cholesterol
alone. Joint tests of all interaction terms in the model and all
nonlinear terms in the model are also performed. For any multiple
d.f. effects for continuous variables that were not modeled through
rcs
, pol
, lsp
, etc., tests of linearity will be
omitted. This applies to matrix predictors produced by e.g.
poly
or ns
. print.anova.rms
is the printing
method. plot.anova.rms
draws dot charts depicting the importance
of variables in the model, as measured by Wald chisquare,
chisquare minus d.f., AIC, Pvalues, partial
R^2, R^2 for the whole model after deleting the effects in
question, or proportion of overall model R^2 that is due to each
predictor. latex.anova.rms
is the latex
method. It
substitutes Greek/math symbols in column headings, uses boldface for
TOTAL
lines, and constructs a caption. Then it passes the result
to latex.default
for conversion to LaTeX.
For Bayesian models such as blrm
, anova
computes relative
explained variation indexes (REV) based on approximate Wald statistics.
This uses the variancecovariance matrix of all of the posterior draws,
and the individual draws of betas, plus an overall summary from the
posterior mode/mean/median beta. Wald chisquares assuming multivariate
normality of betas are computed just as with frequentist models, and for
each draw (or for the summary) the ratio of the partial Wald chisquare
to the total Wald statistic for the model is computed as REV.
The print
method calls latex
or html
methods
depending on options(prType=)
, and output is to the console. For
latex
a table
environment is not used and an ordinary
tabular
is produced.
html.anova.rms
just calls latex.anova.rms
.
## S3 method for class 'rms' anova(object, ..., main.effect=FALSE, tol=1e9, test=c('F','Chisq'), india=TRUE, indnl=TRUE, ss=TRUE, vnames=c('names','labels'), posterior.summary=c('mean', 'median', 'mode'), ns=500, cint=0.95) ## S3 method for class 'anova.rms' print(x, which=c('none','subscripts','names','dots'), table.env=FALSE, ...) ## S3 method for class 'anova.rms' plot(x, what=c("chisqminusdf","chisq","aic","P","partial R2","remaining R2", "proportion R2", "proportion chisq"), xlab=NULL, pch=16, rm.totals=TRUE, rm.ia=FALSE, rm.other=NULL, newnames, sort=c("descending","ascending","none"), margin=c('chisq','P'), pl=TRUE, trans=NULL, ntrans=40, height=NULL, width=NULL, ...) ## S3 method for class 'anova.rms' latex(object, title, dec.chisq=2, dec.F=2, dec.ss=NA, dec.ms=NA, dec.P=4, dec.REV=3, table.env=TRUE, caption=NULL, ...) ## S3 method for class 'anova.rms' html(object, ...)
object 
a 
... 
If omitted, all variables are tested, yielding tests for individual factors
and for pooled effects. Specify a subset of the variables to obtain tests
for only those factors, with a pooled Wald tests for the combined effects
of all factors listed. Names may be abbreviated. For example, specify
Can be optional graphical parameters to send to
For 
main.effect 
Set to 
tol 
singularity criterion for use in matrix inversion 
test 
For an 
india 
set to 
indnl 
set to 
ss 
For an 
vnames 
set to 
posterior.summary 
specifies whether the posterior mode/mean/median beta are to be used as a measure of central tendence of the posterior distribution, for use in relative explained variation from Bayesian models 
ns 
number of random samples from the posterior draws to use for REV highest posterior density intervals 
cint 
HPD interval probability 
x 
for 
which 
If 
what 
what type of statistic to plot. The default is the Wald
chisquare
statistic for each factor (adding in the effect of higherordered
factors containing that factor) minus its degrees of freedom. The
R2 choices for 
xlab 
xaxis label, default is constructed according to 
pch 
character for plotting dots in dot charts. Default is 16 (solid dot). 
rm.totals 
set to 
rm.ia 
set to 
rm.other 
a list of other predictor names to omit from the chart 
newnames 
a list of substitute predictor names to use, after omitting any. 
sort 
default is to sort bars in descending order of the summary statistic 
margin 
set to a vector of character strings to write text for
selected statistics in the right margin of the dot chart. The
character strings can be any combination of 
pl 
set to 
trans 
set to a function to apply that transformation to the statistics
being plotted, and to truncate negative values at zero. A good choice
is 
ntrans 

height,width 
height and width of 
title 
title to pass to 
dec.chisq 
number of places to the right of the decimal place for typesetting
chisquare values (default is 
dec.F 
digits to the right for F statistics (default is 
dec.ss 
digits to the right for sums of squares (default is 
dec.ms 
digits to the right for mean squares (default is 
dec.P 
digits to the right for Pvalues 
dec.REV 
digits to the right for REV 
table.env 
see 
caption 
caption for table if 
If the statistics being plotted with plot.anova.rms
are few in
number and one of them is negative or zero, plot.anova.rms
will quit because of an error in dotchart2
.
The latex
method requires LaTeX packages relsize
and
needspace
.
anova.rms
returns a matrix of class anova.rms
containing factors
as rows and chisquare, d.f., and Pvalues as
columns (or d.f., partial SS, MS, F, P). An attribute
vinfo
provides list of variables involved in each row and the
type of test done.
plot.anova.rms
invisibly returns the vector of quantities
plotted. This vector has a names attribute describing the terms for
which the statistics in the vector are calculated.
print
prints, latex
creates a
file with a name of the form "title.tex"
(see the title
argument above).
Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com
rms
, rmsMisc
, lrtest
,
rms.trans
, summary.rms
, plot.Predict
,
ggplot.Predict
, solvet
,
locator
,
dotchart2
, latex
,
xYplot
, anova.lm
,
contrast.rms
, pantext
n < 1000 # define sample size set.seed(17) # so can reproduce the results treat < factor(sample(c('a','b','c'), n,TRUE)) num.diseases < sample(0:4, n,TRUE) age < rnorm(n, 50, 10) cholesterol < rnorm(n, 200, 25) weight < rnorm(n, 150, 20) sex < factor(sample(c('female','male'), n,TRUE)) label(age) < 'Age' # label is in Hmisc label(num.diseases) < 'Number of Comorbid Diseases' label(cholesterol) < 'Total Cholesterol' label(weight) < 'Weight, lbs.' label(sex) < 'Sex' units(cholesterol) < 'mg/dl' # uses units.default in Hmisc # Specify population model for log odds that Y=1 L < .1*(num.diseases2) + .045*(age50) + (log(cholesterol  10)5.2)*(2*(treat=='a') + 3.5*(treat=='b')+2*(treat=='c')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(L)] y < ifelse(runif(n) < plogis(L), 1, 0) fit < lrm(y ~ treat + scored(num.diseases) + rcs(age) + log(cholesterol+10) + treat:log(cholesterol+10)) a < anova(fit) # Test all factors b < anova(fit, treat, cholesterol) # Test these 2 by themselves # to get their pooled effects a b # Add a new line to the plot with combined effects s < rbind(a, 'treat+cholesterol'=b['TOTAL',]) class(s) < 'anova.rms' plot(s, margin=c('chisq', 'proportion chisq')) g < lrm(y ~ treat*rcs(age)) dd < datadist(treat, num.diseases, age, cholesterol) options(datadist='dd') p < Predict(g, age, treat="b") s < anova(g) # Usually omit fontfamily to default to 'Courier' # It's specified here to make R pass its packagebuilding checks plot(p, addpanel=pantext(s, 28, 1.9, fontfamily='Helvetica')) plot(s, margin=c('chisq', 'proportion chisq')) # new plot  dot chart of chisqd.f. with 2 other stats in right margin # latex(s) # nice printout  creates anova.g.tex options(datadist=NULL) # Simulate data with from a given model, and display exactly which # hypotheses are being tested set.seed(123) age < rnorm(500, 50, 15) treat < factor(sample(c('a','b','c'), 500, TRUE)) bp < rnorm(500, 120, 10) y < ifelse(treat=='a', (age50)*.05, abs(age50)*.08) + 3*(treat=='c') + pmax(bp, 100)*.09 + rnorm(500) f < ols(y ~ treat*lsp(age,50) + rcs(bp,4)) print(names(coef(f)), quote=FALSE) specs(f) anova(f) an < anova(f) options(digits=3) print(an, 'subscripts') print(an, 'dots') an < anova(f, test='Chisq', ss=FALSE) plot(0:1) # make some plot tab < pantext(an, 1.2, .6, lattice=FALSE, fontfamily='Helvetica') # create function to write table; usually omit fontfamily tab() # execute it; could do tab(cex=.65) plot(an) # new plot  dot chart of chisqd.f. # Specify plot(an, trans=sqrt) to use a square root scale for this plot # latex(an) # nice printout  creates anova.f.tex ## Example to save partial R^2 for all predictors, along with overall ## R^2, from two separate fits, and to combine them with a lattice plot require(lattice) set.seed(1) n < 100 x1 < runif(n) x2 < runif(n) y < (x1.5)^2 + x2 + runif(n) group < c(rep('a', n/2), rep('b', n/2)) A < NULL for(g in c('a','b')) { f < ols(y ~ pol(x1,2) + pol(x2,2) + pol(x1,2) %ia% pol(x2,2), subset=group==g) a < plot(anova(f), what='partial R2', pl=FALSE, rm.totals=FALSE, sort='none') a < a[grep('NONLINEAR', names(a))] d < data.frame(group=g, Variable=factor(names(a), names(a)), partialR2=unname(a)) A < rbind(A, d) } dotplot(Variable ~ partialR2  group, data=A, xlab=ex < expression(partial~R^2)) dotplot(group ~ partialR2  Variable, data=A, xlab=ex) dotplot(Variable ~ partialR2, groups=group, data=A, xlab=ex, auto.key=list(corner=c(.5,.5))) # Suppose that a researcher wants to make a big deal about a variable # because it has the highest adjusted chisquare. We use the # bootstrap to derive 0.95 confidence intervals for the ranks of all # the effects in the model. We use the plot method for anova, with # pl=FALSE to suppress actual plotting of chisquare  d.f. for each # bootstrap repetition. # It is important to tell plot.anova.rms not to sort the results, or # every bootstrap replication would have ranks of 1,2,3,... for the stats. n < 300 set.seed(1) d < data.frame(x1=runif(n), x2=runif(n), x3=runif(n), x4=runif(n), x5=runif(n), x6=runif(n), x7=runif(n), x8=runif(n), x9=runif(n), x10=runif(n), x11=runif(n), x12=runif(n)) d$y < with(d, 1*x1 + 2*x2 + 3*x3 + 4*x4 + 5*x5 + 6*x6 + 7*x7 + 8*x8 + 9*x9 + 10*x10 + 11*x11 + 12*x12 + 9*rnorm(n)) f < ols(y ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12, data=d) B < 20 # actually use B=1000 ranks < matrix(NA, nrow=B, ncol=12) rankvars < function(fit) rank(plot(anova(fit), sort='none', pl=FALSE)) Rank < rankvars(f) for(i in 1:B) { j < sample(1:n, n, TRUE) bootfit < update(f, data=d, subset=j) ranks[i,] < rankvars(bootfit) } lim < t(apply(ranks, 2, quantile, probs=c(.025,.975))) predictor < factor(names(Rank), names(Rank)) Dotplot(predictor ~ Cbind(Rank, lim), pch=3, xlab='Rank')