scatterplot {car}  R Documentation 
This function uses basic R graphics to draw a two-dimensional scatterplot, with options to allow for plot enhancements that are often helpful with regression problems. Enhancements include adding marginal boxplots, estimated mean and variance functions using either parametric or nonparametric methods, point identification, jittering, setting characteristics of points and lines like color, size and symbol, marking points and fitting lines conditional on a grouping variable, and other enhancements.
sp is an abbreviation for scatterplot.
scatterplot(x, ...)
## S3 method for class 'formula'
scatterplot(formula, data, subset, xlab, ylab, id=FALSE, legend=TRUE, ...)
## Default S3 method:
scatterplot(x, y, boxplots=if (by.groups) "" else "xy",
    regLine=TRUE, legend=TRUE, id=FALSE, ellipse=FALSE, grid=TRUE,
    smooth=TRUE, groups, by.groups=!missing(groups),
    xlab=deparse(substitute(x)), ylab=deparse(substitute(y)),
    log="", jitter=list(), cex=par("cex"),
    col=carPalette()[1], pch=1:n.groups, reset.par=TRUE, ...)
sp(x, ...)
x 
vector of horizontal coordinates (or first argument of generic function). 
y 
vector of vertical coordinates. 
formula 
a model formula, of the form 
data 
data frame within which to evaluate the formula. 
subset 
expression defining a subset of observations. 
boxplots 
if 
regLine 
controls adding a fitted regression line to the plot. if

legend 
when the plot is drawn by groups and 
id 
controls point identification; if 
ellipse 
controls plotting data-concentration ellipses. If 
grid 
If TRUE, the default, a light-gray background grid is put on the graph. 
smooth 
specifies a nonparametric estimate of the mean or median
function of the vertical axis variable given the
horizontal axis variable and optionally a nonparametric estimate of the conditional variance. If

groups 
a factor or other variable dividing the data into groups; groups are plotted with different colors, plotting characters, fits, and smooths. Using this argument is equivalent to specifying the grouping variable in the formula. 
by.groups 
if 
xlab 
label for horizontal axis. 
ylab 
label for vertical axis. 
log 
same as the 
jitter 
a list with elements 
col 
with no grouping, this specifies a color for plotted points;
with grouping, this argument should be a vector
of colors of length at least equal to the number of groups. The default is the
value returned by 
pch 
plotting characters for points; default is the plotting characters in
order (see 
cex 
sets the size of plotting characters, with 
reset.par 
if 
... 
other arguments passed down and to 
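To make the grouping behavior concrete, here is a minimal base-R sketch, not car's implementation, of what drawing "by groups" involves: the data are split on the grouping factor and a separate least-squares line is fitted to each group, with each group paired to its own color and plotting character. The helper name fit_by_group and its palette()-based color default are assumptions for illustration; the usage above shows that scatterplot's own col default is based on carPalette().

```r
# Hypothetical helper (base R only, not part of car): fit one least-squares
# line per level of a grouping factor and pair each group with a color and
# plotting character. The palette() default is an illustrative assumption.
fit_by_group <- function(x, y, groups,
                         col = palette()[seq_len(nlevels(factor(groups)))],
                         pch = seq_len(nlevels(factor(groups)))) {
  groups <- factor(groups)
  lv <- levels(groups)
  # Separate lm() fit for the observations in each group
  fits <- lapply(lv, function(g) {
    keep <- groups == g
    coef(lm(y[keep] ~ x[keep]))
  })
  data.frame(group = lv,
             intercept = vapply(fits, `[`, numeric(1), 1),
             slope = vapply(fits, `[`, numeric(1), 2),
             col = col, pch = pch,
             stringsAsFactors = FALSE)
}
```

For example, fit_by_group(iris$Sepal.Length, iris$Petal.Length, iris$Species) returns one row per species, each with its own intercept, slope, color and symbol.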
Many arguments to scatterplot were changed in version 3 of car to simplify use of this function.
The smooth argument is used to control adding smooth curves to the plot to estimate the conditional center of the vertical axis variable given the horizontal axis variable, and also the conditional variability. Setting smooth=FALSE omits all smoothers, while smooth=TRUE, the default, includes default smoothers. Alternatively, smooth can be set to a list of subarguments that provide finer control over the smoothing.
The default behavior of smooth=TRUE is equivalent to smooth=list(smoother=loessLine, var=!by.groups, lty.var=2, lty.var=4, style="filled", alpha=0.15, border=TRUE, vertical=TRUE), specifying the default loessLine smoother for the conditional mean smooth and variance smooth. The color of the smooths is the same as the color of the points, but this can be changed with the arguments col.smooth and col.var.
Additional available smoothers are gamLine, which uses the gam function, and quantregLine, which uses quantile regression to estimate the median and quartile functions using rqss. All of these smoothers have one or more arguments described on their help pages, and these arguments can be added to the smooth argument; for example, smooth = list(span=1/2) would use the default loessLine smoother, include the variance smooth (when the plot is not drawn by groups, since var defaults to !by.groups), and change the value of the smoothing parameter to 1/2.
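The way a partial list such as smooth = list(span=1/2) keeps every other default can be sketched with utils::modifyList. This is an assumption about the merging behavior, not car's internal code: resolve_smooth is a hypothetical helper, and the defaults list (including span = 2/3, and the smoother given as a string so the sketch runs without car) is illustrative only.

```r
# Hypothetical sketch (not car's internals): resolve the smooth argument
# into a full list of subarguments by overlaying user values on defaults.
resolve_smooth <- function(smooth, defaults) {
  if (isFALSE(smooth)) return(NULL)      # smooth = FALSE: no smoothers
  if (isTRUE(smooth)) return(defaults)   # smooth = TRUE: all defaults
  stopifnot(is.list(smooth))
  utils::modifyList(defaults, smooth)    # named entries override defaults
}

# Illustrative defaults; the smoother is a string here only so the
# sketch does not depend on car being installed.
defaults <- list(smoother = "loessLine", span = 2/3, var = TRUE,
                 style = "filled", alpha = 0.15, border = TRUE,
                 vertical = TRUE)
```

With these defaults, resolve_smooth(list(span = 1/2), defaults) changes only span, leaving the smoother, the variance smooth and the envelope settings as they were.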
For loessLine and gamLine, the variance smooth is estimated by separately smoothing the squared positive and negative residuals from the mean smooth, using the same type of smoother. The displayed curves are equal to the mean smooth plus the square root of the smooth of the positive squared residuals, and the mean smooth minus the square root of the smooth of the negative squared residuals. The lines therefore represent the conditional variability at each value on the horizontal axis. Because smoothing is done separately for positive and negative residuals, the variation shown will generally not be symmetric about the fitted mean function. For the quantregLine method, the center estimates the conditional median, and the variability estimates the lower and upper quartiles of the estimated conditional distribution.
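The asymmetric variance smooth described above can be approximated in base R with stats::loess. This is a sketch of the idea, not the code of loessLine: variance_envelope is a hypothetical helper, and the choices span = 2/3, degree = 1 and loess.control(surface = "direct") (used so that every grid point can be evaluated, even slightly outside the range of one residual subset) are assumptions.

```r
# Sketch (base R, not car's loessLine): mean smooth plus an asymmetric
# variability envelope from separately smoothed squared residuals.
# span, degree and surface = "direct" are illustrative assumptions.
variance_envelope <- function(x, y, span = 2/3, n_grid = 50) {
  ctrl <- loess.control(surface = "direct")  # allows evaluation anywhere
  mean_fit <- loess(y ~ x, span = span, degree = 1, control = ctrl)
  res <- residuals(mean_fit)
  grid <- data.frame(x = seq(min(x), max(x), length.out = n_grid))
  center <- predict(mean_fit, newdata = grid)

  # Smooth the squared residuals on one side of the mean fit
  side_smooth <- function(keep) {
    d <- data.frame(x = x[keep], r2 = res[keep]^2)
    fit <- loess(r2 ~ x, data = d, span = span, degree = 1, control = ctrl)
    pmax(predict(fit, newdata = grid), 0)    # guard against negative fits
  }
  upper <- center + sqrt(side_smooth(res > 0))
  lower <- center - sqrt(side_smooth(res < 0))
  data.frame(x = grid$x, center = center, lower = lower, upper = upper)
}
```

Because the two sides are smoothed independently, upper - center and center - lower generally differ, which is exactly the asymmetry the paragraph above describes.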
The default depiction of the variance functions is via a shaded envelope between the upper and lower estimates of variability. Setting the subargument style="lines" will display only the boundaries of this region, and style="none" suppresses variance smoothing.
For style="filled", several subarguments modify the appearance of the region: alpha is a number between 0 and 1 that specifies opacity, with default 0.15; border, TRUE or FALSE, specifies whether the envelope has a border; and vertical, either TRUE or FALSE, modifies the behavior of the envelope at the edges of the graph.
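A base-graphics sketch of the three styles follows. draw_envelope is a hypothetical helper, not part of car, and it ignores the vertical subargument, whose exact effect is not spelled out here. The alpha value is applied with grDevices::adjustcolor, which for an opaque color yields a fill with that opacity.

```r
# Hypothetical helper (not part of car): draw a variability envelope as a
# translucent band ("filled"), boundary lines only ("lines"), or nothing
# ("none"). The vertical subargument is not modeled in this sketch.
draw_envelope <- function(x, lower, upper,
                          style = c("filled", "lines", "none"),
                          col = "blue", alpha = 0.15, border = TRUE,
                          lty = 2) {
  style <- match.arg(style)
  if (style == "filled") {
    # For an opaque col, alpha.f becomes the opacity of the fill
    fill <- adjustcolor(col, alpha.f = alpha)
    polygon(c(x, rev(x)), c(upper, rev(lower)), col = fill,
            border = if (border) col else NA)
  } else if (style == "lines") {
    lines(x, upper, col = col, lty = lty)
    lines(x, lower, col = col, lty = lty)
  }
  invisible(style)
}
```

Drawing the band with a single polygon, whose outline runs along the upper curve and back along the reversed lower curve, is what lets border switch an outline on or off independently of the fill.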
The subarguments spread, lty.spread and col.spread of the smooth argument are equivalent to the newer var, lty.var and col.var, respectively, recognizing that the spread is a measure of conditional variability.
If points are identified, their labels are returned; otherwise NULL is returned invisibly.
John Fox jfox@mcmaster.ca
Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition, Sage.
boxplot, jitter, legend, scatterplotMatrix, dataEllipse, Boxplot, cov.trob, showLabels, ScatterplotSmoothers.
scatterplot(prestige ~ income, data=Prestige, ellipse=TRUE,
    smooth=list(style="lines"))
scatterplot(prestige ~ income, data=Prestige,
    smooth=list(smoother=quantregLine))
scatterplot(prestige ~ income, data=Prestige,
    smooth=list(smoother=quantregLine, border=FALSE))
# use quantile regression for median and quartile fits
scatterplot(prestige ~ income | type, data=Prestige,
    smooth=list(smoother=quantregLine, var=TRUE, span=1, lwd=4, lwd.var=2))
scatterplot(prestige ~ income | type, data=Prestige,
    legend=list(coords="topleft"))
scatterplot(vocabulary ~ education, jitter=list(x=1, y=1), data=Vocab,
    smooth=FALSE, lwd=3)
scatterplot(infantMortality ~ ppgdp, log="xy", data=UN, id=list(n=5))
scatterplot(income ~ type, data=Prestige)
## Not run:
# remember to exit from point-identification mode
scatterplot(infantMortality ~ ppgdp, id=list(method="identify"), data=UN)
## End(Not run)