predict.lm {stats}  R Documentation 
Predicted values based on linear model object.
## S3 method for class 'lm'
predict(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
        interval = c("none", "confidence", "prediction"),
        level = 0.95, type = c("response", "terms"),
        terms = NULL, na.action = na.pass,
        pred.var = res.var/weights, weights = 1,
        rankdeficient = c("warnif", "simple", "non-estim", "NA", "NAwarn"),
        tol = 1e-6, verbose = FALSE,
        ...)
object 
Object of class inheriting from "lm".
newdata 
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. 
se.fit 
A switch indicating if standard errors are required. 
scale 
Scale parameter for std.err. calculation. 
df 
Degrees of freedom for scale. 
interval 
Type of interval calculation. Can be abbreviated. 
level 
Tolerance/confidence level. 
type 
Type of prediction (response or model term). Can be abbreviated. 
terms 
If type = "terms", which terms to return (default is all terms); a character vector.
na.action 
function determining what should be done with missing values in newdata. The default is to predict NA.
pred.var 
the variance(s) for future observations to be assumed for prediction intervals. See ‘Details’. 
weights 
variance weights for prediction. This can be a numeric vector or a one-sided model formula. In the latter case, it is interpreted as an expression evaluated in newdata.
rankdeficient 
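a character string specifying how prediction from a rank-deficient fit is handled: one of "warnif" (the default), "simple", "non-estim", "NA" or "NAwarn". See ‘Details’.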
tol 
non-negative number giving the tolerance used to decide non-estimability in rank-deficient cases.
verbose 

... 
further arguments passed to or from other methods. 
predict.lm produces predicted values, obtained by evaluating the regression function in the frame newdata (which defaults to model.frame(object)). If the logical se.fit is TRUE, standard errors of the predictions are calculated. If the numeric argument scale is set (with optional df), it is used as the residual standard deviation in the computation of the standard errors, otherwise this is extracted from the model fit. Setting interval specifies computation of confidence or prediction (tolerance) intervals at the specified level, sometimes referred to as narrow vs. wide intervals.
If the fit is rank-deficient, some of the columns of the design matrix will have been dropped during the lm computations, and the corresponding coef() components set to NA. Prediction from such a fit only makes sense if newdata is contained in the same subspace as the original data. Other newdata entries (rows) are non-estimable. This is now checked (up to numerical tolerance tol) unless rankdeficient = "simple", which corresponds to the previous behaviour: it always warns and predicts using the non-NA coefficients with the corresponding columns of the design matrix. The new default option, rankdeficient = "warnif", checks whether there are non-estimable cases (up to tolerance tol) and warns only in that case. All further rankdeficient options also check, and either predict NA or mark the non-estimable cases differently.
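The rank-deficient case described above can be illustrated with a minimal sketch. The data frame d, the collinear column x2 and the names rd_fit, rd_new and rd_na are hypothetical, chosen only for illustration; the sketch assumes an R version whose predict.lm accepts the rankdeficient argument shown in the usage.

```r
## Hypothetical toy data: x2 is an exact multiple of x1,
## so the design matrix is rank-deficient.
d <- data.frame(x1 = 1:6, y = c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0))
d$x2 <- 2 * d$x1
rd_fit <- lm(y ~ x1 + x2, data = d)
coef(rd_fit)                       # the x2 coefficient is NA

## Row 1 keeps x2 = 2 * x1 (same subspace as the fitting data);
## row 2 does not, so it should be non-estimable.
rd_new <- data.frame(x1 = c(7, 7), x2 = c(14, 0))

predict(rd_fit, rd_new)            # default "warnif": expected to warn
rd_na <- predict(rd_fit, rd_new, rankdeficient = "NA")
rd_na                              # non-estimable row predicted as NA
```

With rankdeficient = "NA" the non-estimable row is returned as NA rather than as a number computed from the non-NA coefficients, which makes such cases harder to overlook.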
If newdata is omitted the predictions are based on the data used for the fit. In that case, how cases with missing values in the original fit are handled is determined by the na.action argument of that fit. If na.action = na.omit, omitted cases will not appear in the predictions, whereas if na.action = na.exclude they will appear (in predictions, standard errors or interval limits), with value NA. See also napredict.
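As a small sketch of the difference between the two na.action settings, the following fits the same model twice to a hypothetical data frame d_na with one missing predictor value (the data and object names are assumptions made for illustration).

```r
## Hypothetical data with one missing predictor value (third case).
d_na <- data.frame(x = c(1, 2, NA, 4, 5),
                   y = c(1.2, 1.9, 3.1, 4.2, 4.8))

fit_omit <- lm(y ~ x, data = d_na, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d_na, na.action = na.exclude)

p_omit <- predict(fit_omit)   # the omitted case is dropped
p_excl <- predict(fit_excl)   # the omitted case is padded with NA

length(p_omit)
length(p_excl)
p_excl
```

The na.exclude form keeps predictions aligned with the rows of the original data, which is convenient when they are bound back onto that data frame.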
The prediction intervals are for a single observation at each case in newdata (or by default, the data used for the fit) with error variance(s) pred.var. This can be a multiple of res.var, the estimated value of σ^2: the default is to assume that future observations have the same error variance as those used for fitting. If weights is supplied, the inverse of this is used as a scale factor. For a weighted fit, if the prediction is for the original data frame, weights defaults to the weights used for the model fit, with a warning since it might not be the intended result. If the fit was weighted and newdata is given, the default is to assume constant prediction variance, with a warning.
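The role of weights as an inverse scale factor on the prediction variance can be checked numerically. This is a sketch under stated assumptions: the simulated data, the seed and the weight vector w_new are hypothetical, and the half-width is rebuilt as a t quantile times sqrt(se.fit^2 + res.var/weights), taking res.var to be the squared residual.scale of an unweighted fit.

```r
set.seed(42)                        # arbitrary seed, for reproducibility only
xs <- rnorm(20)
ys <- 1 + 2 * xs + rnorm(20)
fit_s <- lm(ys ~ xs)
new_s <- data.frame(xs = c(-1, 0, 1))

p_s   <- predict(fit_s, new_s, se.fit = TRUE)
w_new <- c(1, 2, 4)                 # hypothetical variance weights for the new cases
pl_s  <- predict(fit_s, new_s, interval = "prediction", weights = w_new)

## Half-width implied by pred.var = res.var / weights.
half_s <- qt(0.975, p_s$df) * sqrt(p_s$se.fit^2 + p_s$residual.scale^2 / w_new)
all.equal(unname(pl_s[, "upr"] - pl_s[, "fit"]), unname(half_s))
```

Larger weights shrink the assumed variance of the future observation, so the interval narrows towards the confidence interval for the mean.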
predict.lm produces a vector of predictions or a matrix of predictions and bounds with column names fit, lwr, and upr if interval is set. For type = "terms" this is a matrix with a column per term and may have an attribute "constant".
If se.fit is TRUE, a list with the following components is returned:
fit 
vector or matrix as above 
se.fit 
standard error of predicted means 
residual.scale 
residual standard deviations 
df 
degrees of freedom for residual 
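To show how these components fit together, the sketch below rebuilds the default 95% confidence limits from fit, se.fit and df and compares them with the matrix returned for interval = "confidence". The simulated data and the object names are assumptions made for illustration.

```r
set.seed(1)                         # arbitrary seed, for reproducibility only
xc <- rnorm(15)
yc <- xc + rnorm(15)
fit_c <- lm(yc ~ xc)
new_c <- data.frame(xc = c(-1, 0, 1))

p_c <- predict(fit_c, new_c, se.fit = TRUE)
names(p_c)                          # fit, se.fit, df, residual.scale

## Confidence limits rebuilt from the components (level = 0.95).
tq <- qt(0.975, p_c$df)
ci_manual <- cbind(fit = p_c$fit,
                   lwr = p_c$fit - tq * p_c$se.fit,
                   upr = p_c$fit + tq * p_c$se.fit)
ci <- predict(fit_c, new_c, interval = "confidence")
all.equal(ci_manual, ci, check.attributes = FALSE)
```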
Variables are first looked for in newdata and then searched for in the usual way (which will include the environment of the formula used in the fit). A warning will be given if the variables found are not of the same length as those in newdata if it was supplied.
Notice that prediction variances and prediction intervals always refer to future observations, possibly corresponding to the same predictors as used for the fit. The variance of the residuals will be smaller.
Strictly speaking, the formula used for prediction limits assumes that the degrees of freedom for the fit are the same as those for the residual variance. This may not be the case if res.var is not obtained from the fit.
The model fitting function lm, predict.
SafePrediction for prediction from (univariable) polynomial and spline fits.
require(graphics)
## Predictions
x <- rnorm(15)
y <- x + rnorm(15)
predict(lm(y ~ x))
new <- data.frame(x = seq(-3, 3, 0.5))
predict(lm(y ~ x), new, se.fit = TRUE)
pred.w.plim <- predict(lm(y ~ x), new, interval = "prediction")
pred.w.clim <- predict(lm(y ~ x), new, interval = "confidence")
matplot(new$x, cbind(pred.w.clim, pred.w.plim[,-1]),
        lty = c(1,2,2,3,3), type = "l", ylab = "predicted y")
## Prediction intervals, special cases
## The first three of these throw warnings
w <- 1 + x^2
fit <- lm(y ~ x)
wfit <- lm(y ~ x, weights = w)
predict(fit, interval = "prediction")
predict(wfit, interval = "prediction")
predict(wfit, new, interval = "prediction")
predict(wfit, new, interval = "prediction", weights = (new$x)^2)
predict(wfit, new, interval = "prediction", weights = ~x^2)
## From aov(.) example: predict(.. terms)
npk.aov <- aov(yield ~ block + N*P*K, npk)
(termL <- attr(terms(npk.aov), "term.labels"))
(pt <- predict(npk.aov, type = "terms"))
pt. <- predict(npk.aov, type = "terms", terms = termL[1:4])
stopifnot(all.equal(pt[,1:4], pt.,
                    tolerance = 1e-12, check.attributes = FALSE))