diagnosticPlot {robustHD}    R Documentation
Diagnostic plots for a sequence of regression models
Description
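Produce diagnostic plots for a sequence of regression models, such as submodels along a robust least angle regression sequence, or sparse least trimmed squares regression models for a grid of values for the penalty parameter. Four plots are currently implemented: a normal Q-Q plot of the standardized residuals, plots of the standardized residuals versus their index and versus the fitted values, and a regression diagnostic plot of the standardized residuals versus the robust Mahalanobis distances of the predictors.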
Usage
diagnosticPlot(object, ...)
## S3 method for class 'seqModel'
diagnosticPlot(object, s = NA, covArgs = list(), ...)
## S3 method for class 'perrySeqModel'
diagnosticPlot(object, covArgs = list(), ...)
## S3 method for class 'tslars'
diagnosticPlot(object, p, s = NA, covArgs = list(), ...)
## S3 method for class 'sparseLTS'
diagnosticPlot(
object,
s = NA,
fit = c("reweighted", "raw", "both"),
covArgs = list(),
...
)
## S3 method for class 'perrySparseLTS'
diagnosticPlot(
object,
fit = c("reweighted", "raw", "both"),
covArgs = list(),
...
)
## S3 method for class 'setupDiagnosticPlot'
diagnosticPlot(
object,
which = c("all", "rqq", "rindex", "rfit", "rdiag"),
ask = (which == "all"),
facets = object$facets,
size = c(2, 4),
id.n = NULL,
...
)
Arguments
object: the model fit for which to produce diagnostic plots, or an object containing all necessary information for plotting (as generated by setupDiagnosticPlot).
...: additional arguments to be passed down, eventually to …
s: for the …
covArgs: a list of arguments to be passed to covMcd for computing the robust Mahalanobis distances.
p: an integer giving the lag length for which to produce the plot (the default is to use the optimal lag length).
fit: a character string specifying for which fit to produce diagnostic plots. Possible values are "reweighted" (the default) for the reweighted fit, "raw" for the raw fit, or "both" for both fits.
which: a character string indicating which plot to show. Possible values are "all" (the default) for all four plots, "rqq" for the normal Q-Q plot of the standardized residuals, "rindex" for the standardized residuals versus their index, "rfit" for the standardized residuals versus the fitted values, or "rdiag" for the regression diagnostic plot.
ask: a logical indicating whether the user should be asked before each plot. The default is to ask only if all plots are requested (which = "all").
facets: a faceting formula to override the default behavior. If supplied, …
size: a numeric vector of length two giving the point and label size, respectively.
id.n: an integer giving the number of the most extreme observations to be identified by a label. The default is to use the number of identified outliers, which can be different for the different plots. See "Details" for more information.
Details
In the normal Q-Q plot of the standardized residuals, a reference line is drawn through the first and third quartiles. The id.n observations with the largest distances from that line are identified by a label (the observation number). The default for id.n is the number of regression outliers, i.e., the number of observations whose residuals are too large (cf. weights).
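The reference line described above can be sketched in base R. This is a minimal illustration rather than the package's code: qqReference() is a hypothetical helper, the line is fitted through the quartiles in the same way as stats::qqline(), and "distance from that line" is assumed to mean the absolute vertical distance.

```r
# Sketch of a normal Q-Q reference line through the first and third
# quartiles (same construction as stats::qqline()) and the id.n residuals
# farthest from it. Assumption: distance = absolute vertical distance.
qqReference <- function(r, id.n = 3) {
  n <- length(r)
  # theoretical normal quantile for each residual, as in stats::qqnorm()
  theoretical <- qnorm(ppoints(n))[order(order(r))]
  # line through (qnorm(0.25), Q1) and (qnorm(0.75), Q3)
  yq <- quantile(r, c(0.25, 0.75), names = FALSE)
  xq <- qnorm(c(0.25, 0.75))
  slope <- diff(yq) / diff(xq)
  intercept <- yq[1] - slope * xq[1]
  distance <- abs(r - (intercept + slope * theoretical))
  # indices of the id.n largest distances, largest first
  ids <- order(distance, decreasing = TRUE)[seq_len(min(id.n, n))]
  list(intercept = intercept, slope = slope, ids = ids)
}

# three planted large residuals at positions 48, 49 and 50
r <- c(seq(-1, 1, length.out = 47), 8, -7, 9)
qqReference(r, id.n = 3)$ids  # 50 48 49
```

Because the quartiles barely move when a few extreme residuals are added, the line stays anchored to the bulk of the data and the planted values stand out.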
In the plots of the standardized residuals versus their index or the fitted values, horizontal reference lines are drawn at 0 and +/-2.5. The id.n observations with the largest absolute values of the standardized residuals are identified by a label (the observation number). The default for id.n is the number of regression outliers, i.e., the number of observations whose absolute residuals are too large (cf. weights).
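The labelling rule can be sketched with a hypothetical helper, labelResiduals(). For illustration only, it assumes that a residual is "too large" when its absolute standardized value exceeds 2.5, matching the reference lines; the cutoff actually behind weights may differ.

```r
# Sketch of the labelling rule for the plots of standardized residuals
# versus their index or the fitted values.
# Assumption: |standardized residual| > 2.5 marks a regression outlier.
labelResiduals <- function(r, id.n = NULL, cutoff = 2.5) {
  if (is.null(id.n)) id.n <- sum(abs(r) > cutoff)  # default: number of outliers
  id.n <- min(id.n, length(r))
  order(abs(r), decreasing = TRUE)[seq_len(id.n)]  # largest |r| first
}

r <- c(0.3, -3.1, 1.2, 4.0, -0.8, 2.6)
labelResiduals(r)            # 4 2 6 (default id.n = 3)
labelResiduals(r, id.n = 1)  # 4
```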
For the regression diagnostic plot, the robust Mahalanobis distances of the predictor variables are computed via the minimum covariance determinant (MCD) estimator based on only those predictors with non-zero coefficients (see covMcd). Horizontal reference lines are drawn at +/-2.5, and a vertical reference line is drawn at the upper 97.5% quantile of the \chi^{2} distribution with p degrees of freedom, where p denotes the number of predictors with non-zero coefficients. The id.n observations with the largest absolute values of the standardized residuals and/or the largest robust Mahalanobis distances are identified by a label (the observation number). The default for id.n is the number of all outliers: regression outliers (i.e., observations whose absolute residuals are too large, cf. weights) and leverage points (i.e., observations with robust Mahalanobis distance larger than the 97.5% quantile of the \chi^{2} distribution with p degrees of freedom).
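The cutoffs of the regression diagnostic plot can be sketched with a hypothetical helper, classifyOutliers(). To keep the sketch free of dependencies, classical estimates (colMeans() and cov()) replace the MCD, so the distances are not robust; in the package, covMcd supplies them. Base R's mahalanobis() returns squared distances, which are compared with qchisq(0.975, p) directly; on the unsquared scale the equivalent cutoff is sqrt(qchisq(0.975, p)). The 2.5 cutoff for regression outliers, and counting an observation once when it is both kinds of outlier, are assumptions.

```r
# Sketch of the outlier rules behind the regression diagnostic plot.
# Substitution: classical colMeans()/cov() replace the MCD (covMcd), so the
# distances below are NOT robust. Assumption: |standardized residual| > 2.5
# marks a regression outlier.
classifyOutliers <- function(r, X, cutoff = 2.5) {
  X <- as.matrix(X)
  p <- ncol(X)  # stands for the number of predictors with non-zero coefficients
  d2 <- mahalanobis(X, colMeans(X), cov(X))  # squared distances
  leverage <- d2 > qchisq(0.975, df = p)     # beyond the 97.5% chi-squared quantile
  regression <- abs(r) > cutoff
  list(distance = sqrt(d2),                  # distances on the unsquared scale
       leverage = leverage,
       regression = regression,
       id.n = sum(leverage | regression))    # default id.n: all outliers
}

set.seed(42)
X <- matrix(rnorm(100), ncol = 2)  # 50 observations, p = 2
X[1, ] <- c(20, 20)                # planted leverage point
r <- rnorm(50)
r[2] <- 6                          # planted regression outlier
res <- classifyOutliers(r, X)
which(res$leverage)                # includes observation 1
which(res$regression)              # includes observation 2
```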
Note that the argument alpha for controlling the subset size behaves differently for sparseLTS than for covMcd. For sparseLTS, the subset size h is determined by the fraction alpha of the number of observations n. For covMcd, on the other hand, the subset size also depends on the number of variables p (see h.alpha.n). However, the "sparseLTS" and "perrySparseLTS" methods attempt to compute the MCD using the same subset size that is used to compute the sparse least trimmed squares regressions. This may not be possible if the number of selected variables is large compared to the number of observations. In such cases, setupDiagnosticPlot returns NAs for the robust Mahalanobis distances, and the regression diagnostic plot fails.
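The subset-size mismatch can be made concrete with a short sketch. It assumes that sparseLTS uses h = ceiling(alpha * n), and it reimplements the formula of robustbase's h.alpha.n() in the hypothetical helper hMCD(). Since covMcd restricts alpha to the interval [0.5, 1], the smallest MCD subset size is (n + p + 1) %/% 2.

```r
# Sketch comparing subset sizes. Assumptions: sparseLTS uses
# h = ceiling(alpha * n); hMCD() reimplements robustbase::h.alpha.n(),
# whose smallest value (alpha = 0.5) is (n + p + 1) %/% 2.
hLTS <- function(alpha, n) ceiling(alpha * n)
hMCD <- function(alpha, n, p) {
  n2 <- (n + p + 1) %/% 2
  floor(2 * n2 - n + 2 * (n - n2) * alpha)
}
# The MCD can reuse the sparse LTS subset size h only if h is at least
# the smallest MCD subset size for this n and p.
sameSubsetFeasible <- function(h, n, p) h >= (n + p + 1) %/% 2

n <- 100
alpha <- 0.75
hLTS(alpha, n)                     # 75, whatever the number of variables
hMCD(alpha, n, p = 5)              # 76: depends on p
sameSubsetFeasible(75, n, p = 5)   # TRUE
sameSubsetFeasible(75, n, p = 60)  # FALSE: needs h >= 80
```

Under these assumptions, with n = 100 and alpha = 0.75, selecting 60 variables already pushes the smallest admissible MCD subset size above the 75 observations used by sparse LTS.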
Value
If only one plot is requested, an object of class "ggplot" (see ggplot), otherwise a list of such objects.
Author(s)
Andreas Alfons
See Also
ggplot, rlars, grplars, rgrplars, tslarsP, rtslarsP, tslars, rtslars, sparseLTS, plot.lts
Examples
## generate data
# example is not high-dimensional to keep computation time low
library("mvtnorm")
library("robustHD")
set.seed(1234) # for reproducibility
n <- 100 # number of observations
p <- 25 # number of variables
beta <- rep.int(c(1, 0), c(5, p-5)) # coefficients
sigma <- 0.5 # controls signal-to-noise ratio
epsilon <- 0.1 # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma) # predictor matrix
e <- rnorm(n) # error terms
i <- 1:ceiling(epsilon*n) # observations to be contaminated
e[i] <- e[i] + 5 # vertical outliers
y <- c(x %*% beta + sigma * e) # response
x[i,] <- x[i,] + 5 # bad leverage points
## robust LARS
# fit model
fitRlars <- rlars(x, y, sMax = 10)
# create plot
diagnosticPlot(fitRlars)
## sparse LTS
# fit model
fitSparseLTS <- sparseLTS(x, y, lambda = 0.05, mode = "fraction")
# create plot
diagnosticPlot(fitSparseLTS)
diagnosticPlot(fitSparseLTS, fit = "both")