setupDiagnosticPlot {robustHD} | R Documentation |
Set up a diagnostic plot for a sequence of regression models
Description
Extract the fitted values and residuals of a sequence of regression models (such as robust least angle regression models or sparse least trimmed squares regression models) and other useful information for diagnostic plots.
Usage
setupDiagnosticPlot(object, ...)
## S3 method for class 'seqModel'
setupDiagnosticPlot(object, s = NA, covArgs = list(...), ...)
## S3 method for class 'perrySeqModel'
setupDiagnosticPlot(object, ...)
## S3 method for class 'tslars'
setupDiagnosticPlot(object, p, ...)
## S3 method for class 'sparseLTS'
setupDiagnosticPlot(
object,
s = NA,
fit = c("reweighted", "raw", "both"),
covArgs = list(...),
...
)
## S3 method for class 'perrySparseLTS'
setupDiagnosticPlot(object, ...)
Arguments
object |
the model fit from which to extract information. |
... |
additional arguments to be passed to covMcd (by default they are collected into covArgs). |
s |
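for the "seqModel" method, an integer vector giving the steps of the submodels from which to extract information (the default is to use the optimal submodel). For the "sparseLTS" method, an integer vector giving the indices of the models from which to extract information (the default is to use the optimal model for each of the requested fits). |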
covArgs |
a list of arguments to be passed to covMcd for computing the robust Mahalanobis distances. |
p |
an integer giving the lag length for which to extract information (the default is to use the optimal lag length). |
fit |
a character string specifying from which fit to extract information. Possible values are "reweighted" (the default) for the reweighted fit, "raw" for the raw fit, or "both" for both fits. |
Details
Note that the argument alpha for controlling the subset size behaves differently for sparseLTS than for covMcd. For sparseLTS, the subset size h is determined by the fraction alpha of the number of observations n. For covMcd, on the other hand, the subset size also depends on the number of variables p (see h.alpha.n). However, the "sparseLTS" and "perrySparseLTS" methods attempt to compute the MCD using the same subset size that is used to compute the sparse least trimmed squares regressions. This may not be possible if the number of selected variables is large compared to the number of observations, in which case a warning is given and NAs are returned for the robust Mahalanobis distances.
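The difference between the two subset sizes can be made concrete with a small sketch. It assumes that the robustbase package is installed (it provides h.alpha.n) and that the sparseLTS subset size is the fraction alpha of n rounded up; the rounding rule is an assumption, and the values below are chosen so that alpha * n is already an integer.

```r
library("robustbase")  # provides h.alpha.n()

n <- 100       # number of observations
alpha <- 0.75  # fraction controlling the subset size

# subset size as a plain fraction of n, as described for sparseLTS
# (assumption: rounded up; here alpha * n is already an integer)
h_lts <- ceiling(alpha * n)

# subset sizes that covMcd would use for different numbers of variables p
p <- c(2, 5, 25)
h_mcd <- sapply(p, function(k) h.alpha.n(alpha, n, k))

# side-by-side comparison: only the covMcd subset size changes with p
data.frame(p = p, h_sparseLTS = h_lts, h_covMcd = h_mcd)
```

For a fixed alpha, the covMcd subset size varies with p, whereas the sparseLTS subset size does not.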
Value
An object of class "setupDiagnosticPlot" with the following components:
data
a data frame containing the columns listed below.
step
the steps (for the "seqModel" method) or indices (for the "sparseLTS" method) of the models (only returned if more than one model is requested).
fit
the model fits (only returned if both the reweighted and raw fit are requested in the "sparseLTS" method).
index
the indices of the observations.
fitted
the fitted values.
residual
the standardized residuals.
theoretical
the corresponding theoretical quantiles from the standard normal distribution.
qqd
the absolute distances from a reference line through the first and third sample and theoretical quartiles.
rd
the robust Mahalanobis distances computed via the minimum covariance determinant (MCD) estimator (see covMcd).
xyd
the pairwise maxima of the absolute values of the standardized residuals and the robust Mahalanobis distances, divided by the respective other outlier detection cutoff point.
weight
the weights indicating regression outliers.
leverage
logicals indicating leverage points (i.e., outliers in the predictor space).
Diagnostics
a factor with levels "Potential outlier" (potential regression outliers) and "Regular observation" (data points following the model).
qqLine
a data frame containing the intercepts and slopes of the respective reference lines to be displayed in residual Q-Q plots.
q
a data frame containing the quantiles of the Mahalanobis distribution used as cutoff points for detecting leverage points.
facets
default faceting formula for the diagnostic plots (only returned where applicable).
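To illustrate how the theoretical and qqd columns relate to the reference line stored in qqLine, here is a minimal base R sketch. It is not the package's internal code: it assumes the line is built the same way as in stats::qqline, i.e. through the first and third quartiles, and it uses simulated values (res) as a stand-in for standardized residuals.

```r
set.seed(1)
res <- rnorm(50)  # hypothetical stand-in for standardized residuals

# theoretical standard normal quantiles, matched to each residual by rank
theoretical <- qnorm(ppoints(length(res)))[rank(res)]

# reference line through the first and third sample and theoretical quartiles
probs <- c(0.25, 0.75)
qy <- quantile(res, probs, names = FALSE)  # sample quartiles
qx <- qnorm(probs)                         # theoretical quartiles
slope <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]

# absolute distance of each residual from the reference line
qqd <- abs(res - (intercept + slope * theoretical))

head(data.frame(residual = res, theoretical = theoretical, qqd = qqd))
```

Observations with a large qqd lie far from the reference line in a residual Q-Q plot.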
Author(s)
Andreas Alfons
See Also
diagnosticPlot, rlars, grplars, rgrplars, tslarsP, rtslarsP, tslars, rtslars, sparseLTS
Examples
## generate data
# example is not high-dimensional to keep computation time low
library("robustHD")
library("mvtnorm")
set.seed(1234) # for reproducibility
n <- 100 # number of observations
p <- 25 # number of variables
beta <- rep.int(c(1, 0), c(5, p-5)) # coefficients
sigma <- 0.5 # controls signal-to-noise ratio
epsilon <- 0.1 # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma) # predictor matrix
e <- rnorm(n) # error terms
i <- 1:ceiling(epsilon*n) # observations to be contaminated
e[i] <- e[i] + 5 # vertical outliers
y <- c(x %*% beta + sigma * e) # response
x[i,] <- x[i,] + 5 # bad leverage points
## robust LARS
# fit model
fitRlars <- rlars(x, y, sMax = 10)
# extract information for plotting
setup <- setupDiagnosticPlot(fitRlars)
diagnosticPlot(setup)
## sparse LTS
# fit model
fitSparseLTS <- sparseLTS(x, y, lambda = 0.05, mode = "fraction")
# extract information for plotting
setup1 <- setupDiagnosticPlot(fitSparseLTS)
diagnosticPlot(setup1)
setup2 <- setupDiagnosticPlot(fitSparseLTS, fit = "both")
diagnosticPlot(setup2)
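Beyond plotting, the returned object can be inspected directly. The following sketch looks at the components described in the Value section. It assumes that robustHD is installed and, to keep it short, uses a simpler simulated data set than the examples above (uncorrelated predictors, no contamination); the object names are illustrative only.

```r
library("robustHD")

## small simulated data set (assumption: simpler than the examples above)
set.seed(1234)
n <- 100  # number of observations
p <- 10   # number of variables
x <- matrix(rnorm(n * p), n, p)                  # predictor matrix
y <- c(x[, 1:3] %*% rep(1, 3) + 0.5 * rnorm(n))  # response

## fit sparse LTS and extract the information for both fits
fit <- sparseLTS(x, y, lambda = 0.05, mode = "fraction")
setup <- setupDiagnosticPlot(fit, fit = "both")

## inspect the components
class(setup)     # class of the returned object
names(setup)     # available components
str(setup$data)  # columns such as fitted, residual, rd, weight
setup$qqLine     # intercepts and slopes of the Q-Q reference lines
setup$q          # cutoff points for detecting leverage points
```

Since both fits are requested, the data component should also contain the fit column that distinguishes the reweighted fit from the raw fit.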