R: Set up a diagnostic plot for a sequence of regression models

setupDiagnosticPlot {robustHD}

R Documentation

Set up a diagnostic plot for a sequence of regression models

Description

Extract the fitted values and residuals of a sequence of regression models (such as robust least angle regression models or sparse least trimmed squares regression models) and other useful information for diagnostic plots.

Usage

setupDiagnosticPlot(object, ...)

## S3 method for class 'seqModel'
setupDiagnosticPlot(object, s = NA, covArgs = list(...), ...)

## S3 method for class 'perrySeqModel'
setupDiagnosticPlot(object, ...)

## S3 method for class 'tslars'
setupDiagnosticPlot(object, p, ...)

## S3 method for class 'sparseLTS'
setupDiagnosticPlot(
  object,
  s = NA,
  fit = c("reweighted", "raw", "both"),
  covArgs = list(...),
  ...
)

## S3 method for class 'perrySparseLTS'
setupDiagnosticPlot(object, ...)

Arguments

`object`	the model fit from which to extract information.
`...`	additional arguments to be passed to `covMcd`. These can be specified directly instead of via `covArgs`.
`s`	for the `"seqModel"` method, an integer vector giving the steps of the submodels from which to extract information (the default is to use the optimal submodel). For the `"sparseLTS"` method, an integer vector giving the indices of the models from which to extract information (the default is to use the optimal model for each of the requested fits).
`covArgs`	a list of arguments to be passed to `covMcd` for computing robust Mahalanobis distances.
`p`	an integer giving the lag length for which to extract information (the default is to use the optimal lag length).
`fit`	a character string specifying from which fit to extract information. Possible values are `"reweighted"` (the default) for the reweighted fit, `"raw"` for the raw fit, or `"both"` for both fits.

Details

Note that the argument `alpha` for controlling the subset size behaves differently for `sparseLTS` than for `covMcd`. For `sparseLTS`, the subset size h is determined by the fraction `alpha` of the number of observations n. For `covMcd`, on the other hand, the subset size also depends on the number of variables p (see `h.alpha.n`). However, the `"sparseLTS"` and `"perrySparseLTS"` methods attempt to compute the MCD using the same subset size that is used to compute the sparse least trimmed squares regressions. This may not be possible if the number of selected variables is large compared to the number of observations, in which case a warning is given and `NA`s are returned for the robust Mahalanobis distances.
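
The difference between the two subset sizes can be sketched in a few lines of base R. Both helper functions below are hypothetical and not part of robustHD: `hSparseLTS` assumes the subset size is the fraction `alpha` of n rounded up, and `hCovMcd` assumes the formula documented for `h.alpha.n` in robustbase. Consult those functions for the authoritative values.

```r
# Hypothetical helpers for illustration only (not exported by robustHD).

# Assumption: sparseLTS uses the fraction alpha of the n observations.
hSparseLTS <- function(alpha, n) ceiling(alpha * n)

# Assumption: this mirrors the formula documented for robustbase::h.alpha.n(),
# where the subset size also depends on the number of variables p.
hCovMcd <- function(alpha, n, p) {
  n2 <- (n + p + 1) %/% 2
  floor(2 * n2 - n + 2 * (n - n2) * alpha)
}

n <- 100
alpha <- 0.75
pSelected <- c(1, 5, 25)  # number of selected variables

# compare the two subset sizes for a growing number of variables
data.frame(
  p = pSelected,
  sparseLTS = hSparseLTS(alpha, n),
  covMcd = hCovMcd(alpha, n, pSelected)
)
```

Under these assumptions the `covMcd` subset size grows with p while the `sparseLTS` subset size stays fixed, which is why the two can disagree when many variables are selected.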

Value

An object of class "setupDiagnosticPlot" with the following components:

data

a data frame containing the columns listed below.

step: the steps (for the "seqModel" method) or indices (for the "sparseLTS" method) of the models (only returned if more than one model is requested).
fit: the model fits (only returned if both the reweighted and raw fit are requested in the "sparseLTS" method).
index: the indices of the observations.
fitted: the fitted values.
residual: the standardized residuals.
theoretical: the corresponding theoretical quantiles from the standard normal distribution.
qqd: the absolute distances from a reference line through the first and third sample and theoretical quartiles.
rd: the robust Mahalanobis distances computed via the minimum covariance determinant (MCD) estimator (see covMcd).
xyd: the pairwise maxima of the absolute values of the standardized residuals and the robust Mahalanobis distances, divided by the respective other outlier detection cutoff point.
weight: the weights indicating regression outliers.
leverage: logicals indicating leverage points (i.e., outliers in the predictor space).
Diagnostics: a factor with levels "Potential outlier" (potential regression outliers) and "Regular observation" (data points following the model).

qqLine

a data frame containing the intercepts and slopes of the respective reference lines to be displayed in residual Q-Q plots.

q

a data frame containing the quantiles of the Mahalanobis distribution used as cutoff points for detecting leverage points.

facets

default faceting formula for the diagnostic plots (only returned where applicable).
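
As an illustration of the `theoretical` and `qqd` columns and the `qqLine` component, the following base R sketch computes a Q-Q reference line through the first and third quartiles. It uses simulated values in place of real standardized residuals and assumes that the distance is measured vertically. It is not the package's internal code.

```r
set.seed(42)
residual <- rnorm(30)  # stand-in for standardized residuals
n <- length(residual)

# theoretical standard normal quantiles, matched to each observation's rank
theoretical <- qnorm(ppoints(n))[rank(residual, ties.method = "first")]

# reference line through the first and third sample and theoretical quartiles
probs <- c(0.25, 0.75)
ySample <- quantile(residual, probs, names = FALSE)
xTheory <- qnorm(probs)
slope <- diff(ySample) / diff(xTheory)
intercept <- ySample[1] - slope * xTheory[1]

# absolute distances from the reference line (vertical distance assumed)
qqd <- abs(residual - (intercept + slope * theoretical))

head(data.frame(residual, theoretical, qqd))
```

Observations with a large `qqd` lie far from the reference line and are natural candidates for labeling in a residual Q-Q plot.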

Author(s)

Andreas Alfons

Examples

## generate data
# example is not high-dimensional to keep computation time low
library("robustHD")
library("mvtnorm")
set.seed(1234)  # for reproducibility
n <- 100  # number of observations
p <- 25   # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5      # controls signal-to-noise ratio
epsilon <- 0.1    # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma)    # predictor matrix
e <- rnorm(n)                   # error terms
i <- 1:ceiling(epsilon*n)       # observations to be contaminated
e[i] <- e[i] + 5                # vertical outliers
y <- c(x %*% beta + sigma * e)  # response
x[i,] <- x[i,] + 5              # bad leverage points


## robust LARS
# fit model
fitRlars <- rlars(x, y, sMax = 10)
# extract information for plotting
setup <- setupDiagnosticPlot(fitRlars)
diagnosticPlot(setup)


## sparse LTS
# fit model
fitSparseLTS <- sparseLTS(x, y, lambda = 0.05, mode = "fraction")
# extract information for plotting
setup1 <- setupDiagnosticPlot(fitSparseLTS)
diagnosticPlot(setup1)
setup2 <- setupDiagnosticPlot(fitSparseLTS, fit = "both")
diagnosticPlot(setup2)
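
The returned object can also be inspected directly, for instance when several steps are requested via `s`. The following is a minimal sketch that assumes robustHD is installed. The simulated data and the choice `s = 1:3` are for illustration only.

```r
library("robustHD")

## small simulated data set (illustrative, not from the package)
set.seed(1234)
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- c(x[, 1:3] %*% rep(1, 3) + 0.5 * rnorm(n))

## robust LARS with information extracted for several steps
fit <- rlars(x, y, sMax = 5)
setupSteps <- setupDiagnosticPlot(fit, s = 1:3)

## inspect the components described in the Value section
class(setupSteps)
names(setupSteps)
head(setupSteps$data)
```

Since more than one model is requested here, the `data` component should include the `step` column.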

[Package robustHD version 0.8.1 Index]