wBACON_reg {wbacon}    R Documentation

Robust Fitting of Linear Regression Models by the BACON Algorithm

Description

The weighted BACON algorithm is a robust method to fit weighted linear regression models. The method is robust against outliers in the response variable and in the design matrix (leverage observations).

Usage

wBACON_reg(formula, weights = NULL, data, collect = 4, na.rm = FALSE,
    alpha = 0.05, version = c("V2", "V1"), maxiter = 50, verbose = FALSE,
    original = FALSE, n_threads = 2)

## S3 method for class 'wbaconlm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'wbaconlm'
summary(object, ...)
## S3 method for class 'wbaconlm'
fitted(object, ...)
## S3 method for class 'wbaconlm'
residuals(object, ...)
## S3 method for class 'wbaconlm'
coef(object, ...)
## S3 method for class 'wbaconlm'
vcov(object, ...)

Arguments

formula

an object of class formula: a symbolic description of the model to be fitted.

weights

[numeric] sampling weights (default: weights = NULL).

data

a data.frame object.

collect

[integer] determines the size m of the initial subset, m = collect * p, where p is the number of variables (default: collect = 4).

na.rm

[logical] indicating whether NA values should be removed before the computation proceeds (default: FALSE).

alpha

[numeric] tuning constant, level of significance, 0 < alpha < 1 (default: alpha = 0.05).

version

method to initialize the basic subset, [character]: Version "V1" of Billor et al. (2000) yields estimators that are affine equivariant but not robust; Version "V2" yields estimators that are robust but not affine equivariant (default: "V2").

maxiter

[integer] maximal number of iterations (default: maxiter = 50).

verbose

[logical] indicating whether additional information is printed to the console (default: FALSE).

original

[logical] if original = TRUE, the subset of the m = collect * p smallest observations (small w.r.t. the Mahalanobis distances) is taken from the subset generated by Algorithm 3 as the basic subset for regression [this is the original method of Billor et al. (2000)]; otherwise (i.e., when original = FALSE), the subset that results from Algorithm 3 of Billor et al. (2000) is taken to be the basic subset for regression (default: original = FALSE).

n_threads

[integer] number of threads used for OpenMP (default: 2).

digits

[integer] minimal number of significant digits.

object

object of class wbaconlm.

x

object of class wbaconlm.

...

additional arguments passed to the method.

Details

First, the wBACON method is applied to the model's design matrix (with the regression intercept/constant removed, if there is one) to establish a subset of observations that is presumed to be free of outliers. Second, the response variable is regressed on the design matrix using only the observations in this subset. The subset is then iteratively enlarged to include as many “good” observations as possible.
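
The iterative idea described above can be illustrated with a short base R sketch. This is a simplified, unweighted illustration and not the implementation used by wBACON_reg: the function name bacon_sketch is hypothetical, the starting subset is a crude stand-in for the multivariate wBACON step (the observations closest to the coordinate-wise median of the predictors), and the scaled residuals and the t-distribution cutoff follow our reading of Billor et al. (2000).

```r
# Simplified, unweighted sketch of the iterative subset regression.
# NOT the wbacon implementation; names and the starting rule are assumptions.
bacon_sketch <- function(X, y, collect = 4, alpha = 0.05, maxiter = 50) {
    X <- as.matrix(X)
    n <- nrow(X)
    Z <- cbind(1, X)                      # design matrix with intercept
    p <- ncol(Z)
    # Crude starting subset: the m observations closest (Euclidean norm)
    # to the coordinate-wise median of the predictors
    m <- min(n, collect * p)
    center <- apply(X, 2, median)
    d <- sqrt(rowSums(sweep(X, 2, center)^2))
    subset <- rank(d, ties.method = "first") <= m
    for (iter in seq_len(maxiter)) {
        r <- sum(subset)
        Zs <- Z[subset, , drop = FALSE]
        ZtZ_inv <- solve(crossprod(Zs))
        beta <- ZtZ_inv %*% crossprod(Zs, y[subset])   # fit on the subset only
        res <- as.vector(y - Z %*% beta)               # residuals for ALL rows
        sigma <- sqrt(sum(res[subset]^2) / (r - p))
        h <- rowSums((Z %*% ZtZ_inv) * Z)              # z_i' (Zs'Zs)^-1 z_i
        # scaled residuals (inside subset) or prediction errors (outside)
        t_i <- res / (sigma * sqrt(ifelse(subset, 1 - h, 1 + h)))
        cutoff <- qt(1 - alpha / (2 * (r + 1)), df = r - p)
        new_subset <- abs(t_i) < cutoff
        if (sum(new_subset) <= p) stop("subset too small to fit the model")
        if (all(new_subset == subset)) break           # subset is stable
        subset <- new_subset
    }
    list(coefficients = drop(beta), subset = subset, iterations = iter)
}

# Simulated data: y = 1 + 2 * x1 - x2 + noise, with 5 shifted responses
set.seed(1)
n <- 100
X <- matrix(rnorm(2 * n), ncol = 2)
y <- 1 + 2 * X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
far <- order(rowSums(X^2), decreasing = TRUE)[1:5]     # far from the center
y[far] <- y[far] + 20                                  # contaminate responses
fit <- bacon_sketch(X, y)
print(fit$coefficients)
print(which(!fit$subset))                              # nominated observations
```

The sketch returns residual-based nominations only; the weighting, the choice between the "V1" and "V2" initializations, and the original = TRUE variant are deliberately left out.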

The original approach of Billor et al. (2000) is obtained by specifying the argument original = TRUE.

Models for wBACON_reg are specified symbolically. A typical model has the form response ~ terms, where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response.

A formula has an implied intercept term. To remove it, use either y ~ x - 1 or y ~ 0 + x. See formula or lm for more details.
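
The effect of the two intercept-free notations on the design matrix can be checked with base R alone. The following sketch uses a small made-up data.frame d (hypothetical values, for illustration only) and model.matrix from the stats package.

```r
# Hypothetical toy data, only used to inspect the design matrix
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1), x = c(1, 2, 3, 4))

cn_with  <- colnames(model.matrix(y ~ x, data = d))      # implied intercept
cn_minus <- colnames(model.matrix(y ~ x - 1, data = d))  # intercept removed
cn_zero  <- colnames(model.matrix(y ~ 0 + x, data = d))  # same, other notation

print(cn_with)   # "(Intercept)" "x"
print(cn_minus)  # "x"
print(cn_zero)   # "x"
```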

The weights argument can be used to specify sampling weights or case weights.
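
A minimal sketch of a weighted fit is given below. The weights w are hypothetical (they are not real sampling weights for the iris data), and the sketch assumes that weights accepts a numeric vector with one entry per row of data. The call is only run if the wbacon package is installed.

```r
# Hypothetical weights: one positive value per row of iris (illustration only)
w <- rep(c(1, 2, 3), each = 50)

if (requireNamespace("wbacon", quietly = TRUE)) {
    m_w <- wbacon::wBACON_reg(
        Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
        weights = w, data = iris)
    print(m_w)
}
```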

It is not possible to fit multiple response variables (on the l.h.s. of the formula, i.e., multivariate models) in one call.

The method cannot deal with missing values. If the argument na.rm is set to TRUE, the method behaves like na.omit.
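
What na.omit-like behavior means for a model can be previewed with base R before fitting. The sketch below uses the airquality data shipped with R, whose Ozone variable contains missing values; it only shows which rows would remain and does not call wBACON_reg.

```r
# Rows with a missing value in any model variable are dropped by na.omit
mf <- model.frame(Ozone ~ Wind + Temp, data = airquality,
    na.action = na.omit)

n_all  <- nrow(airquality)   # 153 rows in total
n_kept <- nrow(mf)           # 116 complete rows for this model
print(c(all = n_all, kept = n_kept))
```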

Assumptions

The algorithm assumes that the non-outlying data follow a linear (homoscedastic) regression model and that the independent variables have (roughly) an elliptically contoured distribution. “Although the algorithms will often do something reasonable even when these assumptions are violated, it is hard to say what the results mean.” (Billor et al., 2000, p. 289)

In line with Billor et al. (2000, p. 290), we use the term outlier “nomination” rather than “detection” to highlight that algorithms should not go beyond nominating observations as potential outliers. It is left to the analyst to finally label outlying observations as such.

Utility functions and tools

The generic functions coef, fitted, residuals, and vcov extract the estimated coefficients, fitted values, residuals, and the covariance matrix of the estimated coefficients.

The function summary summarizes the estimated model.
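
The extractor functions can be applied to a fitted model as sketched below, using the iris model from the Examples section. The code is only run if the wbacon package is installed; the printed values are not reproduced here.

```r
if (requireNamespace("wbacon", quietly = TRUE)) {
    m <- wbacon::wBACON_reg(
        Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
        data = iris)
    print(coef(m))               # estimated coefficients
    print(head(fitted(m)))       # fitted values
    print(head(residuals(m)))    # residuals
    print(vcov(m))               # covariance matrix of the coefficients
    print(summary(m))            # summary of the estimated model
    # residuals are available for all observations in the data.frame
    print(length(residuals(m)) == nrow(iris))
}
```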

Value

An object of class wbaconlm with slots

coefficients

a named vector of coefficients

residuals

the residuals (for all observations in the data.frame, not only the ones in the final subset)

rank

the numeric rank of the fitted linear model (i.e., the number of variables in the design matrix)

fitted.values

fitted values

df.residual

the residual degrees of freedom (computed for the observations in the final subset)

call

the matched call

terms

the terms object

model

the model.frame used

weights

weights

qr

the qr object of the linear model fit for the final subset

subset

the subset

reg

a list with additional details on wBACON_reg

mv

a list with details on the results of wBACON that have been used to initialize wBACON_reg

References

Billor N., Hadi A.S. and Velleman P.F. (2000). BACON: Blocked Adaptive Computationally efficient Outlier Nominators. Computational Statistics and Data Analysis 34, pp. 279–298. doi:10.1016/S0167-9473(99)00101-2

Schoch, T. (2021). wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression. Journal of Open Source Software 6 (62), 3238. doi:10.21105/joss.03238

See Also

plot gives diagnostic plots for a wbaconlm object.

predict is used for prediction (incl. confidence and prediction intervals).

Examples

data(iris)
m <- wBACON_reg(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
    data = iris)
m
summary(m)

[Package wbacon version 0.6-1 Index]