crm {crmReg}

R Documentation

Cellwise Robust M-regression

Description

Fits a cellwise robust M-regression estimator. Besides a vector of regression coefficients, the function returns an imputed data set containing estimates of the values that the cellwise outliers would need to take in order to fit the model.

Usage

crm(formula, data, maxiter = 100, tolerance = 0.01, outlyingness.factor = 1,
    spadieta = seq(0.9, 0.1, -0.1), center = "median", scale = "qn",
    regtype = "MM", alphaLTS = NULL, seed = NULL, verbose = TRUE)

Arguments

`formula`	an lm-style formula object specifying which relationship to estimate.
`data`	the data as a data frame.
`maxiter`	maximum number of iterations (default is `100`).
`tolerance`	tolerance within which the regression coefficients must converge (default is `0.01`).
`outlyingness.factor`	numeric value, larger than or equal to 1 (default is `1`). Cells are altered only in cases whose original outlyingness (before SPADIMO) is larger than `outlyingness.factor` times the outlyingness after SPADIMO. The larger this factor, the fewer cells are imputed.
`spadieta`	the sparsity parameter(s) with which to start the internal outlying cell detection; each value must be in the range [0,1] (default is `seq(0.9, 0.1, -0.1)`).
`center`	how to center the data. A string that matches the R function to be used for centering (default is `"median"`).
`scale`	how to scale the data. Choices are `"no"` (no scaling) or a string matching the R function to be used for scaling (default is `"qn"`).
`regtype`	type of robust regression. Choices are `"MM"` (default) or `"LTS"`.
`alphaLTS`	parameter used by LTS regression: (roughly) the fraction of squared residuals whose sum will be minimized. Defaults to `NULL` in the usage above; the value used for LTS regression is then presumably `0.5`.
`seed`	initial seed for random generator, like .Random.seed (default is `NULL`).
`verbose`	should output be shown during the process (default is `TRUE`).
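
The rule described for `outlyingness.factor` can be illustrated with a small base-R sketch. It only mirrors the comparison stated above and is not taken from the crmReg source; the helper name `select_cases_sketch` and the outlyingness values are made up for illustration.

```r
# Hypothetical helper: flags the cases whose cells would be altered,
# following the rule "outlyingness before > factor * outlyingness after".
select_cases_sketch <- function(before, after, outlyingness.factor = 1) {
  stopifnot(outlyingness.factor >= 1, length(before) == length(after))
  before > outlyingness.factor * after
}

# Made-up casewise outlyingness values for three cases (illustration only).
outlyingness_before <- c(4.2, 1.1, 3.0)  # before SPADIMO
outlyingness_after  <- c(1.5, 1.0, 2.8)  # after SPADIMO

# With the default factor, every case that improved at all is selected.
selected_default <- select_cases_sketch(outlyingness_before, outlyingness_after)

# A larger factor demands a bigger drop in outlyingness, so fewer cases qualify.
selected_strict <- select_cases_sketch(outlyingness_before, outlyingness_after,
                                       outlyingness.factor = 1.5)
print(rbind(default = selected_default, strict = selected_strict))
```

Raising the factor only ever shrinks the set of selected cases, which matches the statement that a larger factor leads to fewer imputed cells.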

Details

The cellwise robust M-regression (CRM) estimator (Filzmoser et al., 2020) is a linear regression estimator that intrinsically yields both a map of cellwise outliers consistent with the linear model and a vector of regression coefficients that is robust against vertical outliers and leverage points. As a by-product, the method yields a weighted and imputed data set containing estimates of the values that the cellwise outliers would need to take in order to fit the model. The CRM method consists of an iteratively reweighted least squares procedure in which SPADIMO is applied at each iteration to detect the cells that contribute most to outlyingness. In this way, CRM detects deviating data cells in a manner consistent with the linear model.
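
To make the iteratively reweighted least squares idea concrete, the following base-R sketch shows a generic IRLS loop with Huber-type case weights. It is not the CRM algorithm: it has no SPADIMO step, detects no cells, and imputes nothing. The function name `irls_sketch`, the Huber tuning constant `k`, and the use of `mad()` as the residual scale are assumptions chosen for illustration; only `maxiter` and `tolerance` echo argument names from the usage above.

```r
# Generic IRLS sketch with Huber case weights (NOT the crmReg implementation).
irls_sketch <- function(X, y, maxiter = 100, tolerance = 0.01, k = 1.345) {
  X <- cbind(Intercept = 1, X)
  beta <- lm.fit(X, y)$coefficients          # ordinary least-squares start
  w <- rep(1, length(y))
  for (iter in seq_len(maxiter)) {
    r <- as.vector(y - X %*% beta)           # current residuals
    s <- mad(r)                              # robust residual scale
    if (s == 0) break
    u <- abs(r) / s
    w <- ifelse(u <= k, 1, k / u)            # downweight large residuals
    beta_new <- lm.wfit(X, y, w)$coefficients  # weighted least-squares update
    converged <- max(abs(beta_new - beta)) < tolerance
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, weights = w, numloops = iter)
}

# Simulated data with three vertical outliers (illustration only).
set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)
y[1:3] <- y[1:3] + 20
irls_fit <- irls_sketch(cbind(x = x), y)
print(irls_fit$coefficients)
print(round(irls_fit$weights[1:5], 3))
```

In this sketch the contaminated cases end up with small case weights, which is the casewise analogue of what CRM refines further by locating the individual cells responsible.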

Value

crm returns a list object of class "crm" containing the following elements:

`coefficients`	a named vector of fitted coefficients.
`fitted.values`	the fitted response values.
`residuals`	the residuals, that is, response minus fitted values.
`weights`	the (case) weights of the residuals.
`data.imputed`	the data as imputed by CRM.
`casewiseoutliers`	a vector that indicates the casewise outliers with `TRUE` or `FALSE`.
`cellwiseoutliers`	a matrix that indicates the cellwise outliers as the (scaled) difference between the original data and imputed data, both scaled and centered.
`terms`	the terms object used.
`call`	the matched call.
`inputs`	the list of supplied input arguments.
`numloops`	the number of iterations.
`time`	the time in seconds taken to execute the CRM algorithm.
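
As a rough illustration of how the elements listed above might be inspected together, the sketch below summarises a list that carries `casewiseoutliers` and `cellwiseoutliers`. The helper `summarise_crm_sketch` is hypothetical and not part of crmReg, and the mock object uses made-up numbers in place of a real fit. It also assumes that a nonzero entry of `cellwiseoutliers` marks a cell that was imputed, which follows from the description of that matrix as a difference between original and imputed data but is not confirmed by the package documentation.

```r
# Hypothetical helper; works on any list with the two documented elements.
summarise_crm_sketch <- function(fit) {
  # Assumption: a nonzero scaled difference means the cell was imputed.
  flagged <- fit$cellwiseoutliers != 0
  list(
    n_casewise = sum(fit$casewiseoutliers),   # number of casewise outliers
    n_cells = sum(flagged),                   # number of flagged cells
    cells_per_variable = colSums(flagged)     # flagged cells per column
  )
}

# Mock object standing in for a fitted "crm" list (made-up values).
mock_fit <- list(
  casewiseoutliers = c(FALSE, TRUE, FALSE),
  cellwiseoutliers = matrix(c(0, 2.5, 0,
                              0, 0,   0),
                            nrow = 3,
                            dimnames = list(NULL, c("x1", "x2")))
)
crm_summary <- summarise_crm_sketch(mock_fit)
print(crm_summary)
```

With a real fit, the same helper could be called as `summarise_crm_sketch(crmfit)`, provided the assumption about nonzero entries holds.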

Author(s)

Peter Filzmoser, Sebastiaan Hoppner, Irene Ortner, Sven Serneels, and Tim Verdonck

References

Filzmoser, P., Hoppner, S., Ortner, I., Serneels, S., and Verdonck, T. (2020). Cellwise Robust M regression. Computational Statistics and Data Analysis, 147, 106944. DOI:10.1016/j.csda.2020.106944

Examples

library(crmReg)
data(topgear)

# fit Cellwise Robust M-regression:
crmfit <- crm(formula = MPG ~ ., data = topgear)

# estimated regression coefficients and detected casewise outliers:
print(crmfit$coefficients)
print(rownames(topgear)[which(crmfit$casewiseoutliers)])

# fitted response values (MPG) versus true response values:
plot(topgear$MPG, crmfit$fitted.values, xlab = "True MPG", ylab = "Fitted MPG")
abline(a = 0, b = 1)

# residuals, labelling the cases whose residual exceeds 30:
large_res <- which(crmfit$residuals > 30)
plot(crmfit$residuals, ylab = "Residuals")
text(x = large_res, y = crmfit$residuals[large_res],
     labels = rownames(topgear)[large_res], pos = 2)

# cars and observed MPG for those cases:
print(cbind.data.frame(car = rownames(topgear),
                       MPG = topgear$MPG)[large_res, ])

# cellwise heatmap of casewise outliers:
cellwiseheatmap(cellwiseoutliers = crmfit$cellwiseoutliers[which(crmfit$casewiseoutliers), ],
                data = round(topgear[which(crmfit$casewiseoutliers), -7], 2),
                col.scale.factor = 1/4)
# check the plotted heatmap!

[Package crmReg version 1.0.2 Index]