crm {crmReg}    R Documentation

Cellwise Robust M-regression

Description

Fits a cellwise robust M-regression estimator. In addition to a vector of regression coefficients, the function returns an imputed data set in which each cellwise outlier is replaced by an estimate of the value it would have needed to take to fit the model.

Usage

crm(formula, data, maxiter = 100, tolerance = 0.01, outlyingness.factor = 1,
    spadieta = seq(0.9, 0.1, -0.1), center = "median", scale = "qn",
    regtype = "MM", alphaLTS = NULL, seed = NULL, verbose = TRUE)

Arguments

formula

an lm-style formula object specifying which relationship to estimate.

data

the data as a data frame.

maxiter

maximum number of iterations (default is 100).

tolerance

tolerance within which the optimal regression coefficients are obtained (default is 0.01).

outlyingness.factor

numeric value greater than or equal to 1 (default is 1). Cells are altered only for cases whose original outlyingness (before SPADIMO) is larger than outlyingness.factor times the outlyingness after SPADIMO. The larger this factor, the fewer cells are imputed.

spadieta

the sparsity parameter with which internal outlying cell detection starts; values must lie in the range [0,1] (default is seq(0.9, 0.1, -0.1)).

center

how to center the data. A string that matches the R function to be used for centering (default is "median").

scale

how to scale the data. Choices are "no" (no scaling) or a string matching the R function to be used for scaling (default is "qn").

regtype

type of robust regression. Choices are "MM" (default) or "LTS".

alphaLTS

parameter used by LTS regression: roughly the proportion of squared residuals whose sum will be minimized (the usage above lists NULL as the default; the documented default value is 0.5).

seed

initial seed for the random number generator, like .Random.seed (default is NULL).

verbose

should output be shown during the process (default is TRUE).
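
As a rough illustration of two of the arguments above, the following base R sketch spells out the comparison described for outlyingness.factor and the trimmed sum of squares that alphaLTS refers to. It does not call crmReg. The outlyingness values, the residuals, and the helper names cases_altered and lts_objective are hypothetical, and the subset size ceiling(alpha * n) is a simplification: actual LTS implementations may choose the subset size slightly differently.

```r
# Hypothetical outlyingness values for five cases (not computed by crmReg).
outl_before <- c(1.2, 4.0, 3.5, 0.8, 6.0)  # before SPADIMO
outl_after  <- c(1.1, 1.5, 3.2, 0.8, 2.0)  # after SPADIMO

# Rule from the outlyingness.factor description: a case's cells are altered
# only if its original outlyingness exceeds factor * outlyingness after SPADIMO.
cases_altered <- function(before, after, factor = 1) {
  stopifnot(factor >= 1)
  before > factor * after
}

altered_1 <- cases_altered(outl_before, outl_after, factor = 1)
altered_2 <- cases_altered(outl_before, outl_after, factor = 2)
# A larger factor selects fewer cases, hence fewer imputed cells.
print(c(factor1 = sum(altered_1), factor2 = sum(altered_2)))

# Simplified LTS objective: the sum of the h smallest squared residuals,
# with h taken here as ceiling(alpha * n) (an assumption of this sketch).
lts_objective <- function(res, alpha = 0.5) {
  h <- ceiling(alpha * length(res))
  sum(sort(res^2)[seq_len(h)])
}

res <- c(0.5, -1, 2, -10, 0.1, 30)  # hypothetical residuals
print(lts_objective(res, alpha = 0.5))
```

With alpha = 0.5 only the smaller half of the squared residuals enters the objective, so the two large residuals in this toy vector do not influence it.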

Details

The cellwise robust M-regression (CRM) estimator (Filzmoser et al., 2020) is a linear regression estimator that intrinsically yields both a map of cellwise outliers consistent with the linear model and a vector of regression coefficients that is robust against vertical outliers and leverage points. As a by-product, the method yields a weighted and imputed data set in which each cellwise outlier is replaced by an estimate of the value it would have needed to take to fit the model. The CRM method consists of an iteratively reweighted least squares procedure in which SPADIMO is applied at each iteration to detect the cells that contribute most to outlyingness. In this way, CRM detects deviating data cells in a manner consistent with the linear model.
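
The iteratively reweighted least squares idea mentioned above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the CRM algorithm: it uses Huber case weights and a MAD residual scale as stand-ins, it omits the SPADIMO step and the imputation of cells entirely, and the function name irls_huber and the simulated data are hypothetical. The maxiter and tolerance arguments only mimic the roles of the crm arguments of the same name.

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.2)
y[1:3] <- y[1:3] + 15  # three vertical outliers

# Plain IRLS with Huber weights (a stand-in weight function; no SPADIMO).
irls_huber <- function(x, y, k = 1.345, maxiter = 100, tolerance = 0.01) {
  X <- cbind(Intercept = 1, x = x)
  w <- rep(1, length(y))      # start from ordinary least squares
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxiter)) {
    # weighted least squares step: solve (X'WX) beta = X'Wy
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * y)))
    res <- y - drop(X %*% beta_new)
    s <- mad(res)             # robust residual scale
    # Huber weights: 1 for small standardized residuals, k/|r| otherwise
    w <- pmin(1, k / pmax(abs(res / s), .Machine$double.eps))
    converged <- sqrt(sum((beta_new - beta)^2)) < tolerance
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, weights = w, numloops = iter)
}

fit <- irls_huber(x, y)
print(fit$coefficients)   # compare with the generating values 1 and 2
print(coef(lm(y ~ x)))    # ordinary least squares, for comparison
print(round(fit$weights[1:3], 3))  # weights of the contaminated cases
```

The contaminated cases receive small weights, so they pull the fit far less than in ordinary least squares. CRM goes further by identifying which cells of such cases are responsible for the outlyingness and imputing them.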

Value

crm returns a list object of class "crm" containing the following elements:

coefficients

a named vector of fitted coefficients.

fitted.values

the fitted response values.

residuals

the residuals, that is response minus fitted values.

weights

the (case) weights of the residuals.

data.imputed

the data as imputed by CRM.

casewiseoutliers

a vector that indicates the casewise outliers with TRUE or FALSE.

cellwiseoutliers

a matrix that indicates the cellwise outliers as the (scaled) difference between the original data and imputed data, both scaled and centered.

terms

the terms object used.

call

the matched call.

inputs

the list of supplied input arguments.

numloops

the number of iterations.

time

the time, in seconds, taken to execute the CRM algorithm.
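
The cellwiseoutliers element can be turned into a compact listing of flagged cells. The sketch below uses a small hand-made matrix as a stand-in for crmfit$cellwiseoutliers. The case and variable names and the values are hypothetical, and the sketch assumes that cells which are not flagged hold exactly zero, which should be checked against actual crm output.

```r
# Hypothetical stand-in for crmfit$cellwiseoutliers (values are made up).
cw <- matrix(0, nrow = 4, ncol = 3,
             dimnames = list(c("case1", "case2", "case3", "case4"),
                             c("x1", "x2", "x3")))
cw["case2", "x1"] <-  2.1
cw["case4", "x2"] <- -1.4
cw["case4", "x3"] <-  0.7

# Row and column positions of the nonzero (assumed flagged) cells.
flagged <- which(cw != 0, arr.ind = TRUE)

# One row per flagged cell, largest absolute difference first.
cells <- data.frame(case = rownames(cw)[flagged[, "row"]],
                    variable = colnames(cw)[flagged[, "col"]],
                    difference = cw[flagged],
                    stringsAsFactors = FALSE)
cells <- cells[order(-abs(cells$difference)), ]
print(cells)
```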

Author(s)

Peter Filzmoser, Sebastiaan Hoppner, Irene Ortner, Sven Serneels, and Tim Verdonck

References

Filzmoser, P., Hoppner, S., Ortner, I., Serneels, S., and Verdonck, T. (2020). Cellwise Robust M regression. Computational Statistics and Data Analysis, 147, 106944. DOI:10.1016/j.csda.2020.106944

See Also

spadimo, predict.crm, cellwiseheatmap, daprpr

Examples

library(crmReg)
data(topgear)

# fit Cellwise Robust M-regression:
crmfit <- crm(formula = MPG ~ ., data = topgear)

# estimated regression coefficients and detected casewise outliers:
print(crmfit$coefficients)
print(rownames(topgear)[which(crmfit$casewiseoutliers)])

# fitted response values (MPG) versus true response values:
plot(topgear$MPG, crmfit$fitted.values, xlab = "True MPG", ylab = "Fitted MPG")
abline(a = 0, b = 1)

# residuals, labelling the cases whose residual exceeds 30:
plot(crmfit$residuals, ylab = "Residuals")
large <- which(crmfit$residuals > 30)
text(x = large, y = crmfit$residuals[large],
     labels = rownames(topgear)[large], pos = 2)

# cars with large residuals and their MPG:
print(cbind.data.frame(car = rownames(topgear),
                       MPG = topgear$MPG)[large, ])

# cellwise heatmap of casewise outliers:
cellwiseheatmap(cellwiseoutliers = crmfit$cellwiseoutliers[which(crmfit$casewiseoutliers), ],
                data = round(topgear[which(crmfit$casewiseoutliers), -7], 2),
                col.scale.factor = 1/4)
# check the plotted heatmap!

[Package crmReg version 1.0.2 Index]