robustnessInCompositions {compositions} | R Documentation |

## Handling robustness issues and outliers in compositions.

### Description

The seamless transition to robust estimations in library(compositions).

### Details

A statistical method is called nonrobust if an arbitrary contamination of
a small portion of the dataset can produce results radically different
from the results without the contamination. In this sense many
classical procedures relying on distributional models or on moments
like mean and variance are highly nonrobust.

We consider robustness as an essential prerequierement of all
statistical analysis. However in the context of compositional data
analysis robustness is still in its first years.

As of Mai 2008 we provide a new approach to robustness in the
package. The central idea is that robustness should be more or less
automatic and that there should be no necessity to change the code to
compare results obtained from robust procedures and results from there
more efficient nonrobust counterparts.

To achieve this all routines that rely on distributional models (such
as e.g. mean,
variance, principle component analysis, scaling) and routines relying
on those routines get a new standard argument of the form:

`fkt(...,robust=getOption("robust"))`

which defaults to a new option "robust". This option can take several
values:

- FALSE
The classical estimators such as arithmetic mean and persons product moment variance are used and the results are to be considered nonrobust.

- TRUE
The default for robust estimation in the package is used. At this time this is

`covMcd`

in the robustbase-package. This default might change in future.- "pearson"
This is a synonym for FALSE and explicitly states that no robustness should be used.

- "mcd"
Minimum Covariance Determinant. This option explicitly selects the use of

`covMcd`

in the robustbase-package as the main robustness engine.

More options might follow later.
To control specific parameters of the
model the string can get an attribute named "control" which contains
additional options for the robustness engine used. In this moment the
control attribute of mcd is a control object of
`covMcd`

. The control argument of "pearson" is a list
containing addition options to the mean, like trim.

The standard value for getOption("robust") is FALSE to avoid situation
in which the user thinks he uses a classical technique. Robustness
must be switched on explicitly. Either by setting the option with
`options(robust=TRUE)`

or by giving the argument. This default
might change later if the authors come to the impression that robust
estimation is now considered to be the default.

For those not only interested in avoiding the influence of the outliers, but in an analysis of the outliers we added a subsystem for outlier classification. This subsystem is described in outliersInCompositions and also relies on the robust option. However evidently for these routines the factory default for the robust option is always TRUE, because it is only applicable in an outlieraware context.

We hope that in this way we can provide a seamless transition from nonrobust analysis to a robust analysis.

### Note

IMPORTANT: The robust argument only works with the classes of the
package. Only your compositional analysis is suddenly robust.

The package robustbase is required for using the
robust estimations and the outlier subsystem of compositions. To
simplify installation it is not listed as required, but it will be
loaded, whenever any sort of outlierdetection or robust estimation is
used.

### Author(s)

K.Gerald v.d. Boogaart http://www.stat.boogaart.de

### See Also

`var.acomp`

, `mean.acomp`

,
robustbase, compositions-package,
missings, `outlierplot`

,
`OutlierClassifier1`

, `ClusterFinder1`

### Examples

```
A <- matrix(c(0.1,0.2,0.3,0.1),nrow=2)
Mvar <- 0.1*ilrvar2clr(A%*%t(A))
Mcenter <- acomp(c(1,2,1))
typicalData <- rnorm.acomp(100,Mcenter,Mvar) # main population
colnames(typicalData)<-c("A","B","C")
data5 <- acomp(rbind(unclass(typicalData)+outer(rbinom(100,1,p=0.1)*runif(100),c(0.1,1,2))))
mean(data5)
mean(data5,robust=TRUE)
var(data5)
var(data5,robust=TRUE)
Mvar
biplot(princomp(data5))
biplot(princomp(data5,robust=TRUE))
```

*compositions*version 2.0-8 Index]