R: Detection of outliers in benchmark models

outlier.ap {Benchmarking}

R Documentation

Detection of outliers in benchmark models

Description

These functions implement the Wilson (1993) outlier detection method: outlier.ap is written entirely in R, and outlierC.ap is written in C++.

Usage

outlier.ap (X, Y, NDEL = 3, NLEN = 25, TRANSPOSE = FALSE)
outlierC.ap(X, Y, NDEL = 3, NLEN = 25, TRANSPOSE = FALSE)

outlier.ap.plot(ratio, NLEN = 25, xlab = "Number of firms deleted", 
                ylab = "Log ratio", ..., ylim)

Arguments

`X`	Input as a firms times goods matrix, see `TRANSPOSE`.
`Y`	Output as a firms times goods matrix, see `TRANSPOSE`.
`NDEL`	The maximum number of firms to be considered as a group of outliers, i.e. the maximum number of firms to be deleted.
`NLEN`	The number of ratios to save for each level of removal, i.e. the number of rows in `ratio` used.
`TRANSPOSE`	Input and output matrices are treated as firms times goods matrices for the default value `TRANSPOSE=FALSE`, corresponding to the standard in R for statistical models. When `TRUE`, data matrices are transposed to goods times firms matrices, as is normally used in the LP formulation of the problem.
`ratio`	The `ratio` component from the list as output from `outlier.ap`.
`xlab`	Label for the x-axis.
`ylab`	Label for the y-axis.
`ylim`	The y limits `(y1, y2)` of the plot, an array/vector of length 2.
`...`	Usual options for the methods `plot` and `lines`.

Details

An implementation of the method in Wilson (1993) using only R functions, in particular the function det to calculate R^{(i)}_{\min}. The alternative function outlierC.ap is written completely in C++ and is much faster, but still not as fast as the method in FEAR.

An elementary presentation of the method is found in Bogetoft and Otto (2011), Sect. 5.13 on outliers.
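The determinant-based calculation mentioned above can be sketched in base R. This is only an illustration, not the package's code: it assumes the statistic is the ratio of determinants of the cross-product matrix of the combined input-output data, with and without the deleted firms, in the spirit of the Andrews-Pregibon statistic that Wilson (1993) builds on. The helper name ap_min_ratio and the toy data are hypothetical.

```r
# Hypothetical helper, not part of Benchmarking: smallest determinant
# ratio over all ways of deleting i firms (assumed form of the statistic).
ap_min_ratio <- function(X, Y, i) {
  Z    <- cbind(X, Y)              # firms x (inputs + outputs)
  d0   <- det(crossprod(Z))        # determinant for the full data set
  sets <- combn(nrow(Z), i)        # each column is one set of firms to delete
  r <- apply(sets, 2, function(del) {
    Zd <- Z[-del, , drop = FALSE]  # data without the deleted firms
    det(crossprod(Zd)) / d0
  })
  list(rmin = min(r), deleted = sets[, which.min(r)])
}

# Toy data: firm 6 produces far more output than the others.
X <- matrix(c(1, 2, 3, 4, 5, 6))
Y <- matrix(c(1.1, 2.0, 2.9, 4.2, 5.1, 30))
res <- ap_min_ratio(X, Y, 1)
res$deleted   # firm 6 gives the smallest ratio
```

A small ratio means that removing those firms shrinks the determinant sharply, which is what marks them as candidate outliers. Enumerating every subset with combn is what makes the pure R approach slow.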

For a data set with 10 firms and at most 3 outliers there are 175 combinations of firms to delete. For 100 firms there are 166,750 combinations with at most 3 outliers, 79,375,495 with at most 5, and 203,366,882,995 with at most 8. For 200 firms with at most 3, 5, and 8 outliers there are, respectively, 1,333,500, 2,601,668,490, and 57,467,902,686,615 combinations, the last being a number we do not know what to call. The number of combinations, and with it the computational time, thus grows very rapidly with both the number of firms and the number of firms to be deleted. You should therefore limit NDEL to a very small number, at most 3 or perhaps 5 depending on the number of firms, or use the extremely fast method ap from the package FEAR mentioned in the references.
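The counts quoted above are sums of binomial coefficients: the number of ways to delete 1, 2, ..., NDEL firms out of K. A minimal check in base R, where the helper name n_deletions is hypothetical and only used for illustration:

```r
# Hypothetical helper: number of firm subsets of size 1..NDEL among K firms.
n_deletions <- function(K, NDEL) sum(choose(K, seq_len(NDEL)))

n_deletions(10, 3)    # 175
n_deletions(100, 3)   # 166750
n_deletions(200, 8)   # roughly 5.7e13
```

Running this for the intended K and NDEL before calling outlier.ap gives a quick impression of how much work is being requested.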

Value

`ratio`	A `min(NLEN,K) x NDEL` matrix with the log-ratios to be plotted.
`imat`	A `NDEL x NDEL` matrix with indices for deleted firms.
`r0`	An array of length `NDEL` with the minimum value of `R^{(i)}` for each number of deleted firms.

Note

The function outlier.ap is extremely slow, and for NDEL larger than 3 or 4 it might be advisable to use the function ap from the package FEAR.

The names of the returned components are the same as for ap in the package FEAR.

Author(s)

Peter Bogetoft and Lars Otto larsot23@gmail.com

References

Bogetoft and Otto; Benchmarking with DEA, SFA, and R; Springer 2011

Wilson (1993), “Detecting outliers in deterministic nonparametric frontier models with multiple outputs,” Journal of Business and Economic Statistics 11, 319-323.

Wilson (2008), “FEAR 1.0: A Software Package for Frontier Efficiency Analysis with R,” Socio-Economic Planning Sciences 42, 247–254

Examples

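library(Benchmarking)

# Simulate 25 firms with one input and one output
n <- 25
x <- matrix(rnorm(n))
y <- .5 + 2.5*x + 2*rnorm(n)
# Consider groups of at most 2 firms as outliers
tap <- outlier.ap(x, y, NDEL=2)
# Deleted firms and the minimum ratio for each number of deletions
print(cbind(tap$imat, tap$r0), na.print="", digits=2)
outlier.ap.plot(tap$ratio)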

[Package Benchmarking version 0.32 Index]