R: Regression for Multiple Comparison with the Best

rmcb {greybox}

R Documentation

Regression for Multiple Comparison with the Best

Description

RMCB stands for "Regression for Multiple Comparison with the Best" and is used for the comparison of forecasting methods. It is a regression-based version of the Nemenyi / MCB test (Demsar, 2006) that relies on the ranks of variables: it transforms the data into ranks and then fits a regression on them (the model is given in Details).

Usage

rmcb(data, level = 0.95, outplot = c("mcb", "lines", "none"),
  select = NULL, ...)

## S3 method for class 'rmcb'
plot(x, outplot = c("mcb", "lines"), select = NULL, ...)

Arguments

`data`	Matrix or data frame with observations in rows and variables in columns.
`level`	Confidence level used for the intervals. Default is 0.95.
`outplot`	Type of plot to produce after the calculations: "MCB" (`"mcb"`), "Vertical lines" (`"lines"`), or no plot (`"none"`). The same plots can be produced by calling the plot method on the returned object.
`select`	Which column of data to highlight on the plot. If NULL, the method with the lowest value is selected.
`...`	Other parameters passed to the rank function.
`x`	The produced rmcb model.

Details

The regression fitted on the ranks has the form:

y = b' X + e,

where y is the vector of the ranks of the provided data (as.vector(data)), X is the matrix of dummy variables for each column of the data (forecasting method), b is the vector of coefficients for the dummies and e is the error term of the model. Because the data is ranked, the model tests the differences in medians between the methods; the plots are then produced from these results.
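The model above can be sketched in base R without the package. This is only an illustration under stated assumptions, not the internal code of rmcb(): it assumes the ranks are taken within each observation (row), and it uses a no-intercept lm() so that each coefficient equals the mean rank of one method. The names ranks, method, fit, meanRanks and intervals are introduced here for the sketch.

```r
set.seed(42)
N <- 50
M <- 4
ourData <- matrix(rnorm(N * M), N, M)
ourData[, 2] <- ourData[, 2] + 4
colnames(ourData) <- paste("Method", LETTERS[1:M])

# Assumption: rank the methods within each observation (row).
# apply() returns an M x N matrix, so t() restores the N x M shape.
ranks <- t(apply(ourData, 1, rank))

# Stack the ranks column by column and label each value with its method
y <- as.vector(ranks)
method <- factor(rep(colnames(ourData), each = N), levels = colnames(ourData))

# No-intercept regression on the method dummies:
# each coefficient is the mean rank of the corresponding method
fit <- lm(y ~ 0 + method)
meanRanks <- coef(fit)
intervals <- confint(fit, level = 0.95)
```

In this sketch the coefficients coincide with colMeans(ranks), which is why the regression can stand in for a direct comparison of mean ranks.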

There is also a plot() method that produces either the "mcb" or the "lines" style of plot. The style is chosen via the outplot argument, e.g. plot(x, outplot="lines").

Value

If outplot!="none", the function plots the results after all the calculations using the plot.rmcb() function.

The function returns a list of class "rmcb", which contains the following elements:

`mean`	Mean ranks for each method.
`interval`	Confidence intervals for each method.
`vlines`	Coordinates used for outplot="lines", marking the groups of methods.
`groups`	The table containing the groups: TRUE means the two methods are in the same group, FALSE means they are not.
`methods`	Similar to the groups element, but with a slightly different presentation.
`p.value`	p-value for the test of the significance of the model. This is the value from the F test of the linear regression.
`level`	Confidence level.
`model`	lm model produced for the calculation of the intervals.
`outplot`	Style of the plot to produce.
`select`	The selected variable to highlight.
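To make the TRUE / FALSE layout of a groups-style table concrete, here is a minimal base R sketch that marks two methods as belonging to the same group when their intervals overlap. This is an assumed, simplified rule used for illustration only. It is not claimed to be the exact procedure rmcb() uses, and the interval values are made up.

```r
# Hypothetical intervals for three methods (made-up numbers, illustration only)
interval <- rbind("Method A" = c(1.2, 1.8),
                  "Method B" = c(1.6, 2.3),
                  "Method C" = c(2.9, 3.5))
colnames(interval) <- c("lower", "upper")

lower <- interval[, "lower"]
upper <- interval[, "upper"]

# Two intervals overlap when each one starts before the other ends
startsBeforeEnd <- outer(lower, upper, "<=")
overlap <- startsBeforeEnd & t(startsBeforeEnd)
overlap
```

With these numbers, Method A and Method B share a group, while Method C is separated from both.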

Author(s)

Ivan Svetunkov, ivan@svetunkov.ru

References

Demsar, J. (2006). Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research, 7, 1-30. https://www.jmlr.org/papers/volume7/demsar06a/demsar06a.pdf

Examples

N <- 50
M <- 4
ourData <- matrix(rnorm(N*M,mean=0,sd=1), N, M)
ourData[,2] <- ourData[,2]+4
ourData[,3] <- ourData[,3]+3
ourData[,4] <- ourData[,4]+2
colnames(ourData) <- c("Method A","Method B","Method C - long name","Method D")
ourTest <- rmcb(ourData, level=0.95)

# See the mean ranks:
ourTest$mean
# The intervals are accessed in the same way:
ourTest$interval

# You can also reproduce plots in different styles:
plot(ourTest, outplot="lines")

# Or you can use the default "mcb" style and set additional parameters for the plot():
par(mar=c(2,2,4,0)+0.1)
plot(ourTest, main="Four methods")

[Package greybox version 2.0.1 Index]