COBRA {COBRA}  R Documentation 
The function COBRA delivers prediction outcomes for a testing sample on
the basis of a training sample and a collection of basic regression
machines. By default, those machines are wrappers to the R packages
lars, ridge, tree and randomForest, covering a fairly wide spectrum of
contemporary prediction methods for regression. However, the most
interesting way to use COBRA is to supply any regression method suggested
by the context (see argument machines). COBRA can natively parallelize
the computations (use option parallel).
COBRA(train.design, train.responses, split, test, machines, machines.names, logGrid = FALSE, grid = 200, alpha.machines, parallel = FALSE, nb.cpus = 2, plots = FALSE, savePlots = FALSE, logs = FALSE, progress = TRUE, path = "")
train.design 
Mandatory. The design matrix for the training sample. 
train.responses 
Mandatory. The responses vector for the training sample. 
split 
Optional. How should COBRA cut the training sample into the part used to build the machines and the part used to aggregate them?
test 
Mandatory. The design matrix of the testing sample. 
machines 
Optional. Basic regression machines provided by the user. This should be a matrix whose number of rows is the size of the training sample (ntrain) plus the size of the testing sample (ntest), with one column per machine. Element (i,j) of this matrix is assumed to be r_j(X_i), the (scalar) prediction of machine j at query point X_i, for i = 1, ..., ntrain + ntest. 
machines.names 
Optional. If machines is provided, a vector of names for the machines, one per column of machines. 
logGrid 
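Optional. If TRUE, the grid used to calibrate the parameter epsilon is spaced on a logarithmic scale. Default is FALSE. 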
grid 
Optional. How many points should be used in the discretization scheme for calibrating the parameter epsilon. 
alpha.machines 
Optional. Coerce COBRA to use exactly alpha.machines machines. 
parallel 
Optional. If TRUE, COBRA parallelizes the computations. Default is FALSE. 
nb.cpus 
Optional. If parallel = TRUE, the number of CPUs to use. Default is 2. 
plots 
Optional. If TRUE, plots are displayed. Default is FALSE. 
savePlots 
Optional. If TRUE, the plots are saved (see path). Default is FALSE. 
logs 
Optional. If TRUE, logs are saved (see path). Default is FALSE. 
progress 
Optional. If TRUE, the progress of the computations is reported. Default is TRUE. 
path 
Optional. If savePlots or logs is TRUE, where the files should be saved. Default is path = "". 
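As an illustration of the layout expected for argument machines, the sketch below builds a two-column matrix from two ordinary least-squares fits in base R (a linear and a quadratic model in the first covariate). The helper build_machines and the choice of models are hypothetical. Rows follow the order used in the examples: the ntrain training points first, then the ntest testing points. Whether user machines should be fit on the whole training sample or only on part of it (see split) is not specified here; this sketch simply uses the whole training sample.

```r
## Hypothetical helper: fit each user-chosen regression method on the
## training sample, then stack its predictions at the training points
## (rows 1..ntrain) followed by the testing points (rows ntrain+1..ntrain+ntest).
build_machines <- function(train.design, train.responses, test) {
  all.design <- rbind(train.design, test)
  train.df <- data.frame(y = train.responses, x1 = train.design[, 1])
  all.df <- data.frame(x1 = all.design[, 1])

  ## machine 1: linear model in the first covariate
  fit.lin <- lm(y ~ x1, data = train.df)
  ## machine 2: quadratic model in the first covariate
  fit.quad <- lm(y ~ x1 + I(x1^2), data = train.df)

  ## one column per machine, element (i, j) = r_j(X_i)
  machines <- cbind(predict(fit.lin, newdata = all.df),
                    predict(fit.quad, newdata = all.df))
  colnames(machines) <- NULL
  machines
}

## Toy data in the same layout as the examples
set.seed(1)
n <- 100; ntrain <- 80; d <- 3
X <- replicate(d, 2 * runif(n = n) - 1)
Y <- X[, 1]^2 + rnorm(n = n, sd = .1)
machines <- build_machines(X[1:ntrain, ], Y[1:ntrain], X[-(1:ntrain), ])
dim(machines)  # ntrain + ntest rows, one column per machine
```

The result can then be passed on as machines, together with, e.g., machines.names = c("linear", "quadratic").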
For most users, options grid and split should be left at their default values.
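Option grid controls how finely the parameter epsilon is searched. As a rough illustration of what is being calibrated, the sketch below implements the combining rule described in the cited paper (Biau et al., 2013) for a single query point: average the responses of the held-out training points at which at least alpha machines predict within epsilon of their prediction at the query point. It then picks epsilon from a linearly or logarithmically spaced grid (compare grid and logGrid) by minimizing the empirical quadratic error on validation points. The functions cobra_rule, make_eps_grid and calibrate_eps, the NA returned when no point is retained, and the validation scheme are assumptions for illustration, not the package's implementation.

```r
## Hypothetical sketch of the COBRA combining rule and of a grid search for
## epsilon; names and details are illustrative, not the package's own code.

## Prediction at one query point.
##  calib.machines : l x M matrix, prediction of machine m at held-out point i
##  calib.responses: length-l vector of the matching responses
##  query.machines : length-M vector, the M machines' predictions at the query
##  eps            : proximity threshold
##  alpha          : number of machines that must agree (default: all of them)
cobra_rule <- function(calib.machines, calib.responses, query.machines,
                       eps, alpha = ncol(calib.machines)) {
  ## TRUE where |r_m(X_i) - r_m(x)| <= eps
  close <- abs(sweep(calib.machines, 2, query.machines)) <= eps
  ## retain points on which at least alpha machines agree
  keep <- rowSums(close) >= alpha
  if (!any(keep)) return(NA_real_)  # no calibration point retained
  mean(calib.responses[keep])
}

## `grid` candidate values for epsilon, evenly spaced on a linear or log scale
make_eps_grid <- function(eps.min, eps.max, grid = 200, logGrid = FALSE) {
  stopifnot(eps.min > 0, eps.max > eps.min, grid >= 2)
  if (logGrid) {
    exp(seq(log(eps.min), log(eps.max), length.out = grid))
  } else {
    seq(eps.min, eps.max, length.out = grid)
  }
}

## Epsilon in eps.grid minimizing the empirical quadratic error on validation
## points (validation points with no retained calibration point are ignored)
calibrate_eps <- function(calib.machines, calib.responses,
                          valid.machines, valid.responses, eps.grid,
                          alpha = ncol(calib.machines)) {
  errors <- sapply(eps.grid, function(eps) {
    pred <- apply(valid.machines, 1, function(r)
      cobra_rule(calib.machines, calib.responses, r, eps, alpha))
    mean((pred - valid.responses)^2, na.rm = TRUE)
  })
  eps.grid[which.min(errors)]
}

## Toy usage: two machines, four calibration points, query point (0, 0)
cm <- cbind(c(0.0, 0.3, 2.0, 0.1), c(0.2, 3.0, 2.1, 0.0))
cr <- c(1, 5, 9, 3)
p.all <- cobra_rule(cm, cr, c(0, 0), eps = 0.5)             # rows 1 and 4 retained
p.one <- cobra_rule(cm, cr, c(0, 0), eps = 0.5, alpha = 1)  # rows 1, 2 and 4 retained
eps.log <- make_eps_grid(1e-3, 1, grid = 4, logGrid = TRUE) # 0.001, 0.01, 0.1, 1

## One machine, four calibration points, two validation points
eps.best <- calibrate_eps(matrix(c(0, 1, 2, 3), ncol = 1), c(0, 1, 2, 3),
                          matrix(c(0.1, 2.9), ncol = 1), c(0, 3),
                          eps.grid = c(0.5, 1.5))
```

Requiring all machines to agree (the default alpha here) is the strictest consensus; lowering alpha retains more calibration points, as the two toy calls show.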
Returns a list with a single component:
predict 
The vector of predicted values. 
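When the testing responses are known, as in the examples, the returned predict vector can be scored directly. A minimal sketch computing the empirical quadratic error (mean squared error); the helper quadratic_error is hypothetical.

```r
## Hypothetical helper: empirical quadratic error (mean squared error)
## between a vector of predictions and the observed responses
quadratic_error <- function(pred, truth) {
  stopifnot(length(pred) == length(truth))
  mean((pred - truth)^2)
}

err <- quadratic_error(c(1, 2, 3), c(1, 2, 5))  # (0 + 0 + 4) / 3
## With the objects from the examples:
## quadratic_error(res$predict, test.responses)
```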
Caution: If your data is ordered, you should shuffle the observations before calling COBRA since the algorithm assumes all data points are independent and identically distributed.
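A minimal way to do this in base R is to draw one random permutation and apply it to both the rows of the design matrix and the response vector, before forming train.design, train.responses and test. The toy data below are illustrative only.

```r
## Toy ordered data: 5 observations, 2 covariates, responses sorted with X[, 1]
X <- matrix(1:10, ncol = 2)
Y <- X[, 1] * 10
set.seed(42)                  # only to make this sketch reproducible
perm <- sample(nrow(X))       # one random permutation of the row indices
X.shuffled <- X[perm, , drop = FALSE]
Y.shuffled <- Y[perm]         # same permutation keeps (X_i, Y_i) pairs together
```

Using a single permutation for both objects is what keeps each response attached to its covariates.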
Benjamin Guedj <benjamin.guedj@upmc.fr>
http://www.lsta.upmc.fr/doct/guedj/index.html
G. Biau, A. Fischer, B. Guedj and J. D. Malley (2013), COBRA: A Nonlinear Aggregation Strategy. http://arxiv.org/abs/1303.2236 and http://hal.archives-ouvertes.fr/hal-00798579
COBRA-package
library(COBRA)

n <- 500
d <- 30
ntrain <- 400
## d covariates drawn uniformly on [-1, 1]
X <- replicate(d, 2*runif(n = n) - 1)
Y <- X[,1]^2 + X[,3]^3 + exp(X[,10]) + rnorm(n = n, sd = .1)
train.design <- as.matrix(X[1:ntrain,])
train.responses <- Y[1:ntrain]
test <- as.matrix(X[-(1:ntrain),])
test.responses <- Y[-(1:ntrain)]

## using the default machines
if(require(lars) && require(tree) && require(ridge) && require(randomForest)) {
  res <- COBRA(train.design = train.design, train.responses = train.responses, test = test)
  print(cbind(res$predict, test.responses))
  plot(test.responses, res$predict, xlab = "Responses", ylab = "Predictions", pch = 3, col = 2)
  abline(0, 1, lty = 2)
}

## using own machines
machines.names <- c("Soothsayer", "Dummy")
## one row per observation (ntrain + ntest = n rows), one column per machine
machines <- matrix(nrow = n, ncol = 2, data = 0)
machines[,1] <- Y + rnorm(n = n, sd = .1)  ## soothsayer: noisy version of the true responses
machines[,2] <- mean(train.responses)      ## dummy prediction, averaging train.responses
res2 <- COBRA(train.design = train.design, train.responses = train.responses, test = test,
              machines = machines, machines.names = machines.names)
print(cbind(res2$predict, test.responses))
plot(test.responses, res2$predict, xlab = "Responses", ylab = "Predictions", pch = 3, col = 2)
abline(0, 1, lty = 2)