perry.seqModel {robustHD}

R Documentation

Resampling-based prediction error for a sequential regression model

Description

Estimate the prediction error of a previously fit sequential regression model such as a robust least angle regression model or a sparse least trimmed squares regression model.

Usage

## S3 method for class 'seqModel'
perry(
  object,
  splits = foldControl(),
  cost,
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)

## S3 method for class 'sparseLTS'
perry(
  object,
  splits = foldControl(),
  fit = c("reweighted", "raw", "both"),
  cost = rtmspe,
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)

Arguments

`object`	the model fit for which to estimate the prediction error.
`splits`	an object of class `"cvFolds"` (as returned by `cvFolds`) or a control object of class `"foldControl"` (see `foldControl`) defining the folds of the data for (repeated) `K`-fold cross-validation, an object of class `"randomSplits"` (as returned by `randomSplits`) or a control object of class `"splitControl"` (see `splitControl`) defining random data splits, or an object of class `"bootSamples"` (as returned by `bootSamples`) or a control object of class `"bootControl"` (see `bootControl`) defining bootstrap samples.
`cost`	a cost function measuring prediction loss. It should expect vectors to be passed as its first two arguments, the first corresponding to the observed values of the response and the second to the predicted values, and must return a non-negative scalar value. The default is to use the root mean squared prediction error for non-robust models and the root trimmed mean squared prediction error for robust models (see `cost`).
`ncores`	a positive integer giving the number of processor cores to be used for parallel computing (the default is 1 for no parallelization). If this is set to `NA`, all available processor cores are used.
`cl`	a parallel cluster for parallel computing as generated by `makeCluster`. If supplied, this is preferred over `ncores`.
`seed`	optional initial seed for the random number generator (see `.Random.seed`). Note that resampling is performed on the manager process rather than on the worker processes, even when parallel computing is used. On the parallel worker processes, random number streams are used and the seed is set via `clusterSetRNGStream`.
`...`	additional arguments to be passed to the prediction loss function `cost`.
`fit`	a character string specifying for which fit to estimate the prediction error. Possible values are `"reweighted"` (the default) for the prediction error of the reweighted fit, `"raw"` for the prediction error of the raw fit, or `"both"` for the prediction error of both fits.
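
The contract described for `cost` (observed values first, predicted values second, a non-negative scalar returned, extra arguments forwarded through `...`) can be illustrated with a small user-defined loss. This is only a sketch: `quantAE` and its `prob` argument are hypothetical names introduced here and are not part of robustHD or perry.

```r
# Hypothetical cost function: a quantile of the absolute prediction errors.
# y    : observed response values (first argument)
# yHat : predicted values (second argument)
# prob : which quantile to report (illustrative extra argument)
quantAE <- function(y, yHat, prob = 0.5) {
  absErr <- abs(y - yHat)
  unname(quantile(absErr, probs = prob))  # non-negative scalar
}

# toy check on hand-made vectors
yObs  <- c(1, 2, 3, 4)
yPred <- c(1.5, 2, 2, 8)
medianErr <- quantAE(yObs, yPred)            # median absolute error
maxErr    <- quantAE(yObs, yPred, prob = 1)  # largest absolute error
```

Under these assumptions, a call such as `perry(fitRlars, cost = quantAE, prob = 0.9)` should forward `prob` to the cost function via `...`.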

Details

The prediction error can be estimated via (repeated) K-fold cross-validation, (repeated) random splitting (also known as random subsampling or Monte Carlo cross-validation), or the bootstrap. In each iteration, the optimal model is selected using only the training data and is then used to make predictions for the test data.
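
To make the train/test logic concrete, the following base R sketch runs a single replication of K-fold cross-validation by hand. It is an illustration only and not the package's implementation: ordinary least squares via `lm` stands in for the sequential regression model, the model selection step on the training data is omitted, and the simulated data and all variable names are assumptions made for this example.

```r
set.seed(1)  # for reproducibility of the simulated data and the folds
n <- 60
x <- matrix(rnorm(n * 3), n, 3)
y <- c(x %*% c(1, 0.5, 0) + rnorm(n, sd = 0.5))
dat <- data.frame(y = y, x)  # predictor columns are named X1, X2, X3

K <- 5
folds <- sample(rep_len(seq_len(K), n))  # random assignment to K folds

yHat <- numeric(n)
for (k in seq_len(K)) {
  test <- folds == k
  # fit on the training data only (stand-in for the sequential model)
  fit <- lm(y ~ ., data = dat[!test, ])
  # predict the observations that were left out
  yHat[test] <- predict(fit, newdata = dat[test, ])
}

# root mean squared prediction error over all left-out predictions
cvError <- sqrt(mean((y - yHat)^2))
```

Repeating the loop with fresh fold assignments and averaging the resulting errors corresponds to repeated K-fold cross-validation.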

Value

An object of class "perry" with the following components:

pe: a numeric vector containing the estimated prediction errors for the requested model fits. In case of more than one replication, this gives the average value over all replications.
se: a numeric vector containing the estimated standard errors of the prediction loss for the requested model fits.
reps: a numeric matrix in which each column contains the estimated prediction errors from all replications for the requested model fits. This is only returned in case of more than one replication.
splits: an object giving the data splits used to estimate the prediction error.
y: the response.
yHat: a list containing the predicted values from all replications.
call: the matched function call.
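
The components listed above can be inspected with ordinary list access. The sketch below assumes that robustHD is installed; the simulated data, the choice of 5 folds with 2 replications, and the seed are arbitrary values picked only for illustration.

```r
if (requireNamespace("robustHD", quietly = TRUE)) {
  library("robustHD")

  ## simulated data (illustrative, uncontaminated)
  set.seed(1234)
  n <- 100
  p <- 25
  beta <- rep.int(c(1, 0), c(5, p - 5))
  x <- matrix(rnorm(n * p), n, p)
  y <- c(x %*% beta + 0.5 * rnorm(n))

  ## fit robust LARS and estimate its prediction error
  ## via 5-fold cross-validation with 2 replications
  fitRlars <- rlars(x, y, sMax = 10)
  peRlars <- perry(fitRlars, splits = foldControl(K = 5, R = 2), seed = 2014)

  peRlars$pe     # estimated prediction error (average over replications)
  peRlars$se     # estimated standard error
  peRlars$reps   # results per replication (more than one replication here)
  peRlars$call   # the matched call
}
```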

Author(s)

Andreas Alfons

Examples

## generate data
# example is not high-dimensional to keep computation time low
library("robustHD")
library("mvtnorm")
set.seed(1234)  # for reproducibility
n <- 100  # number of observations
p <- 25   # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5      # controls signal-to-noise ratio
epsilon <- 0.1    # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma)    # predictor matrix
e <- rnorm(n)                   # error terms
i <- 1:ceiling(epsilon*n)       # observations to be contaminated
e[i] <- e[i] + 5                # vertical outliers
y <- c(x %*% beta + sigma * e)  # response
x[i,] <- x[i,] + 5              # bad leverage points


## fit and evaluate robust LARS model
fitRlars <- rlars(x, y, sMax = 10)
perry(fitRlars)

## fit and evaluate sparse LTS model
frac <- seq(0.2, 0.05, by = -0.05)
fitSparseLTS <- sparseLTS(x, y, lambda = frac, mode = "fraction")
perry(fitSparseLTS)

[Package robustHD version 0.8.1 Index]