R: Bootstrap estimation and errors

bootstrap {analogue}

R Documentation

Bootstrap estimation and errors

Description

Calculates bootstrap statistics for transfer function models: bootstrap estimates, model RMSEP, sample-specific errors for predictions, and summary statistics such as bias and R^2 between observed and estimated environment.

residuals method for objects of class "bootstrap.mat".

Usage


bootstrap(object, ...)

## Default S3 method:
bootstrap(object, ...)

## S3 method for class 'mat'
bootstrap(object, newdata, newenv, k,
          weighted = FALSE, n.boot = 1000, ...)

## S3 method for class 'bootstrap.mat'
fitted(object, k, ...)

## S3 method for class 'bootstrap.mat'
residuals(object, which = c("model", "bootstrap"), ...)

Arguments

`object`	an R object of class `"mat"` for which bootstrap statistics are to be generated, or an object of class `"bootstrap.mat"` from which fitted values or residuals are extracted.
`newdata`	a data frame containing samples for which bootstrap predictions and sample-specific errors are to be generated. May be missing; see Details. `"newdata"` must have the same number of columns as the training set data.
`newenv`	a vector containing environmental data for samples in `"newdata"`. Used to calculate the full suite of errors for new data such as a test set with known environmental values. May be missing; see Details. `"newenv"` must have the same number of rows as `"newdata"`.
`k`	numeric; how many modern analogues to use to generate the bootstrap statistics (and, if requested, the predictions), fitted values or residuals.
`weighted`	logical; should the weighted mean of the environment for the `"k"` modern analogues be used instead of the mean?
`n.boot`	numeric; the number of bootstrap samples to take.
`which`	character; which set of residuals to return, the model residuals or the residuals of the bootstrap-derived estimates?
`...`	arguments passed to other methods.

Details

bootstrap is a fairly flexible function, and can be called with or without arguments newdata and newenv.

If called with only object specified, then bootstrap estimates for the training set data are returned. In this case, the returned object will not include component predictions.

If called with both object and newdata, then in addition to the above, bootstrap estimates for the new samples are also calculated and returned. In this case, component predictions will contain the apparent and bootstrap derived predictions and sample-specific errors for the new samples.

If called with object, newdata and newenv, then the full bootstrap object is returned (as described in the Value section below). With environmental data now available for the new samples, residuals, RMSE(P), R^2 and bias statistics can be calculated for them.

The individual components of predictions are the same as those described for the components relating to the training set data. For example, returned.object$predictions$bootstrap contains the same components as returned.object$bootstrap.

Environmental data are not usually available for the new samples for which predictions are required. In typical palaeolimnological studies, newenv will not be available because the new samples are sediment core samples from the past, for which no environmental data exist. However, if sufficient training set samples are available to justify splitting them into a training set and a test set, then newenv will be available, and bootstrap can accommodate this extra information and calculate apparent and bootstrap estimates for the test set, allowing an independent assessment of the model RMSEP.
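The three calling patterns described above can be sketched as follows. This is only an illustration, not one of the package's own examples: it assumes the analogue package is installed, and the object names (`test.idx`, `train`, `test`, `mod`, `boot.train`, `boot.new`, `boot.full`) and the every-fourth-sample split are hypothetical choices made purely for demonstration.

```r
library(analogue)

## load the Imbrie and Kipp example data shipped with the package
data(ImbrieKipp)
data(SumSST)

## hypothetical split: hold out every fourth sample as a test set
## (percentages are converted to proportions, as in the Examples)
test.idx <- seq(1, nrow(ImbrieKipp), by = 4)
train <- ImbrieKipp[-test.idx, ] / 100
test <- ImbrieKipp[test.idx, ] / 100

## fit the MAT model to the training portion only
mod <- mat(train, SumSST[-test.idx], method = "SQchord")

## 1. object only: bootstrap estimates for the training set
boot.train <- bootstrap(mod, n.boot = 50)

## 2. object and newdata: also predictions and sample-specific
##    errors for the new samples
boot.new <- bootstrap(mod, newdata = test, n.boot = 50)

## 3. object, newdata and newenv: known environmental values for
##    the new samples allow test set error statistics as well
boot.full <- bootstrap(mod, newdata = test,
                       newenv = SumSST[test.idx], n.boot = 50)
```

A small n.boot is used here only to keep the run time short; the default is 1000.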

Typical usage of residuals is

    resid(object, which = c("model", "bootstrap"), ...)

Value

For bootstrap.mat an object of class "bootstrap.mat" is returned. This is a complex object with many components and is described in bootstrapObject.

For residuals, a list containing the requested residuals and metadata, with the following components:

`model`	leave-one-out residuals for the MAT-estimated model.
`bootstrap`	residuals for the bootstrapped MAT model.
`k`	numeric; indicating the size of model used in estimates and predictions.
`n.boot`	numeric; the number of bootstrap samples taken.
`auto`	logical; whether `"k"` was chosen automatically or user-selected.
`weighted`	logical; whether the weighted mean was used instead of the mean of the environment for k-closest analogues.
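
As a brief sketch of how the returned list might be inspected, the components listed above can be accessed with `$`. This assumes the analogue package is installed; `ik.mat`, `ik.boot`, `res` and `res.boot` are illustrative names, and the component names are taken from the list above.

```r
library(analogue)

## load the example data and fit a MAT model
data(ImbrieKipp)
data(SumSST)
ik.mat <- mat(ImbrieKipp / 100, SumSST, method = "SQchord")

## bootstrap the training set (small n.boot to keep this quick)
ik.boot <- bootstrap(ik.mat, n.boot = 50)

## request both sets of residuals (the default for 'which')
res <- residuals(ik.boot)
names(res)

## request only the residuals of the bootstrap-derived estimates
res.boot <- residuals(ik.boot, which = "bootstrap")
res.boot$n.boot
```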

Author(s)

Gavin L. Simpson

References

Birks, H.J.B., Line, J.M., Juggins, S., Stevenson, A.C. and ter Braak, C.J.F. (1990). Diatoms and pH reconstruction. Philosophical Transactions of the Royal Society of London; Series B, 327; 263–278.

Examples


## Imbrie and Kipp example
## load the example data
data(ImbrieKipp)
data(SumSST)
data(V12.122)

## merge training and test set on columns
dat <- join(ImbrieKipp, V12.122, verbose = TRUE)

## extract the merged data sets and convert to proportions
ImbrieKipp <- dat[[1]] / 100
V12.122 <- dat[[2]] / 100

## Imbrie and Kipp foraminifera sea-surface temperature
## fit the MAT model using the squared chord distance measure
ik.mat <- mat(ImbrieKipp, SumSST, method = "SQchord")

## bootstrap training set
## IGNORE_RDIFF_BEGIN
ik.boot <- bootstrap(ik.mat, n.boot = 100)
ik.boot
summary(ik.boot)
## IGNORE_RDIFF_END

## Bootstrap fitted values for training set
## IGNORE_RDIFF_BEGIN
fitted(ik.boot)
## IGNORE_RDIFF_END

## residuals
resid(ik.boot) # uses abbreviated form

[Package analogue version 0.17-6 Index]