MAT {rioja} | R Documentation |
Palaeoenvironmental reconstruction using the Modern Analogue Technique (MAT)
Description
Functions for reconstructing (predicting) environmental values from biological assemblages using the Modern Analogue Technique (MAT), also known as k nearest neighbours (k-NN).
Usage
MAT(y, x, dist.method="sq.chord", k=5, lean=TRUE)
## S3 method for class 'MAT'
predict(object, newdata=NULL, k=object$k, sse=FALSE,
nboot=100, match.data=TRUE, verbose=TRUE, lean=TRUE,
...)
## S3 method for class 'MAT'
performance(object, ...)
## S3 method for class 'MAT'
crossval(object, k=object$k, cv.method="lgo",
verbose=TRUE, ngroups=10, nboot=100, h.cutoff=0, h.dist=NULL, ...)
## S3 method for class 'MAT'
print(x, ...)
## S3 method for class 'MAT'
summary(object, full=FALSE, ...)
## S3 method for class 'MAT'
plot(x, resid=FALSE, xval=FALSE, k=5, wMean=FALSE, xlab="",
ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE,
add.smooth=FALSE, ...)
## S3 method for class 'MAT'
residuals(object, cv=FALSE, ...)
## S3 method for class 'MAT'
fitted(object, ...)
## S3 method for class 'MAT'
screeplot(x, ...)
paldist(y, dist.method="sq.chord")
paldist2(y1, y2, dist.method="sq.chord")
Arguments
y , y1 , y2 |
data frame containing biological data. |
newdata |
data frame containing biological data to predict from. |
x |
a vector of environmental values to be modelled, matched to y. |
dist.method |
dissimilarity coefficient. See details for options. |
match.data |
logical indicating whether the function will match two species datasets by their column names. You should only set this to |
k |
number of analogues to use. |
lean |
logical to remove items from the output. |
object |
an object of class |
resid |
logical to plot residuals instead of fitted values. |
xval |
logical to plot cross-validation estimates. |
wMean |
logical to plot weighted-mean estimates. |
xlab , ylab , xlim , ylim |
additional graphical arguments to |
add.ref |
add 1:1 line on plot. |
add.smooth |
add loess smooth to plot. |
cv.method |
cross-validation method, either "lgo", "bootstrap" or "h-block". |
verbose |
logical to show feedback during cross-validation. |
nboot |
number of bootstrap samples. |
ngroups |
number of groups in leave-group-out cross-validation, or a vector containing leave-out group membership. |
h.cutoff |
cutoff for h-block cross-validation. Only training samples greater than |
h.dist |
distance matrix for use in h-block cross-validation. Usually a matrix of geographical distances between samples. |
sse |
logical indicating that sample specific errors should be calculated. |
full |
logical to indicate a full or abbreviated summary. |
cv |
logical to indicate model or cross-validation residuals. |
... |
additional arguments. |
Details
MAT performs an environmental reconstruction using the modern analogue technique. Function MAT takes a training dataset of biological data (species abundances) y and a single associated environmental variable x, and generates a model of closest analogues, or matches, for the modern data using one of a number of dissimilarity coefficients. Options for the latter are: "euclidean", "sq.euclidean", "chord", "sq.chord", "chord.t", "sq.chord.t", "chi.squared", "sq.chi.squared", "bray". "chord.t" are true chord distances; "chord" refers to the variant of chord distance used in palaeoecology (e.g. Overpeck et al. 1985), which is actually Hellinger's distance (Legendre & Gallagher 2001). There are various helper functions to plot and extract information from the results of a MAT transfer function. The function predict takes a MAT object and uses it to predict environmental values for a new set of species data, or returns the fitted (predicted) values from the original modern dataset if newdata is NULL. Variables are matched between training and newdata by column name (if match.data is TRUE). Use compare.datasets to assess conformity of two species datasets and identify possible no-analogue samples.
MAT has methods fitted and residuals that return the fitted values (estimates) and residuals for the training set, performance, which returns summary performance statistics (see below), and print and summary to summarise the output. MAT also has a plot method that produces scatter plots of predicted vs observed measurements for the training set.
Function screeplot displays the RMSE of prediction for the training set as a function of the number of analogues (k) and is useful for estimating the optimal value of k for use in prediction.
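The idea behind choosing k from an RMSE curve can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the rioja code: the data are random, the dissimilarity is the squared chord form, each sample is excluded from its own set of analogues, and only the unweighted mean of the k closest analogues is used.

```r
set.seed(1)
n <- 30   # number of training samples (toy value)
m <- 6    # number of taxa (toy value)

# Toy training set: random assemblages with rows summing to 1,
# and a toy environmental variable loosely related to composition.
y <- matrix(runif(n * m), n, m)
y <- y / rowSums(y)
x <- as.vector(y %*% seq_len(m)) + rnorm(n, sd = 0.1)

# Squared chord dissimilarities between all pairs of samples.
d <- as.matrix(dist(sqrt(y)))^2
diag(d) <- Inf   # assumption: a sample may not be its own analogue

kmax <- 10
# For each sample, the x values of its kmax closest analogues, nearest first.
x_sorted <- t(apply(d, 1, function(di) x[order(di)[seq_len(kmax)]]))

# Estimate = mean of the k closest analogues; RMSE for k = 1..kmax.
rmse <- sapply(seq_len(kmax), function(k) {
  est <- rowMeans(x_sorted[, seq_len(k), drop = FALSE])
  sqrt(mean((est - x)^2))
})

best_k <- which.min(rmse)
```

Plotting `rmse` against `seq_len(kmax)` gives a curve of the kind a scree plot summarises; the k with the lowest RMSE is one candidate for use in prediction.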
paldist and paldist2 are helper functions, though they may be called directly. paldist takes a single data frame or matrix and returns a distance matrix of the row-wise dissimilarities. paldist2 takes two data frames or matrices and returns a matrix of all row-wise dissimilarities between the two datasets.
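The shapes of the two kinds of result can be sketched in base R. The functions `sq_chord_within` and `sq_chord_between` below are hypothetical stand-ins written for this illustration; they assume squared chord dissimilarity on proportions and are not the rioja implementations.

```r
# Hypothetical stand-ins for illustration; not the rioja functions.

# One dataset: square matrix of dissimilarities among its rows.
sq_chord_within <- function(y) {
  as.matrix(dist(sqrt(as.matrix(y))))^2
}

# Two datasets: nrow(y1) x nrow(y2) matrix of dissimilarities between rows.
sq_chord_between <- function(y1, y2) {
  r1 <- sqrt(as.matrix(y1))
  r2 <- sqrt(as.matrix(y2))
  # |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, computed for all row pairs at once
  d <- outer(rowSums(r1^2), rowSums(r2^2), "+") - 2 * tcrossprod(r1, r2)
  pmax(d, 0)   # clip tiny negative values caused by rounding
}

# Toy data: three modern samples and two fossil samples, same three taxa.
modern <- rbind(m1 = c(0.6, 0.3, 0.1),
                m2 = c(0.2, 0.5, 0.3),
                m3 = c(0.1, 0.1, 0.8))
fossil <- rbind(f1 = c(0.5, 0.4, 0.1),
                f2 = c(0.0, 0.2, 0.8))

d_within  <- sq_chord_within(modern)           # 3 x 3
d_between <- sq_chord_between(fossil, modern)  # 2 x 3

# Closest modern analogue for each fossil sample.
closest <- rownames(modern)[apply(d_between, 1, which.min)]
```

Both datasets must have the same taxa in the same column order for the between-dataset calculation to be meaningful.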
Value
Function MAT returns an object of class MAT which contains the following items:
call |
original function call to |
fitted.values |
fitted (predicted) values for the training set, as the mean and weighted mean (weighted by dissimilarity) of the k closest analogues. |
diagnostics |
standard deviation of the k analogues and dissimilarity of the closest analogue. |
dist.n |
dissimilarities of the k closest analogues. |
x.n |
environmental values of the k closest analogues. |
match.name |
column names of the k closest analogues. |
x |
environmental variable used in the model. |
dist.method |
dissimilarity coefficient. |
k |
number of closest analogues to use. |
y |
original species data. |
cv.summary |
summary of the cross-validation (not yet implemented). |
dist |
dissimilarity matrix (returned if |
If function predict is called with newdata=NULL it returns a matrix of fitted values from the original training set analysis. If newdata is not NULL it returns a list with the following named elements:
fit |
predictions for |
diagnostics |
standard deviations of the k closest analogues and distance of closest analogue. |
dist.n |
dissimilarities of the k closest analogues. |
x.n |
environmental values of the k closest analogues. |
match.name |
column names of the k closest analogues. |
dist |
dissimilarity matrix (returned if |
If sample specific errors were requested the list will also include:
fit.boot |
mean of the bootstrap estimates of newdata. |
v1 |
standard error of the bootstrap estimates for each new sample. |
v2 |
root mean squared error for the training set samples, across all bootstrap samples. |
SEP |
standard error of prediction, calculated as the square root of v1^2 + v2^2. |
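The combination of the two error components can be written out directly. The values below are invented for illustration and are not output from any model; treating v2 as a single value shared by all samples is an assumption of this sketch.

```r
# Illustrative values only, not real results.
v1 <- c(0.10, 0.25, 0.15)  # bootstrap standard error for three new samples
v2 <- 0.30                 # training-set RMSE across bootstrap samples (assumed scalar)

# Standard error of prediction: square root of the sum of squared components.
SEP <- sqrt(v1^2 + v2^2)
```

Because the components are combined in quadrature, SEP for a sample is never smaller than the larger of its v1 and v2.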
Functions paldist and paldist2 return dissimilarity matrices. performance returns a matrix of performance statistics for the MAT model, with columns for RMSE, R2, mean and max bias for each number of analogues up to k. See performance for a description of the output.
Author(s)
Steve Juggins
References
Legendre, P. & Gallagher, E. (2001) Ecologically meaningful transformations for ordination of species. Oecologia, 129, 271-280.
Overpeck, J.T., Webb, T., III, & Prentice, I.C. (1985) Quantitative interpretation of fossil pollen spectra: dissimilarity coefficients and the method of modern analogs. Quaternary Research, 23, 87-108.
See Also
WAPLS, WA, performance, and compare.datasets for diagnostics.
Examples
# pH reconstruction of the RLGH, Scotland, using SWAP training set
# shows recent acidification history
data(SWAP)
data(RLGH)
fit <- MAT(SWAP$spec, SWAP$pH, k=20) # generate results for k 1-20
# examine performance
performance(fit)
print(fit)
# How many analogues?
screeplot(fit)
# do the reconstruction
pred.mat <- predict(fit, RLGH$spec, k=10)
# plot the reconstruction
plot(RLGH$depths$Age, pred.mat$fit[, 1], type="b", ylab="pH", xlab="Age")
# compare to a weighted averaging model
fit <- WA(SWAP$spec, SWAP$pH)
pred.wa <- predict(fit, RLGH$spec)
points(RLGH$depths$Age, pred.wa$fit[, 1], col="red", type="b")
legend("topleft", c("MAT", "WA"), lty=1, col=c("black", "red"))