sar_average {sars} | R Documentation |
Fit a multimodel averaged SAR curve
Description
Construct a multimodel averaged species-area relationship curve using information criterion weights and up to twenty SAR models.
Usage
sar_average(obj = c("power",
"powerR","epm1","epm2","p1","p2","loga","koba",
"monod","negexpo","chapman","weibull3","asymp",
"ratio","gompertz","weibull4","betap","logistic", "heleg", "linear"), data =
NULL, crit = "Info", normaTest = "none", homoTest = "none", homoCor =
"spearman", neg_check = FALSE, alpha_normtest = 0.05, alpha_homotest =
0.05, grid_start = "partial", grid_n = NULL, confInt = FALSE, ciN = 100,
verb = TRUE, display = TRUE)
Arguments
obj |
Either a vector of model names or a fit_collection object created
using |
data |
A dataset in the form of a dataframe with two columns: the first
with island/site areas, and the second with the species richness of each
island/site. If |
crit |
The criterion used to compare models and compute the model
weights. The default |
normaTest |
The test used to test the normality of the residuals of each model. Can be any of "lillie" (Lilliefors Kolmogorov-Smirnov test), "shapiro" (Shapiro-Wilk test of normality), "kolmo" (Kolmogorov-Smirnov test), or "none" (no residuals normality test is undertaken; the default). |
homoTest |
The test used to check for homogeneity of the residuals of each model. Can be any of "cor.fitted" (a correlation of the squared residuals with the model fitted values), "cor.area" (a correlation of the squared residuals with the area values), or "none" (no residuals homogeneity test is undertaken; the default). |
homoCor |
The correlation test to be used when |
neg_check |
Whether or not a check should be undertaken to flag any models that predict negative richness values. |
alpha_normtest |
The alpha value used in the residual normality test (default = 0.05, i.e. any test with a P value < 0.05 is flagged as failing the test). |
alpha_homotest |
The alpha value used in the residual homogeneity test (default = 0.05, i.e. any test with a P value < 0.05 is flagged as failing the test). |
grid_start |
Should a grid search procedure be implemented to test multiple starting parameter values. Can be one of 'none', 'partial' or 'exhaustive' The default is set to 'partial'. |
grid_n |
If |
confInt |
A logical argument specifying whether confidence intervals should be calculated for the multimodel curve using bootstrapping. |
ciN |
The number of bootstrap samples to be drawn to calculate the
confidence intervals (if |
verb |
verbose - Whether or not to print certain warnings
(default: |
display |
Show the model fitting output and related messages.
(default: |
Details
The multimodel SAR curve is constructed using information criterion
weights (see Burnham & Anderson, 2002; Guilhaumon et al. 2010). If
obj
is a vector of n model names the function fits the n models to
the dataset provided using the sar_multi
function. A dataset must
have four or more datapoints to fit the multimodel curve. If any models
cannot be fitted they are removed from the multimodel SAR. If obj
is
a fit_collection object (created using the sar_multi
function), any
model fits in the collection which are NA are removed. In addition, if any
other model checks have been selected (i.e. residual normality and
heterogeneity tests, and checks for negative predicted richness values),
these are undertaken and any model that fails the selected test(s) is
removed from the multimodel SAR. The order of the additional checks inside
the function is (if all are turned on): normality of residuals, homogeneity
of residuals, and a check for negative fitted values. Once a model fails
one test it is removed and thus is not available for further tests. Thus, a
model may fail multiple tests but the returned warning will only provide
information on a single test. We have now changed the defaults so that
no checks are undertaken, so it is up to the user to select any checks if
appropriate.
The resultant models are then used to construct the multimodel SAR curve.
For each model in turn, the model fitted values are multiplied by the
information criterion weight of that model, and the resultant values are
summed across all models (Burnham & Anderson, 2002). Confidence intervals
can be calculated (using confInt
) around the multimodel averaged
curve using the bootstrap procedure outlined in Guilhaumon et al (2010).The
procedure transforms the residuals from the individual model fits and
occasionally NAs / Inf values can be produced - in these cases, the model
is removed from the confidence interval calculation (but not the multimodel
curve itself). There is also a constraint within the procedure to remove
any transformed residuals that result in negative richness values. When
several SAR models are used, when grid_start is turned on and when the
number of bootstraps (ciN
) is large, generating the confidence
intervals can take a (very) long time. Parallel processing will be added to
future versions.
Choosing starting parameter values for non-linear regression optimisation
algorithms is not always straight forward, depending on the data at hand.
In the package, we use various approaches to choose default starting
parameters. However, we also use a grid search process which creates a
large array of different possible starting parameter values (within certain
bounds) and then randomly selects a proportion of these to test. There are
three options for the grid_start
argument to control this process.
The default (grid_start = "partial"
) randomly samples 500 different
sets of starting parameter values for each model, adds these to the model's
default starting values and tests all of these. A more comprehensive set of
starting parameter estimates can be used (grid_start = "exhaustive"
)
- this option allows the user to choose the number of starting parameter
sets to be tested (using the grid_n
argument) and includes a range
of additional starting parameter estimates, e.g. very small values and
particular values we have found to be useful for individual models. Using
grid_start = "exhaustive"
in combination with a large grid_n
can be very time consuming; however, we would recommend it as it makes it
more likely that the optimal model fit will be found, particularly for the
more complex models. This is particularly true if any of the model fits
does not converge, returns a singular gradient at parameter estimates, or
the plot of the model fit does not look optimum. The grid start procedure
can also be turned off (grid_start = "none"
), meaning just the
default starting parameter estimates are used. Note that grid_start
has been disabled for a small number of models (e.g. Weibull 3 par.). See
the vignette for more information. Remember that, as grid_start has a
random component, when grid_start != "none"
, you can get slightly
different results each time you fit a model or run sar_average
.
Even with grid_start, occasionally a model fit will be able to be fitted
and pass the model fitting checks (e.g. residual normality) but the
resulting fit is nonsensical (e.g. a horizontal line with intercept at
zero). Thus, it can be useful to plot the resultant 'multi' object to check
the individual model fits. To re-run the sar_average
function
without a particular model, simply remove it from the obj
argument.
The sar_models()
function can be used to bring up a list of the 20
model names. display_sars_models()
generates a table of the 20
models with model information.
Value
A list of class "multi" and class "sars" with two elements. The first element ('mmi') contains the fitted values of the multimodel sar curve. The second element ('details') is a list with the following components:
mod_names Names of the models that were successfully fitted and passed any model check
fits A fit_collection object containing the successful model fits
ic The information criterion selected
norm_test The residual normality test selected
homo_test The residual homogeneity test selected
alpha_norm_test The alpha value used in the residual normality test
alpha_homo_test The alpha value used in the residual homogeneity test
ics The information criterion values (e.g. AIC values) of the model fits
delta_ics The delta information criterion values
weights_ics The information criterion weights of each model fit
n_points Number of data points
n_mods The number of successfully fitted models
no_fit Names of the models which could not be fitted or did not pass model checks
convergence Logical value indicating whether
optim
model convergence code = 0, for each model
The summary.sars
function returns a more useful summary of
the model fit results, and the plot.multi
plots the
multimodel curve.
Note
There are different types of non-convergence and these are dealt with
differently in the package. If the optimisation algorithm fails to return
any solution, the model fit is defined as NA and is then removed, and so
does not appear in the model summary table or multi-model curve etc.
However, the optimisation algorithm (e.g. Nelder-Mead) can also return
non-NA model fits but where the solution is potentially non-optimal (e.g.
degeneracy of the Nelder–Mead simplex) - these cases are identified by any
optim
convergence code that is not zero. We have decided not
to remove these fits (i.e. they are kept in the model summary table and
multimodel curve) - as arguably a non-optimal fit is still better than no
fit - but any instances can be checked using the returned
details$converged
vector and then the model fitting re-run without
these models, if preferred. Increasing the starting parameters grid search
(see above) may also help avoid this issue.
The generation of confidence intervals around the multimodel curve (using
confInt == TRUE
), may throw up errors that we have yet to come
across. Please report any issues to the package maintainer.
There are different formulas for calculating the various information criteria (IC) used for model comparison (e.g. AIC, BIC). For example, some formulas use the residual sum of squares (rss) and others the log-likelihood (ll). Both are valid approaches and will give the same parameter estimates, but it is important to only compare IC values that have been calculated using the same approach. For example, the 'sars' package used to use formulas based on the rss, while the nls function function in the stats package uses formulas based on the ll. To increase the compatibility between nls and sars, we have changed our formulas such that now our IC formulas are the same as those used in the nls function. See the "On the calculation of information criteria" section in the package vignette for more information.
The mmf model was found to be equivalent to the He & Legendre logistic, and
so the former has been deprecated (as of Feb 2021). We have removed it from
the default models in sar_average
, although it is still available to
be used for the time being (using the obj
argument). The standard
logistic model has been added in its place, and is now used as default
within sar_average
.
References
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multi-model inference: a practical information-theoretic approach (2nd ed.). New-York: Springer.
Guilhaumon, F., Mouillot, D., & Gimenez, O. (2010). mmSAR: an R-package for multimodel species-area relationship inference. Ecography, 33, 420-424.
Matthews, T. J., K. A. Triantis, R. J. Whittaker, & F. Guilhaumon. (2019). sars: an R package for fitting, evaluating and comparing species–area relationship models. Ecography, 42, 1446–55.
Examples
data(galap)
#attempt to construct a multimodel SAR curve using all twenty sar models
#using no grid_start just for speed here (not recommended generally)
fit <- sar_average(data = galap, grid_start = "none")
summary(fit)
plot(fit)
# construct a multimodel SAR curve using a fit_collection object
ff <- sar_multi(galap, obj = c("power", "loga", "monod", "weibull3"))
fit2 <- sar_average(obj = ff, data = NULL)
summary(fit2)
## Not run:
# construct a multimodel SAR curve using a more exhaustive set of starting
# parameter values
fit3 <- sar_average(data = galap, grid_start = "exhaustive", grid_n = 1000)
## End(Not run)