modelFilter {dwp} | R Documentation |
Run Models through a Sieve to Filter out Dubious Fits
Description
A set of fitted models (ddArray) is filtered according to a set of criteria
that test for high AIC, high-influence points, and plausibility of the tail
probabilities of each fitted distribution. modelFilter will either auto-select
the best model according to a set of pre-defined, objective criteria or will
return all models that meet a set of user-defined or default criteria. A table
of how the models score according to each criterion is printed to the console.
Usage
modelFilter(dmod, sieve = "default", quiet = FALSE)
Arguments
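dmod | a ddArray object (the set of fitted models to filter) |
sieve | a list of criteria for ordering models |
quiet | boolean to suppress (TRUE) or allow (FALSE) printing of the score table to the console |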
Details
The criteria to test are entered in a list (sieve) with components:
- $rtail = vector of probabilities that define checkpoints on distributions to avoid situations where a model that may fit well within the range of the data is nonetheless implausible because it predicts a substantial probability of carcasses falling great distances from the nearest turbine. The default is to check whether a distribution predicts that less than 50% of carcasses fall within 80 meters, less than 90% within 120 meters, less than 95% within 150 meters, or less than 99% within 200 meters. Distributions that fall below any of these points (for example, predicting only 42% within 80 meters or only 74% within 120 meters) fail the default rtail test. The format of the default for the test is $rtail = c(p80 = 0.5, p120 = 0.90, p150 = 0.95, p200 = 0.99). Users may override the default by using, for example, sieve = list(rtail = c(p80 = 0.8, p120 = 0.99, p150 = 0.99, p200 = 0.999)) in the argument list for a more stringent test or for a situation where turbines are small or winds are light. Alternatively, users may forego the test altogether by entering sieve = list(rtail = FALSE). If specific probabilities are provided, they must be in a vector of length 4 with names "p80" etc. as in the examples above.
- $ltail = vector of probabilities that define checkpoints on distributions to avoid situations where the search radius is short and a distribution fits the limited data set well but crashes to zero just outside the search radius. The default is to check whether a distribution predicts that greater than 50% of carcasses fall within 20 meters or greater than 90% within 50 meters. Distributions that pass above either of these checkpoints (for example, predicting 61% of carcasses within 20 meters or 93% within 50 meters) are eliminated by the default ltail test. The format of the default for the test is $ltail = c(p20 = 0.5, p50 = 0.90). Users may override the default by using, for example, sieve = list(ltail = c(p20 = 0.6, p50 = 0.8)) in the argument list for a situation where it is known that carcasses beyond 50 meters are common.
- $aic = a numeric scalar cutoff value for the models' delta AICc scores. Models with AICc scores exceeding the minimum AICc among all the fitted models by sieve$aic or more fail the test. The default value is 10. Users may override the default by using, for example, sieve = list(aic = 7) in the argument list to use a delta AICc score of 7 as the cutoff or may forego the test altogether by setting sieve = list(aic = FALSE).
- $hin = TRUE or FALSE to test for high influence points, the presence of which casts doubt on the reliability of the model. The function defines "high influence" points as high leverage points, namely, points with \frac{h}{1 - h} > \frac{2p}{n - 2p} (where h is leverage, p is the number of parameters in the model, and n is the search radius) with Cook's distance > 8/(n - 2*p). The criteria for high influence points were adapted from Brian Ripley's GLM diagnostics package boot (glm.diag). The test is perhaps most valuable in identifying distributions with high probability of carcasses landing well beyond what could reasonably be expected.
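The high-influence rule can be tried out with base R alone. The sketch below is not the package's implementation: the simulated ring counts, the model formula, and all variable names are assumptions made for illustration. It uses stats::hatvalues and stats::cooks.distance rather than boot::glm.diag, whose Cook statistic may be scaled differently, and it reads the rule as requiring both conditions at once.

```r
# Simulated stand-in data: carcass counts in 1-m rings out to 60 m (hypothetical).
set.seed(1)
r <- 1:60
counts <- rpois(length(r), lambda = 5 * exp(-r / 25))

# Any GLM will do for the illustration; this formula is an assumption.
fit <- glm(counts ~ log(r) + r, family = poisson)

h    <- hatvalues(fit)        # leverage of each ring
cook <- cooks.distance(fit)   # Cook's distance of each ring
p    <- length(coef(fit))     # number of model parameters
n    <- length(r)             # number of 1-m rings, i.e. the search radius here

lev_cut  <- 2 * p / (n - 2 * p)   # threshold for h / (1 - h)
cook_cut <- 8 / (n - 2 * p)       # threshold for Cook's distance

# A ring is flagged only if it exceeds both thresholds.
high_influence <- (h / (1 - h) > lev_cut) & (cook > cook_cut)

print(which(high_influence))  # indices of flagged rings, possibly none
print(any(high_influence))    # TRUE would correspond to failing the hin test
```

With one-meter rings, the number of observations equals the search radius, which is how n is treated in this sketch.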
Several choices of pre-defined sieves are available (or, as described
above, users may define their own criteria):
sieve = "default"
The models are ordered by the following criteria:
extensibility
weight of the right tail (discounting models that predict implausibly high proportions of carcasses beyond the search radius)
weight of the left tail (discounting models that predict implausibly high proportions of carcasses near the turbines)
AICc test (discounting models with delta AICc > 10)
high influence points (discounting models in which one or more of the data points exert a high influence on the fitted model, according to Ripley's GLM diagnostics package boot (glm.diag))
ranking by AICc
Precise definitions of the default sieve parameters are given in sieve_default.
sieve = NULL
Returns a list of the extensible models without scoring them by other model selection criteria.
sieve = "win"
Sorts models by high-influence points and AICc
sieve = list(<custom>)
User provides a custom sieve, which may be a modification of the default sieve or built de novo. To modify the default, use, for example, sieve = list(hin = FALSE) to disable the hin test but keep the other default tests, or sieve = list(aic = 7) to use 7 rather than 10 as the AIC cutoff, or sieve = list(ltail = c(p20 = 0.3, p50 = 0.8)) to use a more stringent left tail test that requires CDF graphs to pass below the points (20, 0.3) and (50, 0.8). Custom ltail and rtail parameters must match the formats of the default tests, but their probabilities may vary. To turn off the aic filter, use sieve = list(aic = Inf). To turn off the ltail filter, use sieve = list(ltail = c(p20 = 1, p50 = 1)). To turn off the rtail filter, use sieve = list(rtail = c(p80 = 0, p120 = 0, p150 = 0, p200 = 0)). These custom components may be mixed and matched as desired.
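To make the checkpoint idea concrete, here is a minimal base R sketch of how ltail and rtail vectors could be compared against a CDF. It is an illustration under stated assumptions, not the package's code: the helper names (checkpoint_distance, pass_rtail, pass_ltail) are hypothetical, and lognormal CDFs with made-up parameters stand in for fitted distance distributions.

```r
# Checkpoint vectors in the documented format (default values).
rtail <- c(p80 = 0.5, p120 = 0.90, p150 = 0.95, p200 = 0.99)
ltail <- c(p20 = 0.5, p50 = 0.90)

# Hypothetical helper: turn names such as "p80" into distances in meters.
checkpoint_distance <- function(p) as.numeric(sub("^p", "", names(p)))

# Right tail: the CDF must reach at least each checkpoint probability.
pass_rtail <- function(cdf, rtail) all(cdf(checkpoint_distance(rtail)) >= rtail)

# Left tail: the CDF must not rise above any checkpoint probability.
pass_ltail <- function(cdf, ltail) all(cdf(checkpoint_distance(ltail)) <= ltail)

# Stand-in CDFs (lognormal; parameters chosen only for illustration).
cdfs <- list(
  moderate = function(x) plnorm(x, meanlog = log(40), sdlog = 0.6),
  heavy    = function(x) plnorm(x, meanlog = log(90), sdlog = 1.0),
  short    = function(x) plnorm(x, meanlog = log(10), sdlog = 0.5)
)

results <- data.frame(
  model = names(cdfs),
  rtail = vapply(cdfs, pass_rtail, logical(1), rtail = rtail),
  ltail = vapply(cdfs, pass_ltail, logical(1), ltail = ltail)
)
print(results)

# Checkpoints of 1 can never be exceeded, so this ltail vector disables the test.
print(pass_ltail(cdfs$short, c(p20 = 1, p50 = 1)))
```

In this sketch the heavy-tailed stand-in falls below the 80-meter checkpoint and fails rtail, while the short-range stand-in rises above the 20-meter checkpoint and fails ltail.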
Value
An fmod object, which is an unordered list of extensible models if
sieve = NULL; otherwise, a list of class fmod with the following
components:
$filtered
the selected dd object or a ddArray list of models that passed the tests
$scores
a matrix with all models tested (rownames = model names) and the results of each test (columns aic_test, rtail, ltail, hin, aic)
$sieve
the test criteria, stored in a list with
- $aic_test = cutoff for AIC
- $hin = boolean to indicate whether high influence points were considered
- $rtail = numeric vector giving the probabilities that the right tail of the distribution must exceed at distances of 80, 120, 150, and 200 meters in order to pass
- $ltail = numeric vector giving the probabilities that the left tail of the distribution must NOT exceed at distances of 20 and 50 meters in order to pass
$models
a list (ddArray object) of all models tested
$note
notes on the tests
When an fmod object is printed, only a small subset of the elements are
shown. To see a full list of the objects, use names(x), where x is the name
of the fmod return value. The elements can be extracted in the usual R way
via, for example, x$sieve or x[["sieve"]].
Examples
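library(dwp)
# load the example site layout and carcass data
data(layout_simple)
data(carcass_simple)
# set up the site layout, format it as rings, and add the carcass locations
sitedata <- initLayout(layout_simple)
ringdata <- prepRing(sitedata)
ringsWithCarcasses <- addCarcass(carcass_simple, data_ring = ringdata)
# fit the set of candidate distance models
distanceModels <- ddFit(ringsWithCarcasses)
# statistics for all fitted models, then for two individual models
stats(distanceModels)
stats(distanceModels[["tnormal"]])
stats(distanceModels[["lognormal"]])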
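The example stops at ddFit and stats. A minimal sketch of passing the fitted models on to modelFilter might look like the following. It assumes the dwp package is installed (the requireNamespace guard only makes the snippet do nothing otherwise), it assumes that quiet = TRUE suppresses the printed score table, and the object names best and custom are illustrative.

```r
# Sketch only: the body runs when the dwp package is available.
if (requireNamespace("dwp", quietly = TRUE)) {
  library(dwp)
  data(layout_simple)
  data(carcass_simple)
  sitedata <- initLayout(layout_simple)
  ringdata <- prepRing(sitedata)
  ringsWithCarcasses <- addCarcass(carcass_simple, data_ring = ringdata)
  distanceModels <- ddFit(ringsWithCarcasses)

  # Default sieve: the function selects a model according to the default criteria.
  best <- modelFilter(distanceModels, quiet = TRUE)
  print(names(best))   # full list of elements in the returned object
  print(best$scores)   # how each model scored on each test

  # Custom sieve: tighter delta AICc cutoff and no high-influence test.
  custom <- modelFilter(distanceModels,
                        sieve = list(aic = 7, hin = FALSE),
                        quiet = TRUE)
  print(custom$sieve)  # the criteria that were actually applied
}
```

Components not named in a custom sieve keep their default values, so only the aic and hin settings change in the second call.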