FPM {RFPM} | R Documentation |
Floating Percentile Model
Description
Generate sediment quality benchmarks using the floating percentile model algorithm
Usage
FPM(
data,
paramList,
FN_crit = 0.2,
paramFixed = NULL,
paramOverride = FALSE,
increment = 10,
precision = 0.1,
empirical = TRUE,
defIter = 5,
seed = 1,
densInfo = FALSE,
lockInfo = FALSE,
hitInfo = FALSE,
...
)
Arguments
data |
data.frame containing, at a minimum, chemical concentrations as columns and a logical Hit column |
paramList |
character vector of column names of chemical concentration variables in data |
FN_crit |
numeric vector of values between 0 and 1 indicating false negative threshold(s) for benchmark selection (default = 0.2) |
paramFixed |
character vector of column names of chemical concentration variables to retain, bypassing testing for specific chemicals (default = NULL) |
paramOverride |
logical; whether to retain every chemical variable in paramList (default = FALSE) |
increment |
numeric value greater than 1; number of increments to evaluate (default = 10) |
precision |
numeric value between 0 and 1 (default = 0.1) |
empirical |
logical; whether to return the highest empirical value meeting acceptable conditions of the FPM algorithm (default = TRUE) |
defIter |
numeric value greater than 0; default number of iterations to use in the case of negative or zero values in |
seed |
random seed to set for reproducible results; only for handling edge cases of ranking ties (default = 1) |
densInfo |
logical; whether to return the "density" statistic defining how much FPM criteria changed within the algorithm (default = FALSE) |
lockInfo |
logical; whether to return the reason for and order in which benchmarks were "locked" within the model algorithm (default = FALSE) |
hitInfo |
logical; whether to return the predicted Hit results as part of the output (default = FALSE) |
... |
additional argument passed to |
Details
FPM is the main function provided in 'RFPM', which was developed first as a redevelopment of the Washington Department of Ecology's Excel-based floating percentile model tool (Avocet 2003; Ecology 2011), and second as a means to evaluate uncertainties and sensitivities associated with the model. FPM generates sediment quality benchmarks for chemicals with significantly higher concentrations among Hit samples (i.e., samples determined to be categorically toxic).
FPM is an algorithmic approach to setting sediment quality benchmarks using sediment chemistry data and toxicity test results. Toxicity is treated as a binary classification, either Hit == TRUE (toxic) or Hit == FALSE (non-toxic), as defined by the user.
Apart from the empirical data, the most important input to FPM is FN_crit, which sets an upper limit on the false negative errors associated with floating percentile model benchmarks. The default FN_crit recommended by the Department of Ecology is 0.2; though intended to be protective, this value is arbitrary. We recommend that the user run the optimFPM and/or cvFPM functions to find the FN_crit value(s) that optimize benchmark performance within an acceptable error range for the site. optimFPM can also help users optimize the alpha parameter (see ?chemSig), which is likewise set somewhat arbitrarily at the conventional default of 0.05.
Two arguments, paramFixed and paramOverride, have defaults that the user may wish to change in certain circumstances, but we generally recommend not changing them without good reason. Both override the chemical selection process, which can result in potentially non-toxic chemicals being assigned benchmarks. paramFixed forces only the named chemicals into the model algorithm and is therefore less sweeping than paramOverride, which forces all chemicals in paramList into the model algorithm. Even if chemical names are supplied to paramFixed, FPM will still use hypothesis testing methods to consider all other chemicals for inclusion. See ?chemSig for more information regarding default parameters used within FPM.
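The three selection modes described above can be compared side by side. The following is a minimal sketch, assuming the 'RFPM' package and its h.tristate example data are available; the choice of "Fe" as the fixed chemical and the object names default_fit, fixed_fit, and override_fit are arbitrary and only for illustration.

```r
# Chemical columns to evaluate (same vector as in the package examples)
paramList <- c("Cd", "Cu", "Fe", "Mn", "Ni", "Pb", "Zn")

# Guarded so the snippet does nothing when 'RFPM' is not installed
if (requireNamespace("RFPM", quietly = TRUE)) {
  library(RFPM)

  # Default: chemicals are selected by hypothesis testing (see ?chemSig)
  default_fit <- FPM(h.tristate, paramList)

  # Force one chemical in ("Fe" is an arbitrary choice); the rest are still tested
  fixed_fit <- FPM(h.tristate, paramList, paramFixed = "Fe")

  # Force every chemical in paramList into the algorithm
  override_fit <- FPM(h.tristate, paramList, paramOverride = TRUE)
}
```

Comparing the benchmark tables from the three calls shows which chemicals were added only because selection was bypassed.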
increment determines (inversely) the size of the values added to percentile values in the model algorithm: a larger increment results in smaller incremental additions, and vice versa. The WA Department of Ecology recommends a default of increment = 10. This is a reasonable value, and we recommend not decreasing increment below 10. Increasing increment will increase computation time and may or may not result in more accurate benchmarks, so we also recommend not increasing increment much above 10.
precision determines how many iterative loops will be attempted within the model algorithm when trying to increase each benchmark. If increasing the benchmark would raise the false negative rate above FN_crit, the benchmark is decreased, the increment size is divided by increment, and the smaller incremental addition is then used to increase the benchmark. This process repeats for a fixed number of iterations, which is related to precision. If the benchmark cannot be increased after the fixed number of iterations, the benchmark is locked in place. The default value for precision is 0.1, but the value can be lowered if desired. Lowering the value will increase computation time and may or may not result in more accurate benchmarks. In general, we recommend reducing precision rather than increasing increment to potentially enhance the precision of benchmark calculations.
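The shrinking step size can be illustrated in base R. This is only a sketch of the stated rule (each refinement divides the step by increment); the starting step of 1, the three refinements, and the names start_step, n_refine, and steps are assumptions made for the example and are not taken from the FPM source.

```r
# Illustration only: how the step added to a percentile shrinks per refinement
increment  <- 10   # FPM default
start_step <- 1    # assumed starting step (hypothetical)
n_refine   <- 3    # assumed number of refinements (hypothetical)

# Each refinement divides the previous step by `increment`
steps <- start_step / increment^(0:n_refine)
print(steps)

# A larger increment gives smaller steps at every refinement after the first
steps_large <- start_step / 20^(0:n_refine)
print(steps_large)
```

Under these assumptions, each additional refinement narrows the search by a factor of increment, which is why either more refinements or a larger increment produces finer adjustments at the cost of more computation.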
By default (empirical = TRUE), FPM returns empirical concentrations from data that meet the conditions of the FPM. The user can set this argument to FALSE if an exact FPM calculation is desired. The exact calculation will still meet the FPM requirements.
The hitInfo argument allows the user to export the Hit predictions (FPM_Hit) for data based on the calculated FPM criteria, as well as the associated FN/FP/TP/TN class.
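As a rough illustration of this kind of output, the base R sketch below classifies samples against a set of benchmarks. It assumes the usual floating percentile model convention that a sample is predicted to be a Hit when any chemical exceeds its benchmark (strictly greater than is also an assumption); the data, the benchmark values, and the names toy, bench, and cls are invented for the example and are not output from FPM.

```r
# Toy data: two chemicals and an observed toxicity classification
toy <- data.frame(
  Cu  = c(10, 55, 20, 80),
  Zn  = c(100, 90, 300, 120),
  Hit = c(FALSE, TRUE, FALSE, TRUE)
)

# Hypothetical benchmarks (not produced by FPM)
bench <- c(Cu = 50, Zn = 250)

# TRUE where a concentration exceeds the benchmark for its chemical
exceeds <- sweep(as.matrix(toy[names(bench)]), 2, bench, FUN = ">")

# Assumed rule: a sample is a predicted Hit if any chemical exceeds its benchmark
FPM_Hit <- unname(rowSums(exceeds) > 0)

# Compare the prediction with the observed Hit to assign a class
cls <- ifelse(toy$Hit & FPM_Hit, "TP",
       ifelse(toy$Hit & !FPM_Hit, "FN",
       ifelse(!toy$Hit & FPM_Hit, "FP", "TN")))

print(data.frame(toy, FPM_Hit = FPM_Hit, class = cls))
```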
The lockInfo argument allows the user to export information about what caused the model algorithm to lock for each chemical. Output options are: "FN" for exceeding the false negative limit (i.e., FN_crit), "FP" if the number of false positives was reduced to zero, "Max" if the empirical maximum concentration was exceeded, or "Mix" if more than one of the first three options occurred.
The following classification statistics are reported alongside the generated benchmarks:
TP, FN, TN, and FP - the numbers of true positive, false negative, true negative, and false positive predictions
pFN and pFP - proportions of false predictions (false No-hit and false Hit, respectively)
sens - sensitivity; the probability of detecting a Hit
spec - specificity; the probability of detecting a No-hit
OR - overall reliability; the probability of making a correct prediction (Hit or No-hit)
FM - Fowlkes-Mallows index; geometric mean of sensitivity and the positive predictive value
MCC - Matthews correlation coefficient; a metric analogous to Pearson's coefficient, but describing the correspondence between categorical predictions and reality (rather than between continuous variables)
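Several of these statistics follow directly from the four counts. The base R sketch below uses the standard textbook definitions of sensitivity, specificity, overall accuracy, the Fowlkes-Mallows index, and the Matthews correlation coefficient; the helper name class_stats and the example counts are hypothetical, and the function is not the code used inside FPM.

```r
# Hypothetical helper: standard classification statistics from a confusion matrix
class_stats <- function(TP, FN, TN, FP) {
  sens <- TP / (TP + FN)                     # sensitivity
  spec <- TN / (TN + FP)                     # specificity
  OR   <- (TP + TN) / (TP + FN + TN + FP)    # overall reliability
  ppv  <- TP / (TP + FP)                     # positive predictive value
  FM   <- sqrt(sens * ppv)                   # Fowlkes-Mallows index
  # Matthews correlation coefficient
  denom <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  MCC   <- (TP * TN - FP * FN) / denom
  c(sens = sens, spec = spec, OR = OR, FM = FM, MCC = MCC)
}

# Example counts (invented for illustration)
stats <- class_stats(TP = 40, FN = 10, TN = 30, FP = 20)
print(round(stats, 3))
```

Note that MCC is undefined when any row or column total of the confusion matrix is zero, so a production version would need to handle that case.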
The second output of FPM is a metric called chemDensity. This is a measure of how much the percentile "floated" in the algorithm from the starting position up to the chemical's value at which it was locked in place. Chemicals with chemDensity values closer to 1 floated less, and vice versa. Floating less indicates that even small changes in the chemical concentration resulted in one of the acceptance criteria failing (as discussed above with regard to lockInfo). When comparing chemDensity among chemicals, those with lower values might be viewed as having less of an influence on toxicity predictions, and vice versa. For those interested in understanding the relative importance of chemicals among benchmarks, we recommend using chemVI and considering the MADP and dOR outputs.
Value
list of 2 or 4 objects (depending on lockInfo):
Benchmarks and toxicity classification error statistics;
order in which benchmarks were locked in place;
reason for benchmarks being locked in place; and
chemDensity statistic
References
Avocet. 2003. Development of freshwater sediment quality values for use in Washington State. Phase II report: Development and recommendation of SQVs for freshwater sediments in Washington State. Publication No. 03-09-088. Prepared for Washington Department of Ecology. Avocet Consulting, Kenmore, WA.
Ecology. 2011. Development of benthic SQVs for freshwater sediments in Washington, Oregon, and Idaho. Publication No. 11-09-054. Toxics Cleanup Program, Washington State Department of Ecology, Olympia, WA.
See Also
optimFPM, cvFPM, chemSig, chemSigSelect, chemVI
Examples
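# Chemical concentration columns of h.tristate to evaluate
paramList <- c("Cd", "Cu", "Fe", "Mn", "Ni", "Pb", "Zn")
# Default FN_crit, with ExcelMode and warn passed on through ...
FPM(h.tristate, paramList, ExcelMode = TRUE, warn = FALSE)
# Evaluate several false negative thresholds in one call
FPM(h.tristate, paramList, c(0.1, 0.2, 0.3))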