R: Floating Percentile Model

FPM {RFPM}

R Documentation

Floating Percentile Model

Description

Generate sediment quality benchmarks using the floating percentile model algorithm

Usage

FPM(
  data,
  paramList,
  FN_crit = 0.2,
  paramFixed = NULL,
  paramOverride = FALSE,
  increment = 10,
  precision = 0.1,
  empirical = TRUE,
  defIter = 5,
  seed = 1,
  densInfo = FALSE,
  lockInfo = FALSE,
  hitInfo = FALSE,
  ...
)

Arguments

`data`	data.frame containing, at a minimum, chemical concentrations as columns and a logical `Hit` column classifying toxicity
`paramList`	character vector of column names of chemical concentration variables in `data`
`FN_crit`	numeric vector of values between 0 and 1 indicating false negative threshold(s) for benchmark selection (default = `0.2`)
`paramFixed`	character vector of column names of chemical concentration variables to retain, bypassing testing for specific chemicals (default = `NULL`). See Details.
`paramOverride`	logical; whether to retain every chemical variable in `paramList` (default = `FALSE`). See Details.
`increment`	numeric value greater than 1; number of increments to evaluate (default = `10`). See Details.
`precision`	numeric value between 0 and 1 controlling how finely each benchmark is refined within the algorithm (default = `0.1`). See Details.
`empirical`	logical; whether to return the highest empirical value meeting acceptable conditions of the FPM algorithm (default = `TRUE`)
`defIter`	numeric value greater than 0; default number of iterations to use in the case of negative or zero values in `data` (default = `5`)
`seed`	random seed to set for reproducible results; only for handling edge cases of ranking ties (default = `1`)
`densInfo`	logical; whether to return the "density" statistic defining how much FPM criteria changed within the algorithm (default = `FALSE`)
`lockInfo`	logical; whether to return the reason for and order in which benchmarks were "locked" within the model algorithm (default = `FALSE`). See Details.
`hitInfo`	logical; whether to return the predicted Hit results as part of the output (default = `FALSE`)
`...`	additional arguments passed to `chemSigSelect` and `chemSig`

Details

FPM is the main function provided in 'RFPM'. The package was developed first to reimplement the Washington Department of Ecology's Excel-based floating percentile model tool (Avocet 2003; Ecology 2011), and second to provide a means of evaluating the uncertainties and sensitivities associated with the model. FPM generates sediment quality benchmarks for chemicals with significantly higher concentrations among Hit samples (i.e., samples determined to be categorically toxic).

FPM is an algorithmic approach to setting sediment quality benchmarks using sediment chemistry data and toxicity test results. Toxicity is treated as a binary classification: each sample is either Hit == TRUE (toxic) or Hit == FALSE (non-toxic) according to a user-defined criterion. Apart from the empirical data, the most important input to FPM is FN_crit, which sets an upper limit on the false negative errors associated with floating percentile model benchmarks. The default FN_crit recommended by the Department of Ecology is 0.2; though intended to be protective, the value of 0.2 is arbitrary. We recommend that the user run the optimFPM and/or cvFPM functions to find the FN_crit value(s) that optimize benchmark performance within an acceptable error range for the site. optimFPM can also help users optimize the alpha parameter (see ?chemSig), which is likewise set somewhat arbitrarily at the conventional default of 0.05.

Two arguments, paramFixed and paramOverride, have defaults that the user may wish to change in certain circumstances, but we generally recommend not changing them without good reason. Both override the chemical selection process, which can result in potentially non-toxic chemicals being assigned benchmarks. The paramFixed argument forces only the named chemicals into the model algorithm and is therefore less sweeping than paramOverride, which forces all chemicals in paramList into the model algorithm. Even if chemical names are supplied to paramFixed, FPM will still use hypothesis testing methods to consider all other chemicals for inclusion. See ?chemSig for more information regarding the default parameters used within FPM.

increment inversely determines the size of the values that are added to percentile values in the model algorithm: a larger increment results in smaller incremental additions, and vice versa. The WA Department of Ecology recommends a default of increment = 10. This is a reasonable value, and we recommend not decreasing increment below 10. Increasing increment will increase computation time and may or may not result in more accurate benchmarks, so we also recommend not increasing increment much above 10.

precision determines how many iterative loops will be attempted within the model algorithm when trying to increase each benchmark. If increasing the benchmark would raise the false negative rate above FN_crit, the benchmark is returned to its previous value, the increment size is divided by increment, and the smaller incremental addition is then used to increase the benchmark. This process repeats for a fixed number of iterations, which is related to precision. If the benchmark cannot be increased after the fixed number of iterations, the benchmark is locked in place. The default value for precision is 0.1, but the value could be lower, if desired. Lowering the value will increase computation time and may or may not result in more accurate benchmarks. In general, we recommend reducing precision rather than increasing increment in order to potentially enhance the precision of benchmark calculations.
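The float-and-refine loop described above can be sketched for a single chemical in base R. This is only an illustration under stated assumptions, not the code used inside FPM: floatBenchmark and its maxRefine argument are hypothetical, the initial step of 100/increment percentile points is an assumption, the false negative rate is assumed to be FN divided by the number of observed Hits, and maxRefine stands in for the iteration count that FPM derives from precision.

```r
# Hypothetical single-chemical sketch of the float-and-refine idea.
# Not part of RFPM; step sizes and stopping rule are assumptions.
floatBenchmark <- function(x, hit, FN_crit = 0.2, increment = 10, maxRefine = 3) {
  # False negative rate if the benchmark sits at percentile p (0 to 100);
  # a sample is predicted to be a Hit when it exceeds the benchmark
  fnRate <- function(p) {
    bench <- quantile(x, probs = p / 100, names = FALSE)
    predHit <- x > bench
    sum(hit & !predHit) / sum(hit)
  }

  p <- 0                   # starting percentile
  step <- 100 / increment  # assumed initial step, in percentile points
  refinements <- 0

  while (refinements < maxRefine) {
    trial <- p + step
    if (trial <= 100 && fnRate(trial) <= FN_crit) {
      p <- trial                  # accept the higher benchmark
    } else {
      step <- step / increment    # keep p and retry with a smaller step
      refinements <- refinements + 1
    }
  }

  # The benchmark is "locked" once the refinements are used up
  c(percentile = p, benchmark = quantile(x, probs = p / 100, names = FALSE))
}

x <- 1:20
hit <- x > 12
res <- floatBenchmark(x, hit)
res
```

In this sketch a larger increment gives both a smaller first step and a faster shrinking step, while a larger maxRefine allows more rounds of refinement before the benchmark is locked.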

By default (empirical = TRUE), FPM returns empirical concentrations from data that meet the conditions of the FPM. The user can set this argument to FALSE if an exact FPM calculation is desired. The exact calculation will still meet the FPM requirements.

The hitInfo argument allows the user to export the Hit predictions (FPM_Hit) for data based on the calculated FPM criteria as well as the associated FN/FP/TP/TN class.
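As a rough illustration of what such Hit predictions represent, the base R sketch below classifies samples against a set of benchmarks. It assumes that a sample is predicted to be a Hit when at least one chemical exceeds its benchmark; predictHits, the toy data, the benchmark values, and the errClass column name are all hypothetical and are not taken from RFPM.

```r
# Hypothetical helper (not part of RFPM): predict Hits from benchmarks.
# Assumes a sample is a predicted Hit if any chemical exceeds its benchmark.
predictHits <- function(data, benchmarks) {
  conc <- as.matrix(data[, names(benchmarks), drop = FALSE])
  # Compare each concentration column with its own benchmark
  exceeds <- sweep(conc, 2, benchmarks, FUN = ">")
  FPM_Hit <- rowSums(exceeds) > 0
  # Label each sample by comparing the prediction with the observed Hit
  errClass <- ifelse(data$Hit & FPM_Hit, "TP",
              ifelse(data$Hit & !FPM_Hit, "FN",
              ifelse(!data$Hit & FPM_Hit, "FP", "TN")))
  cbind(data, FPM_Hit = FPM_Hit, errClass = errClass)
}

# Made-up concentrations and benchmarks, for illustration only
toy <- data.frame(
  Cu  = c(5, 40, 12, 20, 35),
  Zn  = c(100, 90, 300, 80, 50),
  Hit = c(FALSE, TRUE, TRUE, TRUE, FALSE)
)
bm <- c(Cu = 30, Zn = 200)
out <- predictHits(toy, bm)
out
```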

The lockInfo argument allows the user to export information about what caused the model algorithm to lock for each chemical. Output options are: "FN" for exceeding the false negative limit (i.e., FN_crit), "FP" if the number of false positives was reduced to zero, "Max" if the empirical maximum concentration was exceeded, or "Mix" if more than one of the first three options occurred.

The following classification statistics are reported alongside the generated benchmarks:

TP, FN, TN, and FP - the numbers of true positive, false negative, true negative, and false positive predictions
pFN and pFP - proportions of false predictions (false No-hit and false Hit, respectively)
sens - sensitivity; the probability of detecting a Hit
spec - specificity; the probability of detecting a No-hit
OR - overall reliability; the probability of making a correct prediction (Hit or No-hit)
FM - Fowlkes-Mallows index; the geometric mean of sensitivity and the positive predictive rate
MCC - Matthews correlation coefficient; a metric analogous to Pearson's coefficient, but describing the correspondence between categorical predictions and reality (rather than between continuous variables)
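For readers who want to check these statistics by hand, the base R sketch below computes most of them from observed and predicted Hit vectors using their standard textbook definitions. classStats and the example vectors are hypothetical and not part of RFPM; pFN and pFP are left out because their denominators are not stated here.

```r
# Hypothetical helper (not part of RFPM): classification statistics from
# logical vectors of observed and predicted Hits.
classStats <- function(obs, pred) {
  TP <- sum(obs & pred)    # observed Hit, predicted Hit
  FN <- sum(obs & !pred)   # observed Hit, predicted No-hit
  TN <- sum(!obs & !pred)  # observed No-hit, predicted No-hit
  FP <- sum(!obs & pred)   # observed No-hit, predicted Hit

  sens <- TP / (TP + FN)
  spec <- TN / (TN + FP)
  OR   <- (TP + TN) / (TP + FN + TN + FP)
  ppv  <- TP / (TP + FP)   # positive predictive rate
  FM   <- sqrt(sens * ppv)
  # as.numeric() avoids integer overflow when counts are large
  MCC  <- (TP * TN - FP * FN) /
    sqrt(as.numeric(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

  c(TP = TP, FN = FN, TN = TN, FP = FP,
    sens = sens, spec = spec, OR = OR, FM = FM, MCC = MCC)
}

# Made-up observations and predictions, for illustration only
obs  <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
pred <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
stats <- classStats(obs, pred)
round(stats, 3)
```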

The second output of FPM is a metric called chemDensity. This is a measure of how much the percentile "floated" in the algorithm from the starting position up to the chemical's value at which it was locked in place. Chemicals with chemDensity values closer to 1 floated less, and vice versa. Floating less indicates that even small changes in the chemical concentration resulted in one of the acceptance criteria failing (as discussed above with regard to lockInfo). When comparing chemDensity among chemicals, those with lower values might be viewed as having less of an influence on toxicity predictions, and vice versa. For those interested in understanding the relative importance of chemicals among benchmarks, we recommend using chemVI and considering the MADP and dOR outputs.

Value

list of 2 or 4 objects (depending on lockInfo):

Benchmarks and toxicity classification error statistics;
order in which benchmarks were locked in place;
reason for benchmarks being locked in place; and
chemDensity statistic

References

Avocet. 2003. Development of freshwater sediment quality values for use in Washington State. Phase II report: Development and recommendation of SQVs for freshwater sediments in Washington State. Publication No. 03-09-088. Prepared for Washington Department of Ecology. Avocet Consulting, Kenmore, WA.

Ecology. 2011. Development of benthic SQVs for freshwater sediments in Washington, Oregon, and Idaho. Publication No. 11-09-054. Toxics Cleanup Program, Washington State Department of Ecology, Olympia, WA.

Examples

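# Chemical concentration columns to evaluate in h.tristate
paramList <- c("Cd", "Cu", "Fe", "Mn", "Ni", "Pb", "Zn")
# ExcelMode and warn are not FPM arguments; they are passed on through ...
FPM(h.tristate, paramList, ExcelMode = TRUE, warn = FALSE)
# Generate benchmarks for several FN_crit values in one call
FPM(h.tristate, paramList, c(0.1, 0.2, 0.3))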

[Package RFPM version 1.1]