optimFPM {RFPM}    R Documentation

Optimization of Floating Percentile Model Parameters

Description

Calculate parameter inputs that optimize benchmark performance

Usage

optimFPM(
  data,
  paramList,
  FN_crit = seq(0.1, 0.9, by = 0.05),
  alpha.test = seq(0.05, 0.5, by = 0.05),
  which = c(1, 2, 3, 4),
  simplify = TRUE,
  plot = TRUE,
  colors = heat.colors(10),
  colsteps = 100,
  ...
)

Arguments

data

data.frame containing, at a minimum, chemical concentrations as columns and a logical Hit column classifying toxicity

paramList

character vector of column names of chemical concentration variables in data

FN_crit

numeric vector over which to optimize false negative thresholds (default = seq(0.1, 0.9, by = 0.05))

alpha.test

numeric vector of type-I error rate values over which to optimize (default = seq(0.05, 0.5, by = 0.05))

which

numeric or character indicating which type(s) of plot to generate (see Details; default = c(1, 2, 3, 4))

simplify

logical; whether to generate simplified output (default = TRUE)

plot

logical; whether to generate a plot to visualize the optimization results (default = TRUE)

colors

values recognizable as colors to be passed to colorRampPalette (via colorGradient) to generate a palette for plotting (default = heat.colors(10))

colsteps

integer; number of discrete steps to interpolate colors in colorGradient (default = 100)

...

additional arguments passed to FPM, chemSig, chemSigSelect, and colorGradient

Details

optimFPM was designed to help optimize the predictive capacity of the benchmarks generated by FPM. The default input parameters to FPM (i.e., FN_crit = 0.2 and alpha.test = 0.05) are arbitrary, and optimization can help to objectively establish more accurate benchmarks. Graphical output from optimFPM can also help users to understand the relationship(s) between benchmark accuracy/error, FN_crit, and alpha.test.

Default inputs for FN_crit and alpha.test were selected to represent a reasonable range of values to test. Testing over both ranges results in a two-way optimization, which can be computationally intensive. Alternatively, optimFPM can be run for one parameter at a time by specifying a single value for either FN_crit or alpha.test. Note that inputting single values for both FN_crit and alpha.test leaves nothing to compare, so the results will not be helpful.
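To get a rough sense of the cost of a two-way optimization, the number of parameter combinations can be counted before running optimFPM. The sketch below uses only base R and assumes that each combination of FN_crit and alpha.test is evaluated once; how optimFPM iterates internally is not documented here.

```r
# Default search ranges from the Usage section
FN_crit    <- seq(0.1, 0.9, by = 0.05)
alpha.test <- seq(0.05, 0.5, by = 0.05)

# Two-way optimization: every FN_crit value paired with every alpha.test value
two_way <- expand.grid(FN_crit = FN_crit, alpha.test = alpha.test)
nrow(two_way)  # length(FN_crit) * length(alpha.test) combinations

# One-way optimization: alpha.test held at a single value
one_way <- expand.grid(FN_crit = FN_crit, alpha.test = 0.05)
nrow(one_way)  # length(FN_crit) combinations
```

With the defaults this is 17 x 10 = 170 combinations for the two-way case versus 17 for the one-way case, which is why narrowing either range can noticeably reduce run time.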

Several metrics are used for optimization:

  1. Ratio of sensitivity/specificity ("sensSpecRatio"), calculated as the minimum of the two metrics divided by the maximum of the two. Therefore, this value will always be between 0 and 1, representing the balance between correct Hit==TRUE and Hit==FALSE predictions.

  2. Overall reliability ("OR") (i.e., probability of correctly predicting Hit values)

  3. Fowlkes-Mallows Index ("FM"), the geometric mean of precision and sensitivity, which focuses on how well Hit==TRUE is predicted

  4. Matthews Correlation Coefficient ("MCC"), a measure of the correspondence between the data and predictions, analogous to Pearson's correlation coefficient (but for binary data)
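The four metrics can be illustrated from a confusion matrix of observed versus predicted Hit values. The following is a minimal sketch using the standard textbook definitions; fpm_metrics is a hypothetical helper written for this illustration, not a function in RFPM, and the package's internal calculations may differ in detail.

```r
# Hypothetical helper (not part of RFPM): computes the four optimization
# metrics from logical vectors of observed (hit) and predicted (pred) values.
fpm_metrics <- function(hit, pred) {
  stopifnot(is.logical(hit), is.logical(pred), length(hit) == length(pred))

  TP <- sum(hit & pred)    # Hit == TRUE, predicted TRUE
  TN <- sum(!hit & !pred)  # Hit == FALSE, predicted FALSE
  FP <- sum(!hit & pred)   # Hit == FALSE, predicted TRUE
  FN <- sum(hit & !pred)   # Hit == TRUE, predicted FALSE

  sens <- TP / (TP + FN)   # sensitivity (recall)
  spec <- TN / (TN + FP)   # specificity
  prec <- TP / (TP + FP)   # precision

  c(
    sensSpecRatio = min(sens, spec) / max(sens, spec),
    OR  = (TP + TN) / length(hit),
    FM  = sqrt(prec * sens),
    # as.numeric() avoids integer overflow in the product for large samples
    MCC = (TP * TN - FP * FN) /
      sqrt(as.numeric(TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  )
}

# Made-up observations and predictions, for illustration only
hit  <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
pred <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
m <- fpm_metrics(hit, pred)
round(m, 3)
```

Note that sensSpecRatio rewards balance rather than overall accuracy: a model with sensitivity and specificity both at 0.6 scores 1, while one at 0.75 and 0.83 scores 0.9.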

Graphical output differs depending on whether a single value is input for FN_crit or alpha.test. Providing a single value for one of the two arguments generates a line graph, whereas providing longer vectors (i.e., length > 1) for both arguments generates dot matrix plots. In the dot matrix plots, colors is used to build the color palette and colsteps defines the granularity of the color gradient within that palette. Colors are ordered from more optimal to less optimal; for example, with the default of heat.colors(10), the most optimal values are shown in red and less optimal values in progressively yellower shades. Multiple plots are generated by default, but the which argument controls which plots are produced. By default, which = c(1, 2, 3, 4), corresponding to the four metrics listed above; character inputs are also accepted with some flexibility. Black squares indicate the optimal argument inputs; these values are also printed to the console and can be assigned to an object.
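The roles of colors and colsteps can be sketched with base R alone. The example below calls colorRampPalette directly rather than colorGradient, whose signature is not shown here; the metric values and the scaling of values to palette positions are assumptions made for illustration, not the package's actual mapping.

```r
# Interpolate the default colors into colsteps discrete steps
colors   <- heat.colors(10)
colsteps <- 100
pal <- colorRampPalette(colors)(colsteps)

# Made-up metric values for four parameter combinations
metric <- c(0.55, 0.70, 0.92, 0.81)

# Rescale so the best value maps to the first (reddest) palette entry
# and the worst value maps to the last (yellowest) entry
scaled <- (metric - min(metric)) / diff(range(metric))
idx    <- round((1 - scaled) * (colsteps - 1)) + 1
cols   <- pal[idx]
```

A larger colsteps gives a smoother gradient, while a small value groups similar metric values into the same color.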

Value

data.frame of optimized FN_crit and/or alpha.test values

See Also

FPM, colorGradient, colorRampPalette

Examples

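# Chemical concentration columns to use from h.tristate
paramList <- c("Cd", "Cu", "Fe", "Mn", "Ni", "Pb", "Zn")
FN_seq <- seq(0.1, 0.3, 0.05)
alpha_seq <- seq(0.05, 0.2, 0.05)
# One-way optimization of FN_crit (alpha.test fixed at 0.05)
optimFPM(h.tristate, paramList, FN_seq, 0.05)
# One-way optimization of alpha.test (FN_crit fixed at 0.2)
optimFPM(h.tristate, paramList, 0.2, alpha_seq)
# Two-way optimization, plotting only overall reliability (which = 2)
optimFPM(h.tristate, paramList, FN_seq, alpha_seq, which = 2)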

[Package RFPM version 1.1 Index]