find_best_fits {forceR}    R Documentation

Find Best Polynomial Fits for Curves

Description

Calculates the best model fits for all curves based on the AIC criterion. The function fits polynomial functions with 1 to 20 coefficients (by default, see degrees) and uses the Akaike Information Criterion (AIC) to evaluate the goodness of the fits. A model is considered a good fit when the percentage change in AIC from one model to the next (e.g. from a model with 6 coefficients to a model with 7 coefficients) is below the threshold, e.g. < 5% when threshold = 5. The first four models meeting this criterion are plotted as colored graphs, and the AICs of these models are visualized in a second plot for each curve. The first four coefficients per curve that fulfill the criterion are stored, and finally a histogram of how often each coefficient was a good fit is plotted. The function returns the numerical value of the coefficient that fulfilled the good-fit criterion in the most curves.
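The AIC-change criterion described above can be sketched for a single curve in base R. This is a minimal sketch under stated assumptions, not the forceR implementation: the helper name aic_change_sketch() is hypothetical, each model is fitted with lm() and stats::poly() and indexed by its degree, degrees are assumed to be sorted in increasing order, and the percentage change is assumed to be taken relative to the absolute AIC of the lower-degree model.

```r
# Hypothetical helper (not the forceR implementation): for one curve, fit
# polynomials of each degree and return the first (up to) four degrees whose
# AIC changes by less than `threshold` percent when moving to the next degree.
# Every degree must be smaller than the number of unique x values.
aic_change_sketch <- function(x, y, degrees = 1:20, threshold = 5) {
  # one linear model with an orthogonal polynomial of degree d per entry
  aics <- vapply(degrees, function(d) AIC(lm(y ~ poly(x, d))), numeric(1))
  # percentage change from each degree to the next, relative to |AIC| of the
  # lower degree (AIC values can be negative)
  pct_change <- abs(diff(aics)) / abs(aics[-length(aics)]) * 100
  # a degree counts as a good fit if the following step changes AIC by < threshold %
  good <- degrees[-length(degrees)][pct_change < threshold]
  head(good, 4)
}

# Usage with a made-up, slightly noisy bell-shaped curve
set.seed(1)
x <- seq(0, 1, length.out = 100)
y <- sin(pi * x) + rnorm(length(x), sd = 0.01)
good_degrees <- aic_change_sketch(x, y, degrees = 1:10, threshold = 5)
good_degrees
```

Applying such a helper to every curve and tallying the returned degrees corresponds to the histogram step described above.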

Usage

find_best_fits(
  df,
  degrees = 1:20,
  threshold = 5,
  zero_threshold = NULL,
  plot.to.screen = FALSE,
  path.data = NULL,
  path.plots = NULL,
  show.progress = FALSE
)

Arguments

df

The tibble returned by the function avg_peaks(). See Details for more information.

degrees

Numerical vector of polynomial degrees to test. Degrees cannot be arbitrarily high: if a degree is too high, the function throws the error "'degree' must be less than number of unique points". Default: 1:20.

threshold

Percentage of AIC change between one degree and the next below which a model meets the good-fit criterion (see Description). Default: 5.

zero_threshold

Either numerical or NULL. If numerical, the function checks whether the graph of the current model starts and ends near zero, e.g. below 0.2 if zero_threshold = 0.2. Default: NULL.

plot.to.screen

A logical value indicating whether results should be plotted on the current R plot device. Default: FALSE.

path.data

A character string defining where to save the results. If NULL, the data are not saved to a file. Default: NULL.

path.plots

A character string defining where to save the plots. If NULL, the plots are not saved to PDF files. Default: NULL.

show.progress

A logical value indicating whether progress should be printed to the console. Default: FALSE.
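The constraints behind degrees and zero_threshold can be illustrated with two small helpers. Both helper names are hypothetical and not part of forceR. The degrees check relies on the documented stats::poly() error; the zero_threshold rule (absolute fitted values at the first and last observation both below the threshold, with observations ordered by index) is an assumption, and forceR's actual check may differ.

```r
# Hypothetical helpers (not part of forceR) illustrating two argument checks.

# degrees: stats::poly() stops with "'degree' must be less than number of
# unique points", so only degrees below the number of unique index values work.
valid_degrees <- function(index, degrees = 1:20) {
  degrees[degrees < length(unique(index))]
}

# zero_threshold (assumed rule): the absolute fitted values at the first and
# last observation must both be below zero_threshold. Observations are assumed
# to be ordered by index.
starts_and_ends_near_zero <- function(fit, zero_threshold) {
  fitted_values <- unname(fitted(fit))
  ends <- fitted_values[c(1, length(fitted_values))]
  all(abs(ends) < zero_threshold)
}

# Usage with made-up data
short_index <- 1:8
valid_degrees(short_index)   # returns 1:7
poly_error <- tryCatch(poly(short_index, 8), error = conditionMessage)

x <- seq(0, 1, length.out = 51)
y <- sin(pi * x)
fit <- lm(y ~ poly(x, 4))
starts_and_ends_near_zero(fit, zero_threshold = 0.2)
```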

Details

This function expects df to be a tibble with three columns: species, containing the species names; index, a numerical column for each species, e.g. time (but any continuous unit works); and force.norm.100, containing the averaged and rescaled curve of each species.
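A toy input with the expected three columns can be built in base R as follows. The species names and curve shapes are made up for illustration, and the force values are assumed to be rescaled to the range 0 to 1; the real forceR::peaks.df.100.avg dataset may differ in scale and length.

```r
# Hypothetical toy input with the three columns find_best_fits() expects
index <- 1:100
toy_df <- data.frame(
  species = rep(c("species_A", "species_B"), each = length(index)),
  index = rep(index, times = 2),
  # made-up averaged, rescaled curves (values between 0 and 1)
  force.norm.100 = c(sin(pi * index / 101), sin(pi * index / 101)^2)
)
str(toy_df)
```

Before passing such data to find_best_fits(), it can be converted to a tibble with tibble::as_tibble(toy_df).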

Value

Returns a numerical value representing the number of coefficients that occurred most often among the first four models per curve that were followed by an AIC change <= 5% (with the default threshold = 5) by the next model. Additionally, plots showing the model fits and a histogram of the coefficients that met this criterion can be drawn on the current plot device or saved as PDF files in path.plots.
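The aggregation step behind the return value can be sketched with table() and which.max(). The per-curve results below are hypothetical, and in case of a tie which.max() returns the smallest value, which may not match how forceR resolves ties.

```r
# Hypothetical per-curve results: the first (up to) four values per curve
# that met the AIC-change criterion
good_fits <- list(
  species_A = c(3, 4, 5, 6),
  species_B = c(4, 5, 7, 8),
  species_C = c(4, 9)
)

# count how often each value occurs across all curves
counts <- table(unlist(good_fits))

# value that met the criterion in the most curves
best_fit <- as.numeric(names(counts)[which.max(counts)])
best_fit
```

A bar plot of counts, e.g. barplot(counts), gives a histogram-like view of the same tally.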

Examples

# Using the forceR::peaks.df.100.avg dataset:

# find smallest polynomial degree that best describes all curves
best.fit.poly <- find_best_fits(df = forceR::peaks.df.100.avg)

best.fit.poly


[Package forceR version 1.0.20 Index]