R: Automatically determine most linear, highest, lowest and...

auto_rate {respR}

R Documentation

Automatically determine most linear, highest, lowest and rolling oxygen uptake or production rates

Description

auto_rate performs rolling regressions on a dataset to determine the most linear, highest, lowest, maximum, minimum, rolling, and interval rates of change in oxygen against time. A rolling regression of the specified width is performed on the entire dataset, then based on the "method" input, the resulting regressions are ranked or ordered, and the output summarised.

Usage

auto_rate(x, method = "linear", width = NULL, by = "row", plot = TRUE, ...)

Arguments

`x`	data frame, or object of class `inspect` containing oxygen~time data.
`method`	string. `"linear"`, `"highest"`, `"lowest"`, `"maximum"`, `"minimum"`, `"rolling"` or `"interval"`. Defaults to `"linear"`. See Details.
`width`	numeric. Width of the rolling regression. For `by = "row"`, either a value between 0 and 1 representing a proportion of the data length, or an integer of 2 or greater representing an exact number of rows. If `by = "time"` it represents a time window in the units of the time data. If `NULL`, it defaults to 0.2 or a window of 20% of the data length. See Details.
`by`	string. `"row"` or `"time"`. Defaults to `"row"`. Metric by which to apply the `width` input if it is above 1.
`plot`	logical. Defaults to TRUE. Plot the results.
`...`	Allows additional plotting controls to be passed, such as `pos`, `panel`, and `quiet = TRUE`.

Details

Ranking and ordering algorithms

Currently, auto_rate contains seven ranking and ordering algorithms that can be applied using the method input:

linear: Uses kernel density estimation (KDE) to learn the shape of the entire dataset and automatically identify the most linear regions of the timeseries. This is achieved by using the smoothing bandwidth of the KDE to re-sample the "peaks" in the KDE to determine linear regions of the data. The summary output will contain only the regressions identified as coming from linear regions of the data, ranked by order of the KDE density analysis. This is present in the ⁠$summary⁠ component of the output as ⁠$density⁠. Under this method, the width input is used as a starting seed value, but the resulting regressions may be of any width. See here for full details.
highest: Every regression of the specified width across the entire timeseries is calculated, then ordered using absolute rate values from highest to lowest. Essentially, this option ignores the sign of the rate, and can only be used when rates all have the same sign. Rates will be ordered from highest to lowest in the ⁠$summary⁠ table regardless of if they are oxygen uptake or oxygen production rates.
lowest: Every regression of the specified width across the entire timeseries is calculated, then ordered using absolute rate values from lowest to highest. Essentially, this option ignores the sign of the rate, and can only be used when rates all have the same sign. Rates will be ordered from lowest to highest in the ⁠$summary⁠ table regardless of if they are oxygen uptake or oxygen production rates.
maximum: Every regression of the specified width across the entire timeseries is calculated, then ordered using numerical rate values from maximum to minimum. Takes full account of the sign of the rate. Therefore, oxygen uptake rates, which in respR are negative, would be ordered from lowest (least negative), to highest (most negative) in the summary table in numerical order. Therefore, generally this method should only be used when rates are a mix of oxygen consumption and production rates, such as when positive rates may result from regressions fit over flush periods in intermittent-flow respirometry. Generally, for most analyses where maximum or minimum rates are of interest the "highest" or "lowest" methods should be used.
minimum: Every regression of the specified width across the entire timeseries is calculated, then ordered using numerical rate values from minimum to maximum. Takes full account of the sign of the rate. Therefore, oxygen uptake rates, which in respR are negative, would be ordered from highest (most negative) to lowest (least negative) in the summary table in numerical order. Therefore, generally this method should only be used when rates are a mix of oxygen consumption and production rates, such as when positive rates may result from regressions fit over flush periods in intermittent-flow respirometry. Generally, for most analyses where maximum or minimum rates are of interest the "highest" or "lowest" methods should be used.
rolling: A rolling regression of the specified width is performed across the entire timeseries. No reordering of results is performed.
interval: multiple, successive, non-overlapping regressions of the specified width are extracted from the rolling regressions, ordered by time.

Further selection and filtering of results

For further selection or subsetting of auto_rate results, see the dedicated select_rate() function, which allows subsetting of rates by various criteria, including r-squared, data region, percentiles, and more.

Units

There are no units involved in auto_rate. This is a deliberate decision. The units of oxygen concentration and time will be specified later in convert_rate() when rates are converted to specific output units.

The `width` and `by` inputs

If by = "time", the width input represents a time window in the units of the time data in x.

If by = "row" and width is between 0 and 1 it represents a proportion of the total data length, as in the equation ⁠floor(width * number of data rows)⁠. For example, 0.2 represents a rolling window of 20% of the data width. Otherwise, if entered as an integer of 2 or greater, the width represents the number of rows.

For both by inputs, if left as width = NULL it defaults to 0.2 or a window of 20% of the data length.

In most cases, by should be left as the default "row", and the width chosen with this in mind, as it is considerably more computationally efficient. Changing to "time" causes the function to perform checks for irregular time intervals at every iteration of the rolling regression, which adds to computation time. This is to ensure the specified width input is honoured in the time units and rates correctly calculated, even if the data is unevenly spaced or has gaps.

Plot

A plot is produced (provided plot = TRUE) showing the original data timeseries of oxygen against time (bottom blue axis) and row index (top red axis), with the rate result region highlighted. Second panel is a close-up of the rate region with linear model coefficients. Third panel is a rolling rate plot (note the reversed y-axis so that higher oxygen uptake rates are plotted higher), of a rolling rate of the input width across the whole dataset. Each rate is plotted against the middle of the time and row range used to calculate it. The dashed line indicates the value of the current rate result plotted in panels 1 and 2. The fourth and fifth panels are summary plots of fit and residuals, and for the linear method the sisth panel the results of the kernel density analysis, with the dashed line again indicating the value of the current rate result plotted in panels 1 and 2.

Additional plotting options

If multiple rates have been calculated, by default the first (pos = 1) is plotted. Others can be plotted by changing the pos input either in the main function call, or by plotting the output, e.g. plot(object, pos = 2). In addition, each sub-panel can be examined individually by using the panel input, e.g. plot(object, panel = 2).

Console output messages can be suppressed using quiet = TRUE. If axis labels or other text boxes obscure parts of the plot they can be suppressed using legend = FALSE. The rate in the rolling rate plot can be plotted not reversed by passing rate.rev = FALSE, for instance when examining oxygen production rates so that higher production rates appear higher. If axis labels (particularly y-axis) are difficult to read, las = 2 can be passed to make axis labels horizontal, and oma (outer margins, default oma = c(0.4, 1, 1.5, 0.4)), and mai (inner margins, default mai = c(0.3, 0.15, 0.35, 0.15)) used to adjust plot margins.

S3 Generic Functions

Saved output objects can be used in the generic S3 functions print(), summary(), and mean().

print(): prints a single result, by default the first rate. Others can be printed by passing the pos input. e.g. print(x, pos = 2)
summary(): prints summary table of all results and metadata, or those specified by the pos input. e.g. summary(x, pos = 1:5). The summary can be exported as a separate data frame by passing export = TRUE.
mean(): calculates the mean of all rates, or those specified by the pos input. e.g. mean(x, pos = 1:5) The mean can be exported as a separate value by passing export = TRUE.

For additional help, documentation, vignettes, and more visit the respR website at https://januarharianto.github.io/respR/

Value

Output is a list object of class auto_rate containing input parameters and data, various summary data, metadata, linear models, and the primary output of interest ⁠$rate⁠, which can be background adjusted in adjust_rate or converted to units in convert_rate.

Examples


# Most linear section of an entire dataset
inspect(sardine.rd, time = 1, oxygen =2) %>%
  auto_rate()

# What is the lowest oxygen consumption rate over a 10 minute (600s) period?
inspect(sardine.rd, time = 1, oxygen =2) %>%
  auto_rate(method = "lowest", width = 600, by = "time") %>%
  summary()

# What is the highest oxygen consumption rate over a 10 minute (600s) period?
inspect(sardine.rd, time = 1, oxygen =2) %>%
  auto_rate(method = "highest", width = 600, by = "time") %>%
  summary()

# What is the NUMERICAL minimum oxygen consumption rate over a 5 minute (300s)
# period in intermittent-flow respirometry data?
# NOTE: because uptake rates are negative, this would actually be
# the HIGHEST uptake rate.
auto_rate(intermittent.rd, method = "minimum", width = 300, by = "time") %>%
  summary()

# What is the NUMERICAL maximum oxygen consumption rate over a 20 minute
# (1200 rows) period in respirometry data in which oxygen is declining?
# NOTE: because uptake rates are negative, this would actually be
# the LOWEST uptake rate.
sardine.rd %>%
  inspect() %>%
  auto_rate(method = "maximum", width = 1200, by = "row") %>%
  summary()

# Perform a rolling regression of 10 minutes width across the entire dataset.
# Results are not ordered under this method.
sardine.rd %>%
  inspect() %>%
  auto_rate(method = "rolling", width = 600, by = "time") %>%
  summary()

[Package respR version 2.3.3 Index]