R: Estuarine Bayesian Single-station Estimation method for...

ebase {EBASE}

R Documentation

Estuarine Bayesian Single-station Estimation method for ecosystem metabolism

Description

Estuarine Bayesian Single-station Estimation method for ecosystem metabolism

Usage

ebase(
  dat,
  Z,
  interval,
  ndays = 1,
  aprior = c(4, 2),
  rprior = c(300, 150),
  bprior = c(0.251, 0.125),
  bmax = 0.502,
  doave = TRUE,
  maxinterp = 43200/interval,
  n.iter = 10000,
  update.chains = TRUE,
  n.burnin = n.iter * 0.5,
  n.chains = 3,
  n.thin = 10,
  progress = NULL,
  model_file = NULL
)

Arguments

`dat`	input data frame
`Z`	numeric as single value for water column depth (m) or vector equal in length to number of rows in `dat`
`interval`	timestep interval in seconds
`ndays`	numeric for number of days in `dat` for optimizing the metabolic equation, see details
`aprior`	numeric vector of length two indicating the mean and standard deviation for the prior distribution of the a parameter, see details
`rprior`	numeric vector of length two indicating the mean and standard deviation for the prior distribution of the R parameter, see details
`bprior`	numeric vector of length two indicating the mean and standard deviation for the prior distribution of the b parameter, see details
`bmax`	numeric value for the upper limit on the prior distribution for `bprior`, set as twice the default value of the mean
`doave`	logical indicating if the average dissolved oxygen concentration is used as the starting value for the estimation (default), otherwise the first observation will be used if `FALSE`, see details
`maxinterp`	numeric value for minimum number of continuous observations that must not be interpolated within a group defined by `ndays` to assign as `NA` in output, see details
`n.iter`	number of MCMC iterations, passed to `jags`
`update.chains`	logical to run `metab_update` if chains do not converge
`n.burnin`	number of MCMC chains to delete, passed to `jags`
`n.chains`	number of MCMC chains to run, passed to `jags`
`n.thin`	number of nth iterations to save for each chain, passed to `jags`
`progress`	character string of path where progress is saved to as 'log.txt', use `NULL` to suppress (default)
`model_file`	`NULL` to use the model file included with the package or a path to a model text file can be used

Details

Required input data are time series for dissolved oxygen (mg L-1), water temperature (C), salinity (psu), total PAR (W m-2), and wind speed (m s-1). See the exdat example data file for a representation of the required data. Data are typically from continuously monitored water quality and weather parameters are hourly of sub-hourly time steps. Oxygen concentrations are converted to mmol/m3 prior to metabolic estimation. Water column depth is also required. This can be supplied as a single value or a vector of length equal to the number of rows in dat.

The metabolic estimates are based on a mass balance equation in Grace et al. 2015 with the gas exchange estimate from Wanninkhof 2004. It is similar to that provided by the BASEmetab R package at https://github.com/dgiling/BASEmetab, with modifications to estimate different parameters. The equation optimized in the JAGS model is:

Z\frac{dC_d}{dt} = aPAR - R + bU_{10}^2\left(\frac{Sc}{600} \right)^{-0.5} \left(C_{Sat} - C_d \right )

More simply:

Z\frac{dC_d}{dt} = P - R + D

Gross production is provided by aPAR, respiration is provided by R, and gas exchange is provided by the remainder. The likelihood of the parameters a, R, and b given the observed data are estimated from the JAGS model using prior distributions shown in the model file. At each time step, the change in oxygen concentration between time steps is calculated from the equation using model inputs and parameter guesses, and then a finite difference approximation is used to estimate modeled oxygen concentration. The first modeled value starts at the mean oxygen concentration for all measurements in the optimization period. The estimated concentration at each time step is also returned for comparison to observed as one measure of model performance.

The prior distributions for the a, R, and b parameters are defined in the model file included with the package as normal distributions with mean and standard deviations provided by the aprior, rprior, and bprior arguments. The default values were chosen based on approximate values from national syntheses of metabolic estimates. An additional constraint is that all the prior distributions are truncated to be positive values as required by the core metabolism equation above. The upper limit for b is set as two times 0.251, as given in eqn. 4 in Wanninkhof 2014. Units for each parameter are (mmol m-2 d-1)/(W m-2) for a, mmol m-2 d-1 for R, and (cm hr-1)/(m2 s-2) for b.

The ndays argument defines the model optimization period as the number of days that are used for optimizing the above mass balance equation. By default, this is done each day, i.e., ndays= 1 such that a loop is used that applies the model equation to observations within each day, evaluated iteratively from the first observation in a day to the last. Individual parameter estimates for a, R, and b are then returned for each day. However, more days can be used to estimate the unknown parameters, such that the loop can be evaluated for every ndays specified by the argument. The ndays argument will separate the input data into groups of consecutive days, where each group has a total number of days equal to ndays. The final block may not include the complete number of days specified by ndays if the number of unique dates in the input data includes a remainder when divided by ndays, e.g., if seven days are in the input data and ndays = 5, there will be two groups where the first has five days and the second has two days. The output data from ebase includes a column that specifies the grouping that was used based on ndays.

Missing values in the input data are also interpolated prior to estimating metabolism. It is the responsibility of the user to verify that these interpolated values are not wildly inaccurate. Missing values are linearly interpolated between non-missing values at the time step specified by the value in interval. This works well for small gaps, but can easily create inaccurate values at gaps larger than a few hours. The interp_plot function can be used to visually assess the interpolated values. Records at the start or end of the input time series that do not include a full day are also removed. A warning is returned to the console if gaps are found or dangling records are found.

The maxinterp argument specifies a minimum number of observations that must not be interpolated within groups defined by ndays that are assigned NA in the output (except Date and DateTimeStamp). Groups with continuous rows of interpolated values with length longer than this argument are assigned NA. The default value is half a day, i.e., 43200 seconds divided by the value in interval.

The doave argument can be used to define which dissolved oxygen value is used as the starting point in the Bayesian estimation for the optimization period. The default setting (doave = TRUE) will use the average of all the dissolved oxygen values in the optimization period defined by ndays. For example, the average of all dissolved oxygen values in each 24 hour period will be used if doave = TRUE and ndays = 1. The first dissolved oxygen observation of the time series in the optimization period will be used as the starting point if doave = F. In this case, the simulated dissolved oxygen time series will always start at the first observed dissolved oxygen value for each optimization period.

Value

A data frame with metabolic estimates for areal gross production (P, O2 mmol m-2 d-1), respiration (R, O2 mmol m-2 d-1), and gas exchange (D, O2 mmol m-2 d-1, positive values as ingassing, negative values as outgassing). Additional parameters estimated by the model that are returned include a and b. The a parameter is a constant that represents the primary production per quantum of light with units of (mmol m-2 d-1)/(W m-2) and is used to estimate gross production (Grace et al., 2015). The b parameter is a constant used to estimate gas exchange in (cm hr-1)/(m2 s-2) (provided as 0.251 in eqn. 4 in Wanninkhof 2014). Observed dissolved oxygen (DO_obs, mmol m-3), modeled dissolved oxygen (DO_mod, mmol m-3), and delta dissolved oxygen of the modeled results (dDO, mmol m-3 d-1) are also returned. Note that delta dissolved oxygen is a daily rate.

95% credible intervals for a, b, and R are also returned in the corresponding columns alo, ahi, blo, bhi, Rlo, and Rhi, for the 2.5th and 97.5th percentile estimates for each parameter, respectively. These values indicate the interval within which there is a 95% probability that the true parameter is in this range. Note that all values for these parameters are repeated across rows, although only one estimate for each is returned based on the number of days defined by ndays.

Model fit can also be assessed using the converge and rsq columns. The values in these columns apply to each group in the grp column as specified with the ndays argument. The converge column indicates "Check convergence" or "Fine" if the JAGS estimate converged at that iteration (repeated across rows for the group). The n.chains argument can be increased if convergence is not achieved. Similarly, the rsq column shows the r-squared values of the linear fit between the modeled and observed dissolved oxygen (repeated across rows for the group). These values can also be viewed with fit_plot.

References

Grace, M.R., Giling, D.P., Hladyz, S., Caron, V., Thompson, R.M., Nally, R.M., 2015. Fast processing of diel oxygen curves: Estimating stream metabolism with BASE (BAyesian Single-station Estimation). Limnology and Oceanography: Methods 13, e10011. https://doi.org/10.1002/lom3.10011

Wanninkhof, R., 2014. Relationship between wind speed and gas exchange over the ocean revisited. Limnology and Oceanography: Methods 12, 351–362. https://doi.org/10.4319/lom.2014.12.351

Examples

# get one day of data
dat <- exdat[as.Date(exdat$DateTimeStamp, tz = 'America/Jamaica') == as.Date('2012-06-01'), ]

# run ebase, use more chains and iterations for a better fit, update.chains as T
ebase(dat, interval = 900, Z = 1.85, n.chains = 2, n.iter = 50, 
 update.chains = FALSE)

[Package EBASE version 1.0.2 Index]