ebase {EBASE} | R Documentation |
Estuarine Bayesian Single-station Estimation method for ecosystem metabolism
Description
Estuarine Bayesian Single-station Estimation method for ecosystem metabolism
Usage
ebase(
dat,
Z,
interval,
ndays = 1,
aprior = c(4, 2),
rprior = c(300, 150),
bprior = c(0.251, 0.125),
bmax = 0.502,
doave = TRUE,
maxinterp = 43200/interval,
n.iter = 10000,
update.chains = TRUE,
n.burnin = n.iter * 0.5,
n.chains = 3,
n.thin = 10,
progress = NULL,
model_file = NULL
)
Arguments
dat |
input data frame |
Z |
numeric as single value for water column depth (m) or vector equal in length to number of rows in |
interval |
timestep interval in seconds |
ndays |
numeric for number of days in |
aprior |
numeric vector of length two indicating the mean and standard deviation for the prior distribution of the a parameter, see details |
rprior |
numeric vector of length two indicating the mean and standard deviation for the prior distribution of the R parameter, see details |
bprior |
numeric vector of length two indicating the mean and standard deviation for the prior distribution of the b parameter, see details |
bmax |
numeric value for the upper limit on the prior distribution for |
doave |
logical indicating if the average dissolved oxygen concentration is used as the starting value for the estimation (default), otherwise the first observation will be used if |
maxinterp |
numeric value for minimum number of continuous observations that must not be interpolated within a group defined by |
n.iter |
number of MCMC iterations, passed to |
update.chains |
logical to run |
n.burnin |
number of MCMC chains to delete, passed to |
n.chains |
number of MCMC chains to run, passed to |
n.thin |
number of nth iterations to save for each chain, passed to |
progress |
character string of path where progress is saved to as 'log.txt', use |
model_file |
|
Details
Required input data are time series for dissolved oxygen (mg L-1), water temperature (C), salinity (psu), total PAR (W m-2), and wind speed (m s-1). See the exdat
example data file for a representation of the required data. Data are typically from continuously monitored water quality and weather parameters are hourly of sub-hourly time steps. Oxygen concentrations are converted to mmol/m3 prior to metabolic estimation. Water column depth is also required. This can be supplied as a single value or a vector of length equal to the number of rows in dat
.
The metabolic estimates are based on a mass balance equation in Grace et al. 2015 with the gas exchange estimate from Wanninkhof 2004. It is similar to that provided by the BASEmetab R package at https://github.com/dgiling/BASEmetab, with modifications to estimate different parameters. The equation optimized in the JAGS model is:
Z\frac{dC_d}{dt} = aPAR - R + bU_{10}^2\left(\frac{Sc}{600} \right)^{-0.5} \left(C_{Sat} - C_d \right )
More simply:
Z\frac{dC_d}{dt} = P - R + D
Gross production is provided by aPAR, respiration is provided by R, and gas exchange is provided by the remainder. The likelihood of the parameters a, R, and b given the observed data are estimated from the JAGS model using prior distributions shown in the model file. At each time step, the change in oxygen concentration between time steps is calculated from the equation using model inputs and parameter guesses, and then a finite difference approximation is used to estimate modeled oxygen concentration. The first modeled value starts at the mean oxygen concentration for all measurements in the optimization period. The estimated concentration at each time step is also returned for comparison to observed as one measure of model performance.
The prior distributions for the a, R, and b parameters are defined in the model file included with the package as normal distributions with mean and standard deviations provided by the aprior
, rprior
, and bprior
arguments. The default values were chosen based on approximate values from national syntheses of metabolic estimates. An additional constraint is that all the prior distributions are truncated to be positive values as required by the core metabolism equation above. The upper limit for b is set as two times 0.251, as given in eqn. 4 in Wanninkhof 2014. Units for each parameter are (mmol m-2 d-1)/(W m-2) for a, mmol m-2 d-1 for R, and (cm hr-1)/(m2 s-2) for b.
The ndays
argument defines the model optimization period as the number of days that are used for optimizing the above mass balance equation. By default, this is done each day, i.e., ndays= 1
such that a loop is used that applies the model equation to observations within each day, evaluated iteratively from the first observation in a day to the last. Individual parameter estimates for a, R, and b are then returned for each day. However, more days can be used to estimate the unknown parameters, such that the loop can be evaluated for every ndays
specified by the argument. The ndays
argument will separate the input data into groups of consecutive days, where each group has a total number of days equal to ndays
. The final block may not include the complete number of days specified by ndays
if the number of unique dates in the input data includes a remainder when divided by ndays
, e.g., if seven days are in the input data and ndays = 5
, there will be two groups where the first has five days and the second has two days. The output data from ebase
includes a column that specifies the grouping that was used based on ndays
.
Missing values in the input data are also interpolated prior to estimating metabolism. It is the responsibility of the user to verify that these interpolated values are not wildly inaccurate. Missing values are linearly interpolated between non-missing values at the time step specified by the value in interval
. This works well for small gaps, but can easily create inaccurate values at gaps larger than a few hours. The interp_plot
function can be used to visually assess the interpolated values. Records at the start or end of the input time series that do not include a full day are also removed. A warning is returned to the console if gaps are found or dangling records are found.
The maxinterp
argument specifies a minimum number of observations that must not be interpolated within groups defined by ndays
that are assigned NA
in the output (except Date
and DateTimeStamp
). Groups with continuous rows of interpolated values with length longer than this argument are assigned NA
. The default value is half a day, i.e., 43200 seconds divided by the value in interval
.
The doave
argument can be used to define which dissolved oxygen value is used as the starting point in the Bayesian estimation for the optimization period. The default setting (doave = TRUE
) will use the average of all the dissolved oxygen values in the optimization period defined by ndays
. For example, the average of all dissolved oxygen values in each 24 hour period will be used if doave = TRUE
and ndays = 1
. The first dissolved oxygen observation of the time series in the optimization period will be used as the starting point if doave = F
. In this case, the simulated dissolved oxygen time series will always start at the first observed dissolved oxygen value for each optimization period.
Value
A data frame with metabolic estimates for areal gross production (P
, O2 mmol m-2 d-1), respiration (R
, O2 mmol m-2 d-1), and gas exchange (D
, O2 mmol m-2 d-1, positive values as ingassing, negative values as outgassing). Additional parameters estimated by the model that are returned include a
and b
. The a
parameter is a constant that represents the primary production per quantum of light with units of (mmol m-2 d-1)/(W m-2) and is used to estimate gross production (Grace et al., 2015). The b
parameter is a constant used to estimate gas exchange in (cm hr-1)/(m2 s-2) (provided as 0.251 in eqn. 4 in Wanninkhof 2014). Observed dissolved oxygen (DO_obs
, mmol m-3), modeled dissolved oxygen (DO_mod
, mmol m-3), and delta dissolved oxygen of the modeled results (dDO
, mmol m-3 d-1) are also returned. Note that delta dissolved oxygen is a daily rate.
95% credible intervals for a
, b
, and R
are also returned in the corresponding columns alo
, ahi
, blo
, bhi
, Rlo
, and Rhi
, for the 2.5th and 97.5th percentile estimates for each parameter, respectively. These values indicate the interval within which there is a 95% probability that the true parameter is in this range. Note that all values for these parameters are repeated across rows, although only one estimate for each is returned based on the number of days defined by ndays
.
Model fit can also be assessed using the converge
and rsq
columns. The values in these columns apply to each group in the grp
column as specified with the ndays
argument. The converge
column indicates "Check convergence"
or "Fine"
if the JAGS estimate converged at that iteration (repeated across rows for the group). The n.chains
argument can be increased if convergence is not achieved. Similarly, the rsq
column shows the r-squared values of the linear fit between the modeled and observed dissolved oxygen (repeated across rows for the group). These values can also be viewed with fit_plot
.
References
Grace, M.R., Giling, D.P., Hladyz, S., Caron, V., Thompson, R.M., Nally, R.M., 2015. Fast processing of diel oxygen curves: Estimating stream metabolism with BASE (BAyesian Single-station Estimation). Limnology and Oceanography: Methods 13, e10011. https://doi.org/10.1002/lom3.10011
Wanninkhof, R., 2014. Relationship between wind speed and gas exchange over the ocean revisited. Limnology and Oceanography: Methods 12, 351–362. https://doi.org/10.4319/lom.2014.12.351
Examples
# get one day of data
dat <- exdat[as.Date(exdat$DateTimeStamp, tz = 'America/Jamaica') == as.Date('2012-06-01'), ]
# run ebase, use more chains and iterations for a better fit, update.chains as T
ebase(dat, interval = 900, Z = 1.85, n.chains = 2, n.iter = 50,
update.chains = FALSE)