R: Bulk fitting of Reco and GPP models

reco.bulk {flux}

R Documentation

Bulk fitting of Reco and GPP models

Description

The functions allow for bulk fitting of R_{eco} and GPP models with the respective functions reco and gpp. This is often appropriate because data are gathered over a season, a year or longer...

Usage

reco.bulk(formula, data, INDEX, window = 1, hook = "mean", remove.outliers = FALSE, 
fall.back = TRUE, ...)

gpp.bulk(formula, data, INDEX, window = 1, hook = "mean", oot.id = c("D", "T"), 
min.dp = 5, Reco.m = NULL, ts.Reco = NULL, fall.back = TRUE, ...)

Arguments

`formula`	An object of class "`formula`" (or one that can be coerced to that class): a symbolic description of the terms used in bulk `R_{eco}` and GPP model fitting. The choice of terms is more restricted than usual (see Details). For instance, a timestamp must always be provided, and temperature variables are required for `gpp.bulk` if `R_{eco}` values are predicted from models.
`data`	A data frame (or an object that can be coerced to that class by `as.data.frame`) containing at least all the 'model' terms specified in `formula`.
`INDEX`	A vector of length `nrow(data)` that is used to extract and compile data, for instance by measurement campaign in the field. Internally, `split` is used with `f = INDEX` to create a list of `data.frame`s, each of which contains all flux measurements for one model.
`window`	Both functions can fit the respective models across a moving window of adjacent `INDEX` values. This is not advisable for GPP, whereas `R_{eco}` modelling can benefit considerably because more data points often lead to better models.
`hook`	Character string specifying the summary statistic used to fix the date and time to which the fitted model refers. Currently this is done by applying one of the following summary statistics to the timestamp: `mean`, `min`, `max` or `median`.
`remove.outliers`	Logical. If `TRUE`, the function searches for outliers in the data points of the `R_{eco}` models and eliminates them. For each model, the `boxplot.stats` of the residuals are obtained; if outliers are present, they are removed and the model is fitted again. This is done twice. If the function fails to fit the model to the reduced data set, it falls back to the original data.
`oot.id`	Vector of length 2 that specifies which of the flux values derive from opaque (first value, i.e. `R_{eco}` measurements) and which derive from transparent (second value, i.e. NEE measurements) chamber measurements when `data` contains both. This is one of several approaches to GPP modeling here. See details.
`min.dp`	Numeric. Specifies the minimum number of data points that are accepted per model. Defaults to 5, which is already quite a small number.
`Reco.m`	Either an object of class "`reco`" resulting from `reco` with one `R_{eco}` model, an object of class "`breco`" resulting from `reco.bulk` with several `R_{eco}` models, or a vector of estimated half-hourly or hourly (or any other interval) `R_{eco}` values. In the latter case `ts.Reco` has to be specified as well, because it also serves as a switch between internal `R_{eco}` modelling and assigning existing `R_{eco}` values. See Details.
`ts.Reco`	POSIXlt or POSIXct vector with the timestamps of the fluxes in `Reco.m`. With the default (`ts.Reco = NULL`), the function expects model object(s) in `Reco.m`.
`fall.back`	Logical. When `TRUE`, the function falls back to linear mean models when the non-linear approach fails (for `reco.bulk`: the slope of the linear relationship between Reco and temperature is < 0; for `gpp.bulk`: either no model could be fitted or the starting slope parameter `alpha` is > 0). To do so, a virtual data set is created with 50 random `GPP` values that have the same mean and sd as the original data and a sequence of 50 `PAR` values spanning 0 to 2000. A linear model is fitted to these data with `lm(GPP ~ PAR)`.
`...`	Further arguments passed to `reco` or `gpp` e.g., the method for fitting the model when not using the respective defaults.
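The interplay of `INDEX`, `window` and `hook` can be pictured with a small base-R sketch. It assumes a centred window (an odd `window` pools each chunk with its neighbours on either side) and applies the hook to the timestamps of each `INDEX` chunk. The helpers `pool_window` and `hook_time` are hypothetical and not part of flux; the actual internals of `reco.bulk` and `gpp.bulk` may differ.

```r
# Hypothetical toy data: six fluxes from three campaigns (not package data)
dat <- data.frame(
  flux = c(1.2, 1.5, 2.1, 2.4, 3.0, 3.3),
  timestamp = as.POSIXct(c("2010-05-01 10:00", "2010-05-01 12:00",
                           "2010-06-01 10:00", "2010-06-01 14:00",
                           "2010-07-01 10:00", "2010-07-01 16:00"), tz = "UTC"),
  campaign = c(1, 1, 2, 2, 3, 3)
)

# One data.frame per INDEX level, as with split(data, f = INDEX)
by.campaign <- split(dat, f = dat$campaign)

# Hypothetical: pool chunk i with (window - 1) %/% 2 neighbours on each side
pool_window <- function(chunks, i, window = 1) {
  half <- (window - 1) %/% 2
  idx <- max(1, i - half):min(length(chunks), i + half)
  do.call(rbind, chunks[idx])
}

# Hypothetical: apply the 'hook' statistic ("mean", "min", "max", "median")
# to the numeric timestamps and convert the result back to POSIXct
hook_time <- function(ts, hook = "mean") {
  FUN <- match.fun(hook)
  as.POSIXct(FUN(as.numeric(ts)), origin = "1970-01-01", tz = "UTC")
}

pooled <- lapply(seq_along(by.campaign),
                 function(i) pool_window(by.campaign, i, window = 3))
hooks <- lapply(by.campaign, function(d) hook_time(d$timestamp, "mean"))
sapply(pooled, nrow)  # returns 4 6 4: edge campaigns have only one neighbour
```

With `window = 1` the pooling step reduces to the plain `split` result, which matches the advice above to keep GPP models campaign-specific.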

Details

Models are specified symbolically, as with regression models. Accordingly, the basic form is response ~ terms, with response always referring to CO_2 exchange rates. The requirements for the terms differ between the two functions. In contrast to other formulae, the response and all terms have to be columns in data.
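Because every variable must be a column of `data` and the timestamp comes last, a bulk formula can be decomposed with `all.vars`. The following is a minimal sketch with a hypothetical helper `check_bulk_formula` (not part of flux) that checks the variables and splits them into response, other terms and timestamp.

```r
# Hypothetical helper (not part of 'flux'): check that every variable of a
# bulk formula is a column of 'data' and split the variables into roles,
# taking the last right-hand-side variable as the timestamp
check_bulk_formula <- function(formula, data) {
  vars <- all.vars(formula)
  missing.vars <- setdiff(vars, names(data))
  if (length(missing.vars) > 0)
    stop("not found in data: ", paste(missing.vars, collapse = ", "))
  rhs <- all.vars(formula[[3]])
  list(response = all.vars(formula[[2]]),
       terms = rhs[-length(rhs)],
       timestamp = rhs[length(rhs)])
}

d <- data.frame(Reco = c(1.1, 2.3), t.air = c(10, 12), t.soil2 = c(8, 9),
                timestamp = as.POSIXct(c("2010-05-01 10:00", "2010-05-01 11:00"),
                                       tz = "UTC"))
parts <- check_bulk_formula(Reco ~ t.air + t.soil2 + timestamp, d)
str(parts)
```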

reco.bulk expects a formula of the form Reco ~ T1 + ... + timestamp, with Reco referring to CO_2 fluxes estimated from opaque chamber measurements (for instance with flux) and T1 referring to temperature readings relevant for Reco (e.g. air temperature) taken during the corresponding chamber measurements. The ... indicates that further temperature readings can be specified if available (e.g. soil temperature at 2 cm), as many as you want. When more than one temperature is specified, models are fitted for each temperature, and the best one is determined via AIC and reported together with the name of the corresponding temperature variable. Finally, timestamp refers to the POSIXt timestamps that represent the dates and times of the corresponding measurements; timestamp always has to be the last term of the formula. Models are fitted using reco.
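Selecting the temperature variable by AIC can be sketched in base R. As an assumption for illustration, a log-linear model `lm(log(Reco) ~ T)` (an exponential temperature response) stands in for the methods of `reco`; the helper `best_temperature` is hypothetical. Since all candidate models share the same response, their AIC values are directly comparable.

```r
set.seed(1)
n <- 30
# Toy data: Reco responds to soil temperature only
d <- data.frame(t.air = runif(n, 5, 25), t.soil2 = runif(n, 5, 20))
d$Reco <- exp(0.2 + 0.08 * d$t.soil2 + rnorm(n, sd = 0.05))

# Fit one stand-in model per temperature variable and keep the lowest AIC
best_temperature <- function(data, response, temps) {
  fits <- lapply(temps, function(tv)
    lm(as.formula(paste0("log(", response, ") ~ ", tv)), data = data))
  aics <- vapply(fits, AIC, numeric(1))
  names(aics) <- temps
  best <- which.min(aics)
  list(which.Temp = temps[best], mod = fits[[best]], AIC = aics)
}

res <- best_temperature(d, "Reco", c("t.air", "t.soil2"))
res$which.Temp
res$AIC
```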

gpp.bulk expects a formula of the form NEE ~ PAR + timestamp + ... with NEE referring to CO_2 fluxes estimated based on transparent chamber measurements (for instance with flux), PAR referring to readings of the photosynthetically active radiation relevant for NEE and taken during the corresponding chamber measurements. The ... symbolizes that several more terms can or have to be specified. This depends on the approach to the R_{eco} part of the GPP modeling (see gpp).

Approaches to estimate GPP values from measured NEE data using corresponding R_{eco} values:

Approach 1: Extract corresponding R_{eco} fluxes from the provided data that are assigned to corresponding NEE values via their timestamp: For this approach data has to contain both NEE and R_{eco} fluxes and the model formula is specified as NEE ~ PAR + timestamp + oot with the latter referring to a variable that indicates whether the respective fluxes were measured as NEE (transparent chamber) or Reco (opaque chamber or low PAR). In addition oot.id may have to be changed accordingly. gpp2 is used for fitting the models.
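The pairing step of Approach 1 can be pictured as follows. This is a sketch under two assumptions: each NEE flux receives the opaque-chamber flux closest in time (nearest-neighbour matching), and the sign convention is NEE = GPP + Reco, so that GPP = NEE - Reco. Whether gpp2 pairs fluxes exactly this way is not stated here; `nearest_reco` is a hypothetical helper.

```r
# Hypothetical toy campaign: opaque ("D") and transparent ("T") chamber fluxes
camp <- data.frame(
  flux = c(2.0, -3.0, 2.5, -4.0),
  PAR = c(0, 800, 0, 1200),
  timestamp = as.POSIXct(c("2010-06-01 09:00", "2010-06-01 09:10",
                           "2010-06-01 12:00", "2010-06-01 12:10"), tz = "UTC"),
  kind = c("D", "T", "D", "T")
)

# Hypothetical: for each NEE timestamp, take the Reco flux closest in time
nearest_reco <- function(ts.nee, ts.reco, reco) {
  idx <- vapply(seq_along(ts.nee), function(i)
    which.min(abs(as.numeric(ts.reco) - as.numeric(ts.nee[i]))), integer(1))
  reco[idx]
}

oot.id <- c("D", "T")  # first: opaque (Reco), second: transparent (NEE)
reco.part <- camp[camp$kind == oot.id[1], ]
nee.part <- camp[camp$kind == oot.id[2], ]
nee.part$Reco <- nearest_reco(nee.part$timestamp, reco.part$timestamp, reco.part$flux)
# Assumed sign convention: NEE = GPP + Reco
nee.part$GPP <- nee.part$flux - nee.part$Reco
nee.part
```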

Approach 2: Provide measured R_{eco} fluxes that are assigned to corresponding NEE values via their timestamp: To do this, supply ts.Reco (i.e. ts.Reco != NULL), set Reco.m to a vector of R_{eco} fluxes, and specify the model as NEE ~ PAR + timestamp. gpp is used for fitting the models.
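For Approach 2, a regular R_{eco} series (e.g. half-hourly values from a budget) has to be brought to the NEE timestamps. The sketch below assumes linear interpolation with `approx` and the sign convention GPP = NEE - Reco; gpp may assign values differently (for instance by interval), so treat this as an illustration only.

```r
# Hypothetical half-hourly Reco series and two NEE measurements
ts.Reco <- seq(as.POSIXct("2010-06-01 09:00", tz = "UTC"),
               by = "30 min", length.out = 4)
Reco.m <- c(2.0, 2.2, 2.6, 3.0)
nee <- data.frame(
  flux = c(-3.0, -4.5),
  timestamp = as.POSIXct(c("2010-06-01 09:15", "2010-06-01 10:00"), tz = "UTC")
)

# Linear interpolation of Reco to the NEE timestamps; rule = 2 holds the
# end values constant outside the range of ts.Reco
nee$Reco <- approx(as.numeric(ts.Reco), Reco.m,
                   xout = as.numeric(nee$timestamp), rule = 2)$y
# Assumed sign convention: NEE = GPP + Reco
nee$GPP <- nee$flux - nee$Reco
nee
```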

Approach 3: Provide one R_{eco} model to predict R_{eco} fluxes at the time of the NEE measurements using the same temperature variable that was used to construct the model (with reco). Specify model with: NEE ~ PAR + timestamp + temperature. gpp is used for fitting the models.

Approach 4: Provide several R_{eco} models to predict R_{eco} fluxes at the time of the NEE measurements using the same temperature variables that were used to construct the models (with reco.bulk). The corresponding models are assigned to the NEE data via the timestamps that they carry. Specify model with: NEE ~ PAR + timestamp + temperature1 + temperature2 + temperature3 + .... All temperatures that may have been used for fitting the R_{eco} models (see above) should be given. gpp is used for fitting the models.
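Approaches 3 and 4 predict R_{eco} at each NEE measurement from a temperature model. The sketch below assumes that each NEE value is assigned the model whose timestamp is nearest and that this model is evaluated with its own temperature variable. The model list is a simplified, hypothetical stand-in (log-linear `lm` fits with `ts` and `which.Temp` entries), not a real "breco" object.

```r
# Hypothetical stand-in model: exact log-linear Reco response to 'tv'
make_mod <- function(tv, b) {
  d <- data.frame(x = seq(5, 20, length.out = 10))
  d$Reco <- exp(0.1 + b * d$x)
  names(d)[1] <- tv
  lm(as.formula(paste0("log(Reco) ~ ", tv)), data = d)
}

models <- list(
  list(ts = as.POSIXct("2010-05-01 12:00", tz = "UTC"),
       fit = make_mod("t.air", 0.05), which.Temp = "t.air"),
  list(ts = as.POSIXct("2010-07-01 12:00", tz = "UTC"),
       fit = make_mod("t.soil2", 0.10), which.Temp = "t.soil2")
)

nee <- data.frame(
  flux = c(-3, -5), t.air = c(15, 22), t.soil2 = c(10, 14),
  timestamp = as.POSIXct(c("2010-05-03 10:00", "2010-06-25 10:00"), tz = "UTC")
)

# Pick the model nearest in time and predict Reco on the original scale
model_ts <- vapply(models, function(m) as.numeric(m$ts), numeric(1))
nee$Reco <- vapply(seq_len(nrow(nee)), function(i) {
  m <- models[[which.min(abs(model_ts - as.numeric(nee$timestamp[i])))]]
  unname(exp(predict(m$fit, newdata = nee[i, , drop = FALSE])))
}, numeric(1))
# Assumed sign convention: NEE = GPP + Reco
nee$GPP <- nee$flux - nee$Reco
nee
```

This also shows why all temperatures used in reco.bulk should be present in the NEE data: the selected model may rely on any of them.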

remove.outliers may result in better R_{eco} models. One should be careful with this and watch out for cases in which too many data points are eliminated. The function returns the number of skipped outliers per model to do just that.
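The outlier procedure described under `remove.outliers` (boxplot statistics on the residuals, refit, two passes, fall back on failure) can be sketched as follows. This is a simplified, hypothetical version: an `lm` fit replaces the `reco` model and the data are assumed to contain no missing values.

```r
# Simplified, hypothetical version of the outlier step: an lm() stand-in
# replaces the reco() model; rows of 'data' are assumed to contain no NAs
remove_outliers_refit <- function(formula, data, passes = 2) {
  fit0 <- lm(formula, data = data)
  fit <- fit0
  n.out <- 0
  for (p in seq_len(passes)) {
    res <- residuals(fit)
    st <- boxplot.stats(res)$stats
    # values below the lower or above the upper whisker end are outliers
    out <- which(res < st[1] | res > st[5])
    if (length(out) == 0) break
    data <- data[-out, , drop = FALSE]
    refit <- tryCatch(lm(formula, data = data), error = function(e) NULL)
    # fall back to the fit on the original data if refitting fails
    if (is.null(refit)) return(list(mod = fit0, n.out = 0))
    fit <- refit
    n.out <- n.out + length(out)
  }
  list(mod = fit, n.out = n.out)
}

set.seed(3)
d <- data.frame(t.air = 1:20)
d$Reco <- 1 + 0.2 * d$t.air + rnorm(20, sd = 0.1)
d$Reco[10] <- d$Reco[10] + 5   # one gross outlier
res <- remove_outliers_refit(Reco ~ t.air, d)
res$n.out   # inspect how many points were dropped
coef(res$mod)
```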

If fall.back = TRUE, no failed model fits are reported. This is particularly useful when further bulk methods such as budget.reco or budget.gpp are used to obtain annual or seasonal budgets.
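The virtual data set used by `fall.back` can be sketched in base R. The helper `fallback_linear` is hypothetical. As an assumption, the random draws are rescaled so that they match the observed mean and sd exactly; drawing with `rnorm(50, mean(gpp), sd(gpp))` would match them only in expectation, and the package may do either.

```r
# Hypothetical sketch of the fall-back model: 50 virtual GPP values with the
# observed mean and sd against 50 PAR values from 0 to 2000
fallback_linear <- function(gpp, n = 50, par.max = 2000) {
  z <- rnorm(n)
  virt <- data.frame(
    # rescale the draws so that mean and sd equal those of 'gpp' exactly
    GPP = mean(gpp) + sd(gpp) * (z - mean(z)) / sd(z),
    PAR = seq(0, par.max, length.out = n)
  )
  lm(GPP ~ PAR, data = virt)
}

set.seed(4)
gpp.obs <- c(-2.1, -3.4, -4.0, -2.8, -5.2)
fb <- fallback_linear(gpp.obs)
coef(fb)  # intercept close to mean(gpp.obs), slope close to 0
```

Because the virtual GPP values are unrelated to PAR, the fitted line is essentially a mean model, which is what a fall-back should deliver when the light response cannot be estimated.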

Value

Both functions return complex list structures with models.

Output of reco.bulk: Object of class "breco", a list with length(unique(INDEX)) elements, each containing 3 elements:

`ts`	Timestamp of the model.
`mod`	Has itself two elements. The first contains the model object as returned by `reco` and is named according to the method used. The second, `n.out`, is optional (only reported when `remove.outliers = TRUE` and there were indeed outliers identified and skipped) and gives the number of omitted data points.
`which.Temp`	Character string that identifies the temperature variable that was finally used for constructing the best model.
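Since a "breco" object is a plain list with the elements described above, its entries can be tabulated with `lapply`/`vapply`. The object below is a hypothetical mock that only follows the documented element names (`ts`, `mod` with optional `n.out`, `which.Temp`); its models are `lm` stand-ins, not `reco` output.

```r
# Hypothetical mock of a "breco"-like list; lm() stand-ins replace reco() models
mk <- function(ts, temp, n.out = NULL) {
  d <- data.frame(Reco = c(1, 2, 3, 4), Temp = c(5, 10, 15, 20))
  mod <- list(lm.fit = lm(Reco ~ Temp, data = d))
  if (!is.null(n.out)) mod$n.out <- n.out  # only present if outliers were removed
  list(ts = as.POSIXct(ts, tz = "UTC"), mod = mod, which.Temp = temp)
}
r.mock <- list(mk("2010-05-01 12:00", "t.air"),
               mk("2010-06-01 12:00", "t.soil2", n.out = 2))
class(r.mock) <- "breco"

# One row per model: timestamp, chosen temperature, number of removed outliers
overview <- data.frame(
  ts = do.call(c, lapply(r.mock, `[[`, "ts")),
  which.Temp = vapply(r.mock, `[[`, character(1), "which.Temp"),
  n.out = vapply(r.mock, function(x)
    if (is.null(x$mod$n.out)) 0 else x$mod$n.out, numeric(1)),
  stringsAsFactors = FALSE
)
overview
```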

Output of gpp.bulk: Object of class "bgpp", a list with length(unique(INDEX)) elements each containing itself 2 entries:

`ts`	Timestamp of the model.
`mod`	Either an object of class "`gpp`" or of class "`gpp2`" depending on the approach used. Approaches 1 and 2 return "`gpp2`" objects, Approaches 3 and 4 return "`gpp`" objects. See `gpp` and `gpp2` for details.

Author(s)

Gerald Jurasinski, gerald.jurasinski@uni-rostock.de,

with suggestions by Sascha Beetz, sascha.beetz@uni-rostock.de

References

Beetz S, Liebersbach H, Glatzel S, Jurasinski G, Buczko U, Höper H (2013) Effects of land use intensity on the full greenhouse gas balance in an Atlantic peat bog. Biogeosciences 10:1067-1082

Examples

## Whole example is consecutive and largely marked as
## not run because parts take longer than
## accepted by CRAN incoming checks.
## Remove first hash in each line to run them.
data(amd)
data(amc)

### Reco ###
## do reco models with 3 campaign wide window and 
## outlier removal (outliers according to models)
# first extract opaque (dark) chamber measurements 
amr <- amd[amd$kind=="D",]

## Not run ##
## do bulk fitting of reco models (all specified temperatures 
## are tested and the best model (per campaign) is finally stored)
#r.models <- reco.bulk(flux ~ t.air + t.soil2 + t.soil5 + 
#t.soil10 + timestamp, amr, amr$campaign, window=3, 
#remove.outliers=TRUE, method="arr", min.dp=2)
#
## adjust models (BEWARE: implausible models with t1 >= 20 are skipped 
## within the function, this can be changed)
#r.models <- modjust(r.models, alpha=0.1, min.dp=3)
#
## make data.frame (table) for overview of model parameters
## the temperature with which the best model could be fit is reported
## this information also resides in the model objects in r.models
#tbl8(r.models)
#
#### GPP ###
### fit GPP models using method = Falge and min.dp = 5
### and take opaque (dark, i.e. reco) measurements from data
## the function issues a warning because some campaigns have
## not enough data points
#g.models <- gpp.bulk(flux ~ PAR + timestamp + kind, amd, amd$campaign, 
#method="Falge", min.dp=5)
#tbl8(g.models)
#
### alternative approaches to acknowledge reco when fitting GPP models
## we need only fluxes based on transparent chamber measurements (NEE)
#amg <- amd[amd$kind=="T",]
## fit gpp models and predict reco from models
#g.models.a1 <- gpp.bulk(flux ~ PAR + timestamp + t.air + t.soil2 + 
#t.soil5 + t.soil10, amg, amg$campaign, method="Falge", min.dp=5, 
#Reco.m=r.models)
#tbl8(g.models.a1)
## have a look at the model fits (first 10)
#par(mfrow=c(5,6))
## select only non linear fits
#sel <- sapply(g.models.a1, function(x) inherits(x$mod$mg, "nls"))
#lapply(g.models.a1[sel][1:10], function(x) plot(x$mod, single.pane=FALSE))
#
## fit gpp models with providing reco data
## to do so, rerun budget.reco with other start and end points
#set.back <- data.frame(timestamp = c("2009-09-01 00:30", "2011-12-31 23:30"), 
#value = c(-999, -9999))
#set.back$timestamp <- strptime(set.back$timestamp, format="%Y-%m-%d %H:%M")
#r.bdgt.a2 <- budget.reco(r.models, amc, set.back)
## now fit the models
#g.models.a2 <- gpp.bulk(flux ~ PAR + timestamp, amg, amg$campaign, 
#method="Falge", units = "30mins", min.dp=5, Reco.m=r.bdgt.a2$reco.flux, 
#ts.Reco = r.bdgt.a2$timestamp)
#tbl8(g.models.a2)
#
## End not run ##

[Package flux version 0.3-0.1 Index]