R: Calculate the area under a line (curve).

auc {flux}

R Documentation

Calculate the area under a line (curve).

Description

Calculates the area under a curve (integral) using the trapezoid rule. With `auc.mc`, several Monte Carlo resampling methods can be applied to estimate the interpolation error of the integration.
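
The trapezoid rule itself can be sketched in a few lines of base R. The helper name `trapz_sketch` is hypothetical and not part of flux; it only illustrates the arithmetic and ignores `thresh` and `dens`.

```r
# Hypothetical helper (not part of flux): plain trapezoid rule.
trapz_sketch <- function(x, y) {
  ord <- order(x)              # integrate along increasing x
  x <- as.numeric(x)[ord]
  y <- as.numeric(y)[ord]
  # Each segment contributes its width times the mean of its two end heights.
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Triangle with base 2 and height 1, so the area is 1.
tri.area <- trapz_sketch(c(0, 1, 2), c(0, 1, 0))
tri.area
```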

Usage

auc(x, y, thresh = NULL, dens = 100, sort.x = TRUE)

auc.mc(x, y, method = "leave out", lo = 2, it = 100, ...)

Arguments

`x`	Numerical vector giving the x coordinates of the points of the line (curve).
`y`	Numerical vector giving the y coordinates of the points of the line (curve). To integrate a fitted line, pass a vector to `x` that spans `xlim` in small intervals and predict the y coordinates with `predict`, using that `x` vector as `newdata`. See example.
`thresh`	Threshold below which area is not calculated. Can be used to deal with unrealistically low flux data. By default `thresh` is `NULL`, so the complete area below the zero line is subtracted from the area above the zero line when integrating. If values below a certain level make no sense for your question, set `thresh`: all y-values below `thresh` are then set to the value of `thresh`, and the corresponding areas below `thresh` are not subtracted from the total area.
`dens`	By default the data density is artificially increased by adding 100 data points between each pair of adjacent data points. These additional points are calculated by linear interpolation along x and y. When a threshold is set, this increases the accuracy of the result. Setting `dens` has no effect on the result when `thresh` is `NULL`.
`sort.x`	By default the vectors in `x` and `y` are ordered along increasing `x` because integration makes no sense with unordered data. You can override this by setting `sort.x` = `FALSE`.
`method`	Specify how interpolation error should be estimated. Available methods include `"leave out"`, `"bootstrap"`, `"sorted bootstrap"`, `"constrained bootstrap"`, `"jackknife"`, `"jack-validate"`. True bootstrap is only effective when `sort.x` = `FALSE`. See details.
`lo`	When estimating interpolation error with `"leave out"` or `"jack-validate"`, how many data points should be left out randomly? Defaults to 2. See `method` and details.
`it`	Number of iterations to run when using `auc.mc` to estimate the interpolation error. Defaults to 100.
`...`	Any arguments passed through to `auc`.
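
The approach described for `y` (integrating a fitted line) could look roughly like the following sketch. The cubic polynomial, the object names `hrs`, `ch4`, `x.new` and `y.new`, and the grid size of 500 are assumptions made only for illustration; the data are the values from the Examples section. The final step uses a base R trapezoid sum so that the sketch runs without flux installed.

```r
# Illustrative data: the 2-hourly methane fluxes from the Examples.
hrs <- seq(0, 24, by = 2)
ch4 <- c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8,
         22.3, 24.7, 15.6, 17.4)

# Any model with a predict() method would do; the cubic polynomial is an
# arbitrary choice for this sketch.
fit <- lm(ch4 ~ poly(hrs, 3))

# Dense x vector spanning the x range, passed to predict() as newdata.
x.new <- seq(min(hrs), max(hrs), length.out = 500)
y.new <- predict(fit, newdata = data.frame(hrs = x.new))

# With flux loaded, the fitted curve could be integrated with
# auc(x.new, y.new). The base R trapezoid sum below is a stand-in.
area.fit <- sum(diff(x.new) * (head(y.new, -1) + tail(y.new, -1)) / 2)
area.fit
```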

Details

During integration the underlying assumption is that values can be interpolated linearly between adjacent data points. In many cases this is questionable. For estimating the linear interpolation error from the data at hand one may use Monte Carlo resampling methods. In auc.mc the following approaches are available:

leave out: In each run `lo` data points are randomly omitted. This is straightforward, but the number of data points left out (`lo`) is arbitrary, so error terms estimated with this approach may be hard to defend.
bootstrap: Data are bootstrapped (sampling with replacement). Thus, some data points may repeat whereas others may be omitted. The random sampling changes the order of the data points, which may be unwanted with time series and may produce greatly exaggerated error terms. This is only effective if `sort.x = FALSE`.
sorted bootstrap: Same as bootstrap, but the data are ordered along x after bootstrapping, which may cure some problems caused by the changed order. However, because data points are repeated, time series that span several seasons and show distinct seasonality may still be misrepresented.
constrained bootstrap: Same as sorted bootstrap, but repeated data points are omitted after ordering. This amounts to leaving out a random number of measurements in each run. Numbers of leave outs typically show normal distribution around 3/4n.
jackknife: `auc` is calculated for all possible combinations of `length(x)-1` data points. Depending on `length(x)`, the number of combinations can be quite low.
jack-validate: `auc` is calculated for all possible combinations of `(length(x)-lo) : (length(x)-1)` data points. This partly cures the arbitrariness of the leave out approach and produces stable summary statistics.
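
Two of the approaches listed above, leave out and jackknife, can be sketched in base R as follows. This is an illustration under assumptions, not the code of `auc.mc`: the helper `trapz_sketch` is hypothetical, the data are taken from the Examples section, and details such as how the real function treats the end points are not covered.

```r
# Hypothetical helper (not part of flux): plain trapezoid rule.
trapz_sketch <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

x <- seq(0, 24, by = 2)
y <- c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8,
       22.3, 24.7, 15.6, 17.4)
n <- length(x)

# "leave out": drop lo randomly chosen points in each of it runs.
set.seed(1)   # only to make this sketch repeatable
lo <- 2
it <- 100
leave.out <- replicate(it, {
  keep <- sort(sample(n, n - lo))
  trapz_sketch(x[keep], y[keep])
})

# "jackknife": every combination of n - 1 points, which gives n values.
jack <- apply(combn(n, n - 1), 2,
              function(keep) trapz_sketch(x[keep], y[keep]))

summary(leave.out)
summary(jack)
```

The jackknife result is deterministic because all combinations are enumerated, whereas the leave out result changes from run to run unless the random seed is fixed.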

Value

`auc` returns a numeric value that expresses the area under the curve. The unit depends on the input.

`auc.mc` returns a numeric vector containing the auc values of the `it` permutations. Calculate whatever summary statistics you need from it. Because of the random sampling, means and medians are not stable for most of the methods. `"jackknife"` and `"jack-validate"` produce repeatable results; for `"leave out"`, stability depends on n (`length(x)`) and `it`.

Author(s)

Gerald Jurasinski, gerald.jurasinski@uni-rostock.de

Examples

## Construct a data set (imagine 2-hourly GHG emission data
## (methane) measured during a day).
## The emission vector (data in mg CH4 / (m2 * h)) as a time series.
ghg <- ts(c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8, 
22.3, 24.7, 15.6, 17.4), start=0, end=24, frequency=0.5)
## Have a look at the emission development.
plot(ghg)
## Calculate what has been emitted that day
## Assuming that emissions develop linearly between
## measurements
auc(time(ghg), ghg)

## Test some of the auc.mc approaches
## "leave out" as default
auc.rep <- auc.mc(time(ghg), ghg)
## mean and median are well below the original value
summary(auc.rep)
## results for "bootstrap" are unstable (run several times)
auc.rep <- auc.mc(time(ghg), ghg, "boot")
summary(auc.rep)
## results for "jack-validate" are stable (run several times)
auc.rep <- auc.mc(time(ghg), ghg, "jack-val", lo=3)
summary(auc.rep)

## The effect of thresh:
## Shift data, so that we have negative emissions (immissions)
ghg <- ghg-10
## See the difference
plot(ghg)
abline(h=0)
## With thresh = NULL the negative emissions are subtracted
## from the positive emissions
auc(time(ghg), ghg)
## With thresh = -0.5 all values below -0.5 are set to -0.5,
## so the area below -0.5 is not subtracted.
auc(time(ghg), ghg, thresh = -0.5)
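
Why `dens` matters once `thresh` is set can be checked by hand with a small base R sketch. The helper `auc_thresh_sketch` is hypothetical and is not the code of `auc`; in particular, the exact interpolation grid it builds is an assumption. It adds linearly interpolated points to each interval, raises values below `thresh` to `thresh`, and then applies the trapezoid rule.

```r
# Hypothetical helper (not part of flux): densify, clamp, integrate.
# Assumes x is already sorted in increasing order.
auc_thresh_sketch <- function(x, y, thresh = NULL, dens = 100) {
  # Split every interval into dens equal steps (grid layout is an
  # assumption of this sketch).
  xx <- c(x[1], unlist(lapply(seq_len(length(x) - 1), function(i) {
    seq(x[i], x[i + 1], length.out = dens + 1)[-1]
  })))
  yy <- approx(x, y, xout = xx)$y               # linear interpolation
  if (!is.null(thresh)) yy <- pmax(yy, thresh)  # raise values below thresh
  sum(diff(xx) * (head(yy, -1) + tail(yy, -1)) / 2)
}

x <- c(0, 1, 2)
y <- c(1, -1, 1)
a.null   <- auc_thresh_sketch(x, y)                           # areas cancel
a.coarse <- auc_thresh_sketch(x, y, thresh = 0, dens = 1)     # clamps only at x = 1
a.fine   <- auc_thresh_sketch(x, y, thresh = 0, dens = 100)   # close to 0.5
c(a.null, a.coarse, a.fine)
```

In this toy case the line crosses zero halfway through each interval, so the area above the threshold is 0.5. Clamping only at the original points overestimates it as 1, while the denser grid recovers a value close to 0.5.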

[Package flux version 0.3-0.1 Index]