auc {flux} | R Documentation |
Calculate the area under a line (curve).
Description
Calculates the area under a curve (integral) using the trapezoid rule. auc.mc applies one of several Monte Carlo resampling methods to estimate the interpolation error of the integration.
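As a rough illustration of the trapezoid rule mentioned above, the following base R sketch sums the areas of the trapezoids spanned by adjacent points. The helper name trapezoid_area is hypothetical and not part of flux; it ignores the thresh and dens options and only shows the basic idea.

```r
# Hypothetical helper, not part of flux: plain trapezoid rule.
trapezoid_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  ord <- order(x)              # integrate from left to right
  x <- x[ord]
  y <- y[ord]
  # each segment contributes its width times the mean of its two end heights
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

x <- c(0, 1, 2, 3)
y <- x^2
trapezoid_area(x, y)   # 9.5; the exact integral of x^2 on [0, 3] is 9
```

The gap between 9.5 and 9 is the linear interpolation error for this small example.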
Usage
auc(x, y, thresh = NULL, dens = 100, sort.x = TRUE)
auc.mc(x, y, method = "leave out", lo = 2, it = 100, ...)
Arguments
x |
Numerical vector giving the x coordinates of the points of the line (curve). |
y |
Numerical vector giving the y coordinates of the points of the line (curve). One can calculate the integral of a fitted line through giving a vector to |
thresh |
Threshold below which area is not calculated. Can be used to deal with unrealistically low flux data. By default |
dens |
By default the data density is artificially increased by adding 100 data points between given adjacent data points. These additional data points are calculated by linear interpolation along x and y. When a threshold is set, this procedure increases the accuracy of the result. Setting |
sort.x |
By default the vectors in |
method |
Specify how interpolation error should be estimated. Available methods include |
lo |
When estimating interpolation error with |
it |
How many iterations should be run when using |
... |
Any arguments passed through to |
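The interplay of dens and thresh described above can be sketched in base R as follows. This is an assumption-laden illustration, not the flux implementation: the helper name dense_area is hypothetical, dens is taken to mean the number of points added inside each interval, and values below thresh are assumed to be raised to thresh before integrating.

```r
# Hypothetical helper, not flux code: densify by linear interpolation,
# raise values below a threshold to that threshold, then integrate.
dense_area <- function(x, y, thresh = NULL, dens = 100) {
  stopifnot(length(x) == length(y), length(x) >= 2, !anyDuplicated(x))
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  # positions of `dens` extra points strictly inside every interval
  frac <- seq_len(dens) / (dens + 1)
  left <- x[-length(x)]
  interior <- as.vector(outer(frac, diff(x)) + rep(left, each = dens))
  xx <- sort(c(x, interior))
  yy <- approx(x, y, xout = xx)$y        # linear interpolation
  if (!is.null(thresh)) yy <- pmax(yy, thresh)
  sum(diff(xx) * (yy[-1] + yy[-length(yy)]) / 2)
}

x <- c(0, 1, 2)
y <- c(-1, 1, -1)
dense_area(x, y, thresh = 0, dens = 0)     # 1: threshold applied to the raw points only
dense_area(x, y, thresh = 0, dens = 100)   # close to 0.5, the area above zero
```

Without extra points the threshold is only applied at the three measured points, so the segments that cross zero are integrated too generously. With a dense grid the crossing is located much more precisely.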
Details
During integration the underlying assumption is that values can be interpolated linearly between adjacent data points. In many cases this is questionable. To estimate the linear interpolation error from the data at hand, one may use Monte Carlo resampling methods. In auc.mc the following approaches are available:
- leave out: In each run, lo data points are randomly omitted. This is straightforward, but the number of data points left out (lo) is arbitrary, so the error terms estimated with this approach may be hard to defend.
- bootstrap: Data are bootstrapped (sampled with replacement). Thus, some data points may repeat whereas others may be omitted. The random sampling changes the order of the data points, which may be unwanted with time series and may produce greatly exaggerated error terms. This is only effective if sort.x = FALSE.
- sorted bootstrap: Same as bootstrap, but ordering along x after bootstrapping may cure some problems of changed order. However, because of repeated data points, time series that span seasons and show distinct seasonality may still be misrepresented.
- constrained bootstrap: Same as sorted bootstrap, but after ordering, repeated data points are omitted. This equals leaving some measurements out at each run, with a random number of leave-outs. The number of leave-outs is typically normally distributed around 3/4n.
- jackknife: auc is calculated for all possible combinations of length(x)-1 data points. Depending on length(x), the number of combinations can be quite low.
- jack-validate: auc is calculated for all possible combinations of (length(x)-lo):(length(x)-1) data points. This partly cures the arbitrariness problem of the leave out approach and produces stable summary statistics.
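Two of the approaches listed above, leave out and jackknife, can be sketched in base R as follows. All function names here are hypothetical and the code is not taken from flux; in particular, it assumes that end points may be dropped like any other point, which shortens the integration range in those runs. The data are the emission values from the Examples section.

```r
# Hypothetical helpers, not flux code.
trapezoid_area <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

# "leave out": drop `lo` randomly chosen points in each of `it` runs
leave_out_areas <- function(x, y, lo = 2, it = 100) {
  n <- length(x)
  stopifnot(n - lo >= 2)
  replicate(it, {
    keep <- sort(sample.int(n, n - lo))
    trapezoid_area(x[keep], y[keep])
  })
}

# "jackknife": drop each point exactly once, giving length(x) areas
jackknife_areas <- function(x, y) {
  vapply(seq_along(x),
         function(i) trapezoid_area(x[-i], y[-i]),
         numeric(1))
}

x <- seq(0, 24, by = 2)
y <- c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8,
       22.3, 24.7, 15.6, 17.4)

summary(leave_out_areas(x, y))   # changes from call to call
summary(jackknife_areas(x, y))   # identical on every call
```

The jackknife has no random component, which is why its summary statistics are repeatable, whereas the leave out results depend on the random draws.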
Value
auc returns a numeric value that expresses the area under the curve. The unit depends on the input.
auc.mc returns a numeric vector containing the auc values of the it permutations. Calculate whatever summary statistics you need from it. Because of the sampling approaches, means and medians are not stable for most of the methods. jackknife and jack-validate produce repeatable results; in the case of leave out it depends on n (length(x)) and it.
Author(s)
Gerald Jurasinski, gerald.jurasinski@uni-rostock.de
See Also
Examples
## Construct a data set (Imagine 2-hourly ghg emission data
## (methane) measured during a day).
## The emission vector (data in mg CH4 / m2*h) as a time series.
ghg <- ts(c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8,
22.3, 24.7, 15.6, 17.4), start=0, end=24, frequency=0.5)
## Have a look at the emission development.
plot(ghg)
## Calculate what has been emitted that day
## Assuming that emissions develop linearly between
## measurements
auc(time(ghg), ghg)
## Test some of the auc.mc approaches
## "leave out" as default
auc.rep <- auc.mc(time(ghg), ghg)
## mean and median are well below the original value
summary(auc.rep)
## results for "bootstrap" are unstable (run several times)
auc.rep <- auc.mc(time(ghg), ghg, "boot")
summary(auc.rep)
## results for "jack-validate" are stable (run several times)
auc.rep <- auc.mc(time(ghg), ghg, "jack-val", lo=3)
summary(auc.rep)
## The effect of thresh:
## Shift data, so that we have negative emissions (immissions)
ghg <- ghg-10
## See the difference
plot(ghg)
abline(h=0)
## With thresh = NULL the negative emissions are subtracted
## from the positive emissions
auc(time(ghg), ghg)
## With thresh = -0.5 all values below -0.5 are set to -0.5,
## so only emissions >= -0.5 count.
auc(time(ghg), ghg, thresh = -0.5)