filter.data {bigleaf} | R Documentation |
Basic Eddy Covariance Data Filtering
Description
Filters time series of EC data for high-quality values and specified meteorological conditions.
Usage
filter.data(
data,
quality.control = TRUE,
filter.growseas = FALSE,
filter.precip = FALSE,
filter.vars = NULL,
filter.vals.min,
filter.vals.max,
NA.as.invalid = TRUE,
vars.qc = NULL,
quality.ext = "_qc",
good.quality = c(0, 1),
missing.qc.as.bad = TRUE,
GPP = "GPP",
doy = "doy",
year = "year",
tGPP = 0.5,
ws = 15,
min.int = 5,
precip = "precip",
tprecip = 0.01,
precip.hours = 24,
records.per.hour = 2,
filtered.data.to.NA = TRUE,
constants = bigleaf.constants()
)
Arguments
data |
Data.frame or matrix containing all required input variables in half-hourly or hourly resolution. Including year, month, day information |
quality.control |
Should quality control be applied? Defaults to |
filter.growseas |
Should data be filtered for growing season? Defaults to |
filter.precip |
Should precipitation filtering be applied? Defaults to |
filter.vars |
Additional variables to be filtered. Vector of type character. |
filter.vals.min |
Minimum values of the variables to be filtered. Numeric vector of
the same length than |
filter.vals.max |
Maximum values of the variables to be filtered. Numeric vector of
the same length than |
NA.as.invalid |
If |
vars.qc |
Character vector indicating the variables for which quality filter should
be applied. Ignored if |
quality.ext |
The extension to the variables' names that marks them as
quality control variables. Ignored if |
good.quality |
Which values indicate good quality (i.e. not to be filtered)
in the quality control (qc) variables? Ignored if |
missing.qc.as.bad |
If quality control variable is |
GPP |
Gross primary productivity (umol m-2 s-1); Ignored if |
doy |
Day of year; Ignored if |
year |
Year; Ignored if |
tGPP |
GPP threshold (fraction of 95th percentile of the GPP time series).
Must be between 0 and 1. Ignored if |
ws |
Window size used for GPP time series smoothing.
Ignored if |
min.int |
Minimum time interval in days for a given state of growing season.
Ignored if |
precip |
Precipitation (mm time-1) |
tprecip |
Precipitation threshold used to identify a precipitation event (mm).
Ignored if |
precip.hours |
Number of hours removed following a precipitation event (h).
Ignored if |
records.per.hour |
Number of observations per hour. I.e. 2 for half-hourly data. |
filtered.data.to.NA |
Logical. If |
constants |
frac2percent - conversion between fraction and percent |
Details
This routine consists of two parts:
1) Quality control: All variables included in vars.qc
are filtered for
good quality data. For these variables a corresponding quality variable with
the same name as the variable plus the extension as specified in quality.ext
must be provided. For time steps where the value of the quality indicator is not included
in the argument good.quality
, i.e. the quality is not considered as 'good',
its value is set to NA
.
2) Meteorological filtering. Under certain conditions (e.g. low ustar), the assumptions
of the EC method are not fulfilled. Further, some data analysis require certain meteorological
conditions, such as periods without rainfall, or active vegetation (growing season, daytime).
The filter applied in this second step serves to exclude time periods that do not fulfill the criteria
specified in the arguments. More specifically, time periods where one of the variables is higher
or lower than the specified thresholds (filter.vals.min
and filter.vals.max
)
are set to NA
for all variables. If a threshold is set to NA
, it will be ignored.
Value
If filtered.data.to.NA = TRUE
(default), the input data.frame/matrix with
observations which did not fulfill the filter criteria set to NA
.
If filtered.data.to.NA = FALSE
, the input data.frame/matrix with an additional
column "valid", which indicates whether all the data of a time step fulfill the
filtering criteria (1) or not (0).
Note
The thresholds set with filter.vals.min
and filter.vals.max
filter all data
that are smaller than ("<"), or greater than (">") the specified thresholds. That means
if a variable has exactly the same value as the threshold, it will not be filtered. Likewise,
tprecip
filters all data that are greater than tprecip
.
Variables considered of bad quality (as specified by the corresponding quality control variables)
will be set to NA
by this routine. Data that do not fulfill the filtering criteria are set to
NA
if filtered.data.to.NA = TRUE
. Note that with this option *all* variables of the same
time step are set to NA
. Alternatively, if filtered.data.to.NA = FALSE
data are not set to NA
,
and a new column "valid" is added to the data.frame/matrix, indicating if any value of a row
did (1) or did not fulfill the filter criteria (0).
Examples
# Example of data filtering; data are for a month within the growing season,
# hence growing season is not filtered.
# If filtered.data.to.NA=TRUE, all values of a row are set to NA if one filter
# variable is beyond its bounds.
DE_Tha_Jun_2014_2 <- filter.data(DE_Tha_Jun_2014,quality.control=FALSE,
vars.qc=c("Tair","precip","H","LE"),
filter.growseas=FALSE,filter.precip=TRUE,
filter.vars=c("Tair","PPFD","ustar"),
filter.vals.min=c(5,200,0.2),
filter.vals.max=c(NA,NA,NA),NA.as.invalid=TRUE,
quality.ext="_qc",good.quality=c(0,1),
missing.qc.as.bad=TRUE,GPP="GPP",doy="doy",
year="year",tGPP=0.5,ws=15,min.int=5,precip="precip",
tprecip=0.1,precip.hours=24,records.per.hour=2,
filtered.data.to.NA=TRUE)
## same, but with filtered.data.to.NA=FALSE
DE_Tha_Jun_2014_3 <- filter.data(DE_Tha_Jun_2014,quality.control=FALSE,
vars.qc=c("Tair","precip","H","LE"),
filter.growseas=FALSE,filter.precip=TRUE,
filter.vars=c("Tair","PPFD","ustar"),
filter.vals.min=c(5,200,0.2),
filter.vals.max=c(NA,NA,NA),NA.as.invalid=TRUE,
quality.ext="_qc",good.quality=c(0,1),
missing.qc.as.bad=TRUE,GPP="GPP",doy="doy",
year="year",tGPP=0.5,ws=15,min.int=5,precip="precip",
tprecip=0.1,precip.hours=24,records.per.hour=2,
filtered.data.to.NA=FALSE)
# note the additional column 'valid' in DE_Tha_Jun_2014_3.
# To remove time steps marked as filtered out (i.e. 0 values in column 'valid'):
DE_Tha_Jun_2014_3[DE_Tha_Jun_2014_3["valid"] == 0,] <- NA