R: Interpolation of Missing Data in a Pollen Database by...

interpollen {AeRobiology}

R Documentation

Interpolation of Missing Data in a Pollen Database by Different Methods

Description

Function to simultaneously replace all missing data in a historical database of several pollen types, using different interpolation methods.

Usage

interpollen(data, method = "lineal", maxdays = 30, plot = TRUE,
  factor = 2, ndays = 3, spar = 0.5, data2 = NULL, data3 = NULL,
  data4 = NULL, data5 = NULL, mincorr = 0.6, result = "wide")

Arguments

`data`	A `data.frame` object containing the main database on which interpolation is to be performed. This `data.frame` must have a first column in `Date` format, with the remaining columns in `numeric` format. Each column must contain the data of one pollen type. It is not necessary to insert missing gaps; the function will detect them automatically.
`method`	A `character` string specifying the method used to generate the missing pollen data. The implemented methods are `"lineal"`, `"movingmean"`, `"spline"`, `"tseries"` and `"neighbour"`. More detailed information about each method can be found in Details. The `method` argument will be `"lineal"` by default.
`maxdays`	A `numeric (integer)` value specifying the maximum number of consecutive days with missing data that the algorithm will interpolate. Gaps longer than this value are not interpolated. Not valid with the `"tseries"` method. The `maxdays` argument will be `30` by default.
`plot`	A `logical` argument. If `TRUE`, graphical previews of the input database will be plotted at the end of the interpolation process. All the interpolated gaps will be marked in red. The `plot` argument will be `TRUE` by default.
`factor`	A `numeric (integer)` value greater than `1`. Only valid if the `"movingmean"` method is chosen. The gap size is multiplied by this factor to establish the window of the moving mean that will fill the gap. More detailed information about the selection of the factor can be found in Details. The argument `factor` will be `2` by default.
`ndays`	A `numeric (integer)` value greater than `1`. Only valid if the `"spline"` method is chosen. Specifies the number of days on each side of the gap that are used to perform the spline regression. The argument `ndays` will be `3` by default.
`spar`	A `numeric (double)` value in the range `0-1` specifying the degree of smoothness of the spline regression fit. The smoother the fit, the more data points are treated as outliers by the spline regression. Only valid if the `"spline"` method is chosen. The argument `spar` will be `0.5` by default.
`data2`, `data3`, `data4`, `data5`	A `data.frame` object (each one) containing the database of a neighbour pollen station, which will be used to interpolate missing data in the target station. Only valid if the `"neighbour"` method is chosen. Each `data.frame` must have a first column in `Date` format, with the remaining columns in `numeric` format, one column per pollen type. It is not necessary to insert the missing gaps; the function will detect them automatically. The arguments will be `NULL` by default.
`mincorr`	A `numeric (double)` value in the range `0-1`. It specifies the minimum correlation coefficient (Spearman correlation) that a neighbour station must have with the target station to be taken into account for the interpolation. Only valid if the `"neighbour"` method is chosen. The argument `mincorr` will be `0.6` by default.
`result`	A `character` string specifying the format of the resulting `data.frame`. Must be either `"wide"` or `"long"`. The `result` argument will be `"wide"` by default.
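
The input layout described for `data`, and the idea of detecting days that are simply absent from it, can be sketched in base R. This is only an illustration under assumed names and invented values (`station`, `calendar`, `full`); it is not the package's internal code.

```r
# Toy input in the documented layout: a Date column first, then one
# numeric column per pollen type. Names and values are invented.
station <- data.frame(
  date    = as.Date(c("2020-04-01", "2020-04-02", "2020-04-05")),
  Betula  = c(12, 30, 8),
  Poaceae = c(0, 2, 5)
)

# Build the complete daily calendar and left-join the observations onto
# it, so that days absent from the input appear as rows of NA.
calendar <- data.frame(
  date = seq(min(station$date), max(station$date), by = "day")
)
full <- merge(calendar, station, by = "date", all.x = TRUE)

# The days that were missing from the input (3 and 4 April here).
missing_days <- full$date[is.na(full$Betula)]
missing_days
```

A table like `station`, with the two days omitted, is the kind of input the argument description says can be passed without inserting the gaps by hand.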

Details

This function interpolates missing data in a pollen database using 5 different methods, which are described below. Interpolation for each pollen type is done automatically for gaps no longer than the "maxdays" argument (which does not apply to the "tseries" method).

"lineal" method. The interpolation will be carried out by tracing a straight line between the gap extremes.
"movingmean" method. It calculates the moving mean of the pollen daily concentrations with a window size of the gap size multiplicated by the factor argument and replace the missing data with the moving mean for these days. It is a dynamic function and for each gap of the database, the window size of the moving mean changes depending of each gap size.
"spline" method. The interpolation will be carried out by performing a spline regression with the previous and following days to the gap. The number of days of each side of the gap that will be taken into account for calculating the spline regression are specified by ndays argument. The smoothness of the adjustment of the spline regression can be specified by the spar argument.
"tseries" method. The interpolation will be carried out by analysing the time series of pollen database. It performs a seasonal_trend decomposition based on LOESS (Cleveland et al., 1990). The seasonality of the historical database is extracted and used to predict the missing data by performing a linear regression with the target year.
"neighbour" method. Other near stations provided by the user are used to interpolate the missing data of the target station. First of all, a Spearman correlation is performed between the target station and the neighbour stations to discard the neighbour stations with a correlation coefficient smaller than mincorr value. For each gap, a linear regression is performed between the neighbour stations and the target stations to determine the equation which converts the pollen concentrations of the neighbour stations into the pollen concentration of the target station. Only neighbour stations without any missing data during the gap period are taken into account for each gap.

Value

This function returns different results:

If result = "wide", returns a data.frame containing the original data completed with the interpolated values.
If result = "long", returns a data.frame containing your data in long format (the first column for date, the second for pollen type, the third for concentration, and an additional fourth column with 1 if the value has been interpolated or 0 if not).
If plot = TRUE, plots for each year and pollen type with daily values are represented in the active graphic window. Interpolated values are marked in red. If method argument is "tseries", the seasonality is also represented in grey.
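
The relationship between the wide and long layouts can be sketched in base R. This is an illustration only: the tables are invented, and the column names used here (`date`, `type`, `value`, `interpolated`) are assumptions, not necessarily the names the function returns.

```r
# Toy tables in wide format: the input with gaps and a completed copy.
original <- data.frame(
  date    = as.Date("2020-04-01") + 0:2,
  Betula  = c(12, NA, 8),
  Poaceae = c(0, 2, NA)
)
completed <- original
completed$Betula[2]  <- 10   # pretend these two values were interpolated
completed$Poaceae[3] <- 2

# Stack the pollen columns: one row per date and pollen type, plus a
# 0/1 flag marking values that were missing in the original table.
types <- names(completed)[-1]
long <- data.frame(
  date         = rep(completed$date, times = length(types)),
  type         = rep(types, each = nrow(completed)),
  value        = unlist(completed[types], use.names = FALSE),
  interpolated = as.integer(is.na(unlist(original[types], use.names = FALSE)))
)
long
```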

References

Cleveland RB, Cleveland WS, McRae JE, Terpenning I (1990) STL: a seasonal-trend decomposition procedure based on loess. J Off Stat 6(1):3-33.

Examples

data("munich_pollen")
interpollen(munich_pollen, method = "lineal", plot = FALSE)
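
Further calls can be built from the arguments listed in Usage. The following is a sketch with arbitrarily chosen argument values, shown only to illustrate how the method-specific arguments combine; it assumes the AeRobiology package is installed and does nothing otherwise.

```r
# Runs only if the AeRobiology package is available.
if (requireNamespace("AeRobiology", quietly = TRUE)) {
  data("munich_pollen", package = "AeRobiology")

  # Moving mean with a window of three times each gap's length,
  # leaving gaps longer than 10 days uninterpolated.
  mm <- AeRobiology::interpollen(munich_pollen, method = "movingmean",
                                 factor = 3, maxdays = 10, plot = FALSE)

  # Spline regression on 5 days each side of a gap, in long format.
  sp <- AeRobiology::interpollen(munich_pollen, method = "spline",
                                 ndays = 5, spar = 0.7, plot = FALSE,
                                 result = "long")
}
```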

[Package AeRobiology version 2.0.1 Index]