
Subset-Predict {gapfill}

R Documentation

Subset and Predict Functions

Description

The Subset and Predict functions are used in the default configuration of Gapfill. To predict a missing value, the two functions are called sequentially, as described on the help page of Gapfill.
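The sequential use of the two functions can be pictured as a retry loop: call Subset with `i = 0`, pass the result to Predict, and try again with `i + 1` while Predict returns `NA`. The base R sketch below shows only that control flow. `fillOne()`, its `maxTries` cap, and the toy stand-in functions are hypothetical and are not part of gapfill; the stopping rule actually used by Gapfill is described on its own help page.

```r
## Hypothetical driver illustrating the Subset/Predict retry loop.
## subsetFun(i) stands in for Subset and predictFun(a, i) for Predict.
## maxTries is an assumed safety cap, not a gapfill argument.
fillOne <- function(subsetFun, predictFun, maxTries = 10L) {
  for (i in 0:(maxTries - 1L)) {
    a <- subsetFun(i)        # subset for the current try
    p <- predictFun(a, i)    # NA signals "subset not suitable"
    if (!is.na(p[1])) {
      return(list(p = p, i = i))
    }
  }
  ## no suitable subset was found within maxTries tries
  list(p = NA_real_, i = NA_integer_)
}

## Toy stand-ins: the "subset" is just its size, and the prediction
## succeeds once the size reaches 7 (purely illustrative values).
toySubset <- function(i) 5L + i
toyPredict <- function(a, i) if (a >= 7L) 0.42 else NA_real_

res <- fillOne(toySubset, toyPredict)
res$i  # 2
res$p  # 0.42
```

In the toy run, the first two tries fail and the third one (`i = 2`) succeeds, which mirrors case 2 of the Examples, where one retry is needed.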

Usage

Subset(data, mp, i, initialSize = c(10L, 10L, 1L, 5L))

Predict(
  a,
  i,
  nTargetImage = 5,
  nImages = 4,
  nQuant = 2,
  predictionInterval = FALSE,
  qrErrorToNA = TRUE
)

Arguments

`data`	Numeric array with four dimensions. The input (satellite) data to be gap-filled. Missing values should be encoded as `NA`. The data should have the dimensions: x coordinate, y coordinate, seasonal index (e.g., day of the year), and year. See the `ndvi` dataset for an example.
`mp`	Integer vector of length 4 encoding the position of the missing value in `data` to predict.
`i`	Integer vector of length 1. The number of tried subsets that led to an `NA` return value from `Predict`.
`initialSize`	Integer vector of length 4 that provides the size of the subset for `i = 0`.
`a`	Return value of `Subset()`.
`nTargetImage`	Integer vector of length 1. Minimum number of non-NA values in the image containing the missing value. If the criterion is not met, `NA` is returned.
`nImages`	Integer vector of length 1. Minimum number of non-empty images. If the criterion is not met, `NA` is returned.
`nQuant`	Integer vector of length 1. Parameter passed to `EstimateQuantile`.
`predictionInterval`	Logical vector of length 1. If `FALSE` (default), no prediction interval is returned. If `TRUE`, the predicted value together with the lower and upper bounds of an approximated 90% prediction interval are returned. In that case, the function returns 3 values, and hence, the argument `nPredict` of `Gapfill` has to be set to 3 in order to store all returned values.
`qrErrorToNA`	Logical vector of length 1. If `TRUE` (default), an error in the quantile regression fitting leads to an `NA` return value. If `FALSE`, an error in the quantile regression fitting leads to an error and stops the prediction.

Details

The Subset function defines the search strategy for finding a relevant subset by calling the function ArrayAround. The size of the initial subset is given by the argument initialSize. Its default value is c(10L, 10L, 1L, 5L), which corresponds to a spatial extent of 10 pixels in each direction from the missing value and includes the time points that have the previous, the same, or the next seasonal index and are at most 5 years apart. As i increases, the spatial extent of the subset increases.
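The following is a minimal base R sketch of the windowing described above. It assumes that the subset keeps every position within `mp ± size` along each dimension (clipped at the array bounds) and that only the two spatial entries of the size grow by `i`. `subsetSketch()` is a hypothetical helper, not the package's implementation; in practice, use Subset or ArrayAround.

```r
## Hypothetical base R sketch of the Subset windowing (not gapfill code).
## Assumptions: the window is mp +/- size, clipped to the data bounds,
## and only the two spatial extents grow with i.
subsetSketch <- function(data, mp, i, initialSize = c(10L, 10L, 1L, 5L)) {
  size <- initialSize + c(i, i, 0L, 0L)
  d <- dim(data)
  lower <- pmax(1L, mp - size)
  upper <- pmin(d, mp + size)
  a <- data[lower[1]:upper[1], lower[2]:upper[2],
            lower[3]:upper[3], lower[4]:upper[4], drop = FALSE]
  ## position of the missing value inside the returned subset
  attr(a, "mp") <- mp - lower + 1L
  a
}

## Toy data: 20 x 20 pixels, 4 seasonal indices, 3 years
toy <- array(seq_len(20 * 20 * 4 * 3), dim = c(20, 20, 4, 3))
a0 <- subsetSketch(toy, mp = c(1, 3, 1, 2), i = 0, initialSize = c(5, 5, 1, 5))
dim(a0)  # 6 8 2 3
a1 <- subsetSketch(toy, mp = c(1, 3, 1, 2), i = 1, initialSize = c(5, 5, 1, 5))
dim(a1)  # 7 9 2 3
```

Because the missing value in this toy call lies near the corner of the grid, the window is clipped on one side, and the subset is smaller than the nominal 11 by 11 pixels.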

The Predict function decides whether the subset a is suitable and calculates the prediction (fill value) when a suitable subset is provided. To formulate the conditions that are used to decide if a subset is suitable, consider the subset a as a collection of images. More precisely, if dim(a) = c(d1, d2, d3, d4), it can be seen as a collection of d3*d4 images with an extent of d1 by d2 pixels. Using this terminology, we require the following conditions to be fulfilled in order to predict the missing value:

a contains at least nTargetImage non-NA values in the image containing the missing value,
a contains at least nImages non-empty images.
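The two conditions can be checked with a few lines of base R by treating the subset as a grid of d3*d4 images. The sketch assumes that the missing value sits at position `mp` within `a` and that all images, including the one holding the missing value, count towards nImages. `isSuitable()` is hypothetical and not the code used inside Predict.

```r
## Hypothetical check of the two suitability conditions (not gapfill code).
## a: 4-d subset; mp: position of the missing value within a.
## Assumption: every one of the d3*d4 images counts towards nImages.
isSuitable <- function(a, mp, nTargetImage = 5, nImages = 4) {
  ## non-NA values in the image that holds the missing value
  nTarget <- sum(!is.na(a[, , mp[3], mp[4]]))
  ## an image is non-empty if it has at least one non-NA value
  nonEmpty <- apply(a, c(3, 4), function(img) any(!is.na(img)))
  nTarget >= nTargetImage && sum(nonEmpty) >= nImages
}

a <- array(1, dim = c(3, 3, 2, 2))  # 4 images of 3 x 3 pixels
a[, , 1, 1] <- NA                   # one completely empty image
a[1, 1, 2, 2] <- NA                 # the missing value to predict
isSuitable(a, mp = c(1, 1, 2, 2), nImages = 3)  # TRUE: 8 values, 3 images
isSuitable(a, mp = c(1, 1, 2, 2))               # FALSE: only 3 of 4 images
```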

The prediction itself is based on sorting procedures (see Score and EstimateQuantile) and the quantile regression function rq from the quantreg package.

If the argument predictionInterval is TRUE, the Predict function returns the predicted value together with the lower and upper bounds of an approximated 90% prediction interval. The interval combines the uncertainties introduced by Score and EstimateQuantile.

Value

Subset returns an array with 4 dimensions containing the missing value at the position indicated by the attribute mp.

Predict returns a numeric vector containing the predicted value (and if predictionInterval is TRUE, the lower and upper bounds of the prediction interval), or NA, if no prediction was feasible.
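Downstream code therefore has to handle return values of length 1 or 3. The hypothetical helper below labels them, assuming the order given above (predicted value, lower bound, upper bound) and treating an all-`NA` result as "no prediction". It is an illustration only and not part of gapfill.

```r
## Hypothetical helper that labels a Predict-style return value.
## Assumed order for length 3: fill value, lower bound, upper bound.
describePrediction <- function(p) {
  if (all(is.na(p))) return("no prediction feasible")
  if (length(p) == 1L) return(sprintf("fill = %.3f", p))
  stopifnot(length(p) == 3L)
  sprintf("fill = %.3f, 90%% interval [%.3f, %.3f]", p[1], p[2], p[3])
}

describePrediction(NA)                 # "no prediction feasible"
describePrediction(0.5)                # "fill = 0.500"
describePrediction(c(0.5, 0.4, 0.62))  # "fill = 0.500, 90% interval [0.400, 0.620]"
```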

Note

The current implementation of Subset does not take into account that locations at the boundary of data can be neighboring to each other. For example, if global data (entire sphere) are considered, the location data[1,1,,] is a neighbor of data[dim(data)[1], dim(data)[2],,]. Similar considerations apply to the seasonal index when data are available for an entire year, because the last seasonal index of one year is then adjacent to the first seasonal index of the next year. To take this into account, the Subset function can be redefined accordingly or the data can be augmented.
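One way to augment the data is periodic padding. The sketch below assumes that the first dimension is periodic (for example, longitude on a global grid) and copies `k` columns from each edge to the opposite side, so that edge pixels see their wrap-around neighbors. `padPeriodicX()` is hypothetical; wrapping the seasonal index across years would need a more involved rearrangement and is not covered here.

```r
## Hypothetical periodic padding of the first (x) dimension.
## Assumption: x is periodic, e.g., longitude on a global grid.
padPeriodicX <- function(data, k) {
  nx <- dim(data)[1]
  stopifnot(k >= 1, k <= nx)
  idx <- c((nx - k + 1):nx, seq_len(nx), seq_len(k))
  data[idx, , , , drop = FALSE]
}

toy <- array(seq_len(10 * 6 * 3 * 2), dim = c(10, 6, 3, 2))
k <- 2
padded <- padPeriodicX(toy, k = k)
dim(padded)  # 14 6 3 2

## A pixel at x = j in toy sits at x = j + k in padded. After gap-filling
## the padded array, drop the padding to return to the original grid.
nx <- dim(toy)[1]
unpadded <- padded[(k + 1):(k + nx), , , , drop = FALSE]
```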

Author(s)

Florian Gerber, flora.fauna.gerber@gmail.com.

References

F. Gerber, R. de Jong, M. E. Schaepman, G. Schaepman-Strub, and R. Furrer (2018) in IEEE Transactions on Geoscience and Remote Sensing, pp. 1-13, doi: 10.1109/TGRS.2017.2785240.

Examples

library("gapfill")
## Assume we choose c(5, 5, 1, 5) as initialSize of the subset
iS <- c(5, 5, 1, 5)
## case 1: initial subset leads to prediction -------
i <- 0
a <- Subset(data = ndvi, mp = c(1, 3, 1, 2), i = i, initialSize = iS)
p <- Predict(a = a, i = i)
p
stopifnot(identical(a, ArrayAround(data = ndvi, mp = c(1, 3, 1, 2),
                                   size = c(5 + i, 5 + i, 1, 5))))
stopifnot(identical(p, Gapfill(data = ndvi, subset = 1807,
                               initialSize = iS, verbose = FALSE)$fill[1807]))

## case 2: two tries are necessary ------------------
i <- 0
a <- Subset(data = ndvi, mp = c(20, 1, 1, 2), i = i, initialSize = iS)
p <- Predict(a = a, i = i)
p

## Increase i and try again.
i <- i + 1
a <- Subset(data = ndvi, mp = c(20, 1, 1, 2), i = i, initialSize = iS)
p <- Predict(a = a, i = i)
p
stopifnot(identical(a, ArrayAround(data = ndvi, mp = c(20, 1, 1, 2),
                                   size = c(5 + i, 5 + i, 1, 6))))
stopifnot(identical(p, Gapfill(data = ndvi, subset = 1784,
                               initialSize = iS, verbose = FALSE)$fill[1784]))

[Package gapfill version 0.9.6-1 Index]