na_locf {imputeTS} | R Documentation |
Missing Value Imputation by Last Observation Carried Forward
Description
Replaces each missing value with the most recent non-missing value prior to it (Last Observation Carried Forward - LOCF). Optionally, this can also be done starting from the back of the series (Next Observation Carried Backward - NOCB).
Usage
na_locf(x, option = "locf", na_remaining = "rev", maxgap = Inf)
Arguments
x |
Numeric Vector (vector) or Time Series (ts) object in which missing values shall be replaced |
option |
Algorithm to be used. Accepts the following input:
"locf" - for Last Observation Carried Forward
"nocb" - for Next Observation Carried Backward
|
na_remaining |
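Method to be used for remaining NAs.
"keep" - to return the series with NAs
"rm" - to remove remaining NAs
"mean" - to replace remaining NAs by overall mean
"rev" - to perform nocb / locf from the reverse direction
|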
maxgap |
Maximum number of successive NAs to still perform imputation on. The default setting replaces all NAs without restrictions. With this option set, runs of consecutive NAs that are longer than 'maxgap' are left as NA. This option mostly makes sense if you want to treat long runs of NAs separately afterwards. |
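The effect of maxgap can be illustrated with a short sketch. It assumes the imputeTS package is installed; the input vector is made up for illustration, and the follow-up call to na_interpolation is only one possible way of treating the long run separately.

```r
library(imputeTS)

# Illustrative input: one single NA and one run of three NAs
x <- c(1, NA, 2, NA, NA, NA, 3)

# Runs longer than maxgap are left untouched, shorter runs are filled
filled <- na_locf(x, maxgap = 2)
filled  # per the description above: 1 1 2 NA NA NA 3

# The long run can then be handled separately, e.g. by interpolation
filled_long <- na_interpolation(filled)
```

Setting maxgap this way keeps LOCF from stretching a single observation across a long gap, where a carried-forward value is least plausible.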
Details
General Functionality
Replaces each missing value with the most recent present value prior to it (Last Observation Carried Forward - LOCF). This can also be done in reverse direction, starting from the end of the series (then called Next Observation Carried Backward - NOCB).
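The mechanics can be sketched in a few lines of base R. The helpers locf_sketch and nocb_sketch below are hypothetical names introduced for illustration only; they are not part of imputeTS and do not reproduce its actual implementation.

```r
# Hypothetical helper (not part of imputeTS): carry the last observed value forward
locf_sketch <- function(x) {
  observed <- !is.na(x)
  # For each position: index of the most recent observed value (0 if none yet)
  last_idx <- cummax(seq_along(x) * observed)
  out <- x
  fill <- last_idx > 0
  out[fill] <- x[last_idx[fill]]
  out
}

# NOCB is LOCF applied to the reversed series
nocb_sketch <- function(x) rev(locf_sketch(rev(x)))

x <- c(NA, 3, 4, 5, 6, NA, 7, 8)
locf_sketch(x)  # NA 3 4 5 6 6 7 8
nocb_sketch(x)  # 3 3 4 5 6 7 7 8
```

Note that the pure forward pass leaves the leading NA in place, because no earlier value exists to carry forward.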
Handling for NAs at the beginning of the series
If one or more successive observations directly at the start of the time series are NA, there is no 'last value' yet that can be carried forward. Thus, no LOCF imputation can be performed for these NAs. As soon as the first non-NA value appears, LOCF can be performed as expected. The same applies to NOCB, but from the opposite direction, so there the NAs at the end of the series are affected.
While this problem occurs rarely and only affects a small number of values at the beginning, it is something to consider. The na_remaining parameter defines what should happen with the values at the start that would remain NA after pure LOCF.
Default setting is na_remaining = "rev", which performs nocb / locf from the other direction to fill these NAs. So an NA at the beginning will be filled with the next non-NA value appearing in the series.
With na_remaining = "keep", NAs at the beginning (that cannot be imputed with pure LOCF) are left as remaining NAs.
With na_remaining = "rm", NAs at the beginning of the series are removed entirely. Thus, the time series is shortened.
Also available is na_remaining = "mean", which uses the overall mean of the time series to replace these remaining NAs (but beware, the mean is usually not a good imputation choice, even if it only affects the values at the beginning).
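The four settings can be compared side by side. This is a minimal sketch that assumes the imputeTS package is installed; the input vector and the result variable names are made up for illustration.

```r
library(imputeTS)

# Illustrative input: one NA at the start, one in the middle
x <- c(NA, 3, 4, 5, 6, NA, 7, 8)

res_rev  <- na_locf(x, na_remaining = "rev")   # leading NA filled from the next value: 3 3 4 5 6 6 7 8
res_keep <- na_locf(x, na_remaining = "keep")  # leading NA stays: NA 3 4 5 6 6 7 8
res_rm   <- na_locf(x, na_remaining = "rm")    # leading NA dropped, 7 values remain
res_mean <- na_locf(x, na_remaining = "mean")  # leading NA replaced by a mean value
```

Only "rm" changes the length of the result, which matters when the series has to stay aligned with other data.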
Value
Vector (vector) or Time Series (ts) object (dependent on given input at parameter x)
Author(s)
Steffen Moritz
See Also
na_interpolation, na_kalman, na_ma, na_mean, na_random, na_replace, na_seadec, na_seasplit
Examples
# Prerequisite: Create Time series with missing values
x <- ts(c(NA, 3, 4, 5, 6, NA, 7, 8))
# Example 1: Perform LOCF
na_locf(x)
# Example 2: Perform NOCB
na_locf(x, option = "nocb")
# Example 3: Perform LOCF and remove remaining NAs
na_locf(x, na_remaining = "rm")
# Example 4: Same as example 1, just written with pipe operator
x %>% na_locf()