impute_dataset {convergEU} | R Documentation |
Imputation to make a dataset complete
Description
For leading and trailing missing values there are two options: the affected years can be dropped ("cut") or filled by propagating the nearest observed value ("constant"). All other missing values, i.e. those lying between two observed values, are filled by deterministic linear imputation so that the resulting dataset is complete.
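As an illustration of what linear imputation of an interior gap means, here is a minimal sketch in base R. It uses `stats::approx` on a made-up single-country series; it is not taken from the package source, and the internal implementation of `impute_dataset` may differ.

```r
# Hypothetical single-country series with one interior gap (1989)
time  <- c(1988, 1989, 1990, 1991)
value <- c(998, NA, 1150, 1600)

# Interpolate linearly between the observed neighbours of each gap
observed <- !is.na(value)
filled <- stats::approx(x = time[observed],
                        y = value[observed],
                        xout = time)$y
filled
# 998 1074 1150 1600
```

The imputed value for 1989 is the midpoint of the 1988 and 1990 observations because the time points are equally spaced.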
Usage
impute_dataset(
myTB,
countries,
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[1]
)
Arguments
myTB |
a dataset (tibble) in the format time by countries for a given indicator, sorted by time. Note that time points with missing data must still be present as rows in the dataset. |
countries |
the collection of labels representing countries to process. |
timeName |
the string that represents the name of the time variable. |
tailMiss |
what should be done with consecutive missing values starting at the oldest year: cut those years, or impute constant values equal to the value at the first observed year. |
headMiss |
what should be done with consecutive missing values ending at the last year: cut those years, or impute constant values equal to the value at the last observed year. |
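The difference between the "cut" and "constant" options can be sketched on a single vector. The code below is only an illustration in base R under stated assumptions: the vector `x` is made up, and the sketch assumes that "constant" fills a leading gap with the first observed value and a trailing gap with the last observed value. It does not reproduce the package's own code.

```r
# Hypothetical series with two leading NAs and one trailing NA
x <- c(NA, NA, 5, 7, NA)

obs       <- which(!is.na(x))
first_obs <- min(obs)
last_obs  <- max(obs)

# "cut": drop the positions before the first and after the last observation
x_cut <- x[first_obs:last_obs]

# "constant": propagate the nearest observed value into each edge gap
x_const <- x
x_const[seq_len(first_obs - 1)] <- x[first_obs]
if (last_obs < length(x)) {
  x_const[(last_obs + 1):length(x)] <- x[last_obs]
}

x_cut    # 5 7
x_const  # 5 5 5 7 7
```

With "cut" the series gets shorter, whereas "constant" keeps every time point at the cost of assuming no change at the edges.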
Value
a list with three components: "res", the dataset (tibble) without missing values; "msg", a possible message for the user; and "err", a possible error message.
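A typical way to consume the returned list is to check "err" before using "res". The sketch below assumes that "err" is `NULL` when the call succeeds, which is not stated on this page, and it only runs if the convergEU package is installed.

```r
# Guarded so the snippet is harmless when convergEU is not installed
if (requireNamespace("convergEU", quietly = TRUE)) {
  myTB2 <- tibble::tribble(
    ~time, ~UK,  ~DE,  ~IT,
    1988,  998,  1250, 332,
    1989,  NA,   868,  NA,
    1990,  1150, 978,  NA,
    1991,  1600, NA,   802
  )
  out <- convergEU::impute_dataset(myTB2,
                                   countries = c("UK", "DE", "IT"),
                                   timeName = "time",
                                   tailMiss = "cut",
                                   headMiss = "constant")
  # Assumption: a NULL "err" component signals success
  if (is.null(out$err)) {
    print(out$res)
  } else {
    print(out$err)
  }
}
```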
Examples
# Example 1
# Dataset in the format time by countries with missing values:
myTB2 <- tibble::tribble(
~time, ~UK, ~DE, ~IT,
1988, 998, 1250, 332,
1989, NA, 868, NA,
1990, 1150, 978, NA,
1991, 1600, NA, 802
)
toBeProcessed <- c("UK", "DE", "IT")
# Simplest Imputation using option "cut":
resImpu <- impute_dataset(myTB2, countries=toBeProcessed,
timeName = "time",
tailMiss = c("cut", "constant")[1],
headMiss = c("cut", "constant")[1])
# Imputation using option "constant":
resImpu1 <- impute_dataset(myTB2, countries=toBeProcessed,
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[2])
# Imputation using both options "cut" and "constant":
resImput <- impute_dataset(myTB2, countries=toBeProcessed,
timeName = "time",
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[1])
# Example 2
# dataset time by countries for the indicator "JQIintensity_i":
myTB <- extract_indicator_EUF(
indicator_code = "JQIintensity_i", #Code_in_database
fromTime= 1965,
toTime=2016,
gender= c("Total","Females","Males")[1],
countries= convergEU_glb()$EU27$memberStates$codeMS)
# Imputation of missing values, option "cut":
myTBinp <- impute_dataset(myTB$res, timeName = "time",
countries=convergEU_glb()$EU27$memberStates$codeMS,
tailMiss = c("cut", "constant")[1],
headMiss = c("cut", "constant")[1])
# Imputation of missing values, option "constant":
myTBinp1 <- impute_dataset(myTB$res, timeName = "time",
countries=convergEU_glb()$EU27$memberStates$codeMS,
tailMiss = c("cut", "constant")[2],
headMiss = c("cut", "constant")[2])