multipopulation_loocv {CvmortalityMult}    R Documentation

Function to apply the leave-one-out cross-validation (LOOCV) technique for testing the forecasting accuracy of multi-population mortality models

Description

R function for testing the out-of-sample accuracy of two multi-population mortality models: Additive (Debon et al., 2011) and Multiplicative (Russolillo et al., 2011). The function employs the leave-one-out cross-validation (LOOCV) technique for panel time-series data (Atance et al., 2020) to test the forecasting accuracy of a multi-population mortality model. This technique consists of splitting the database into two parts: a training set (to fit the model) and a test set (to check the forecasting accuracy of the model) containing a single observation (one period in this case). The procedure is repeated several times, enlarging the training set by one period at each step, so that the forecasting accuracy is checked over different periods.

With this function, users can provide their own mortality rates for different populations. The function splits the database chronologically (Bergmeir and Benitez, 2012) based on trainset1, which is the length of the first training set. Figure 2 illustrates how the function works.

Figure: mai.png

This function is designed for testing the forecasting accuracy for several populations using leave-one-out cross-validation. However, if only one population is considered, the function fits and forecasts the Lee-Carter model (Lee and Carter, 1992) for a single population.

To test the forecasting accuracy of the selected model, the function offers five options for the measure: SSE, MSE, MAE, MAPE, or All. Select one according to how you want to assess the forecasting accuracy of the model. The measures are computed using the mortality rates on the original scale rather than the log scale, as recommended by Santolino (2023).
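For reference, the four individual measures can be sketched in base R as follows. The helper accuracy_measures and the sample rates are hypothetical and use the textbook definitions; MeasureAccuracy in the package may differ in details, for example by reporting MAPE as a percentage. The errors are computed on the rates themselves, not on their logarithms.

```r
# Hypothetical helper, not part of CvmortalityMult: textbook definitions of
# the four accuracy measures, computed on the original (non-log) scale.
accuracy_measures <- function(observed, forecast) {
  err <- observed - forecast
  c(SSE  = sum(err^2),                 # sum of squared errors
    MSE  = mean(err^2),                # mean squared error
    MAE  = mean(abs(err)),             # mean absolute error
    MAPE = mean(abs(err / observed)))  # mean absolute percentage error (as a proportion)
}

# Made-up observed and forecast mortality rates for three ages
q_obs <- c(0.010, 0.020, 0.040)
q_hat <- c(0.011, 0.018, 0.044)
accuracy_measures(q_obs, q_hat)
```

SSE and MSE penalize large errors more heavily than MAE, while MAPE is relative to the size of the observed rate, so it gives more weight to ages with low mortality.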

Usage

multipopulation_loocv(
  qxt,
  model = c("additive", "multiplicative"),
  periods,
  ages,
  nPop,
  lxt = NULL,
  ktmethod = c("Arimapdq", "arima010"),
  kt_include.cte = TRUE,
  measures = c("SSE", "MSE", "MAE", "MAPE", "All"),
  trainset1
)

Arguments

qxt

mortality rates used to fit the multi-population mortality models. These rates can be provided as a matrix or as a data.frame.

model

multi-population mortality model chosen to fit the mortality rates; c("additive", "multiplicative"). If no value is provided, the function applies the "additive" option.

periods

vector with the periods considered in the fitting, c(minyear:maxyear).

ages

vector with the ages considered in the fitting. If the mortality rates come from an abridged life table, it is necessary to provide a vector with the ages; see the example.

nPop

number of populations considered for fitting.

lxt

survivor function considered for every population; providing it is optional.

ktmethod

method used to forecast the values of kt: ARIMA(p,d,q) or ARIMA(0,1,0); c("Arimapdq", "arima010").

kt_include.cte

whether kt should include a constant in the ARIMA process.

measures

non-penalized measure of forecasting accuracy to use; c("SSE", "MSE", "MAE", "MAPE", "All"). See the function MeasureAccuracy. If no value is provided, the function applies "SSE" as the measure of forecasting accuracy.

trainset1

number of periods in the first training set. This value must be greater than 2 to meet the minimum time series size (Hyndman and Khandakar, 2008).
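As a rough illustration of how trainset1 drives the chronological splitting described in the Description, the sketch below builds the successive training and test sets in base R. The helper loocv_splits is hypothetical and not part of CvmortalityMult; it assumes that trainset1 is the number of periods in the first training set and that each test set is the single period following the training window, so the package's internal implementation may differ.

```r
# Hypothetical helper, not part of CvmortalityMult: expanding-window
# leave-one-out splits for a chronologically ordered vector of periods.
loocv_splits <- function(periods, trainset1) {
  stopifnot(trainset1 > 2, trainset1 < length(periods))
  n_folds <- length(periods) - trainset1
  lapply(seq_len(n_folds), function(i) {
    n_train <- trainset1 + i - 1
    list(train = periods[seq_len(n_train)],  # periods used to fit the model
         test  = periods[n_train + 1])       # single period forecast one step ahead
  })
}

splits <- loocv_splits(periods = 1991:2020, trainset1 = 10)
length(splits)            # number of train/test pairs
range(splits[[1]]$train)  # first and last period of the first training set
splits[[1]]$test          # first test period
```

Under these assumptions, each repetition adds the previous test period to the training set, so later repetitions fit the model on longer series while always forecasting exactly one period ahead.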

Value

A list with class "MultiCv" including different components of the cross-validation process:

References

Atance, D., Debon, A., and Navarro, E. (2020). A comparison of forecasting mortality models using resampling methods. Mathematics 8(9): 1550.

Bergmeir, C. & Benitez, J.M. (2012) On the use of cross-validation for time series predictor evaluation. Information Sciences, 191, 192–213.

Debon, A., & Atance, D. (2022). Two multi-population mortality models: A comparison of the forecasting accuracy with resampling methods. In Contributions to Risk Analysis: Risk 2022. Fundacion Mapfre.

Debon, A., Montes, F., & Martinez-Ruiz, F. (2011). Statistical methods to compare mortality for a group with non-divergent populations: an application to Spanish regions. European Actuarial Journal, 1, 291-308.

Hyndman, R.J. & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 26, 1–22.

Lee, R.D. & Carter, L.R. (1992). Modeling and forecasting US mortality. Journal of the American Statistical Association, 87(419), 659–671.

Russolillo, M., Giordano, G., & Haberman, S. (2011). Extending the Lee–Carter model: a three-way decomposition. Scandinavian Actuarial Journal, 96-117.

Santolino, M. (2023). Should Selection of the Optimum Stochastic Mortality Model Be Based on the Original or the Logarithmic Scale of the Mortality Rate? Risks, 11(10), 170.

See Also

multipopulation_cv, fitLCmulti, forecast.fitLCmulti, plot.fitLCmulti, plot.forLCmulti, MeasureAccuracy.

Examples

#The example takes more than 5 seconds because it includes
#several fitting and forecasting processes; hence the whole
#process is wrapped in donttest

#We present the leave-one-out cross-validation (LOOCV) method for Spanish male regions
#The idea is to get the same results as in the short paper published in Risk Congress 2023
SpainRegions
ages <- c(0, 1, 5, 10, 15, 20, 25, 30, 35, 40,
         45, 50, 55, 60, 65, 70, 75, 80, 85, 90)
library(gnm)
library(forecast)
#Let's start with a simple LOOCV with trainset1 = 10, obtaining the SSE measure of forecasting accuracy
loocv_Spainmales_addit <- multipopulation_loocv(qxt = SpainRegions$qx_male,
                                         model = c("additive"),
                                         periods =  c(1991:2020), ages = c(ages),
                                         nPop = 18, lxt = SpainRegions$lx_male,
                                         ktmethod = c("Arimapdq"),
                                         kt_include.cte = TRUE,
                                         measures = c("SSE"),
                                         trainset1 = 10)
loocv_Spainmales_addit

#Once we have run the function, we can check the results in different ways:
loocv_Spainmales_addit$meas_ages
loocv_Spainmales_addit$meas_periodsfut
loocv_Spainmales_addit$meas_pop
loocv_Spainmales_addit$meas_total


[Package CvmortalityMult version 1.0.3 Index]