multipopulation_loocv {CvmortalityMult}

R Documentation

Function to apply Leave-One-Out Cross-Validation (LOOCV) technique for testing the forecasting accuracy of multi-population mortality models

Description

R function for testing the out-of-sample accuracy of two multi-population mortality models: additive (Debon et al., 2011) and multiplicative (Russolillo et al., 2011). The function applies the leave-one-out cross-validation technique for panel time-series data (Atance et al., 2020) to test the forecasting accuracy of a single multi-population mortality model. The technique splits the database into two parts: a training set (to fit the model) and a test set (to check the forecasting accuracy of the model) that contains only one observation (one period in this case). The procedure is repeated several times, enlarging the training set by one period at each iteration, so that forecasting accuracy is checked over different periods. Users can provide their own mortality rates for different populations. The function splits the database chronologically (Bergmeir and Benitez, 2012) based on trainset1, which is the length of the first training set. Figure 2 illustrates how the R function works.

Figure: mai.png

Note that this function is designed to test the forecasting accuracy of several populations using leave-one-out cross-validation. However, if only one population is considered, the function fits and forecasts the Lee-Carter model for that single population.

To test the forecasting accuracy of the selected model, the function offers five options: SSE, MSE, MAE, MAPE, or All. Select the one that matches how you want to assess forecasting accuracy. The measures are computed using the mortality rates on the original scale, as recommended by Santolino (2023), rather than on the log scale.
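The chronological split described above can be sketched in a few lines of base R. This is only an illustration of the idea (an expanding training set followed by a one-period test set); `loocv_splits` is a hypothetical helper written for this page, not a function exported by CvmortalityMult, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of CvmortalityMult): list the expanding
# training sets and the one-period test sets implied by `trainset1`.
loocv_splits <- function(periods, trainset1) {
  stopifnot(trainset1 > 2, trainset1 < length(periods))
  n_iter <- length(periods) - trainset1   # one iteration per test period
  lapply(seq_len(n_iter), function(i) {
    n_train <- trainset1 + i - 1          # training set grows by one period
    list(train = periods[seq_len(n_train)],
         test  = periods[n_train + 1])    # single period held out
  })
}

splits <- loocv_splits(periods = 1991:2020, trainset1 = 10)
length(splits)      # number of cross-validation iterations
splits[[1]]$train   # periods in the first training set
splits[[1]]$test    # period forecast in the first iteration
```

Under these assumptions, each iteration fits the model on `train` and forecasts the single period in `test`, and the last iteration uses all periods but the final one for training.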

Usage

multipopulation_loocv(
  qxt,
  model = c("additive", "multiplicative"),
  periods,
  ages,
  nPop,
  lxt = NULL,
  ktmethod = c("Arimapdq", "arima010"),
  kt_include.cte = TRUE,
  measures = c("SSE", "MSE", "MAE", "MAPE", "All"),
  trainset1
)

Arguments

`qxt`	mortality rates used to fit the multi-population mortality models. These rates can be provided as a matrix or as a data.frame.
`model`	multi-population mortality model chosen to fit the mortality rates c("`additive`", "`multiplicative`"). If no value is provided, the function applies the "`additive`" option.
`periods`	vector with the periods considered in the fitting, c(minyear:maxyear).
`ages`	vector with the ages considered in the fitting. If the mortality rates come from abridged life tables, it is necessary to provide a vector with the ages, see the example.
`nPop`	number of populations considered for fitting.
`lxt`	survivor function considered for every population; optional.
`ktmethod`	method used to forecast the value of `kt`: ARIMA(p,d,q) or ARIMA(0,1,0); c("`Arimapdq`", "`arima010`").
`kt_include.cte`	logical; whether the ARIMA process for `kt` should include a constant.
`measures`	non-penalized measure of forecasting accuracy to use; c("`SSE`", "`MSE`", "`MAE`", "`MAPE`", "`All`"). Check the function. If no value is provided, the function uses "`SSE`" as the measure of forecasting accuracy.
`trainset1`	number of periods in the first training set. This value must be greater than 2 to meet the minimum time series size (Hyndman and Khandakar, 2008).
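To make the `measures` options concrete, the sketch below computes the four error measures on a toy pair of observed and forecast mortality rates, on the original scale. `accuracy_measures` is a hypothetical helper for illustration only; in particular, reporting MAPE as a proportion rather than a percentage is an assumption, and the package may scale or aggregate its measures differently.

```r
# Hypothetical helper (not part of CvmortalityMult): standard definitions
# of the four non-penalized measures, computed on the original scale.
accuracy_measures <- function(observed, forecast) {
  err <- observed - forecast
  c(SSE  = sum(err^2),                  # sum of squared errors
    MSE  = mean(err^2),                 # mean squared error
    MAE  = mean(abs(err)),              # mean absolute error
    MAPE = mean(abs(err / observed)))   # mean absolute percentage error,
                                        # kept as a proportion (assumption)
}

# Toy mortality rates, invented for illustration
q_obs  <- c(0.010, 0.020, 0.040)
q_pred <- c(0.012, 0.018, 0.040)
accuracy_measures(q_obs, q_pred)
```

SSE and MSE penalize large errors more heavily, while MAPE weights errors relative to the observed rate, so it gives more influence to ages with low mortality.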

Value

A list with class "MultiCv" including different components of the cross-validation process:

ax parameter that captures the average shape of the mortality curve in all considered populations.
bx parameter that explains the age effect x with respect to the general trend kt in the mortality rates of all considered populations.
kt.fitted fitted values of the time trend captured by kt.
kt.future future values of kt for every iteration in the cross-validation.
kt.arima the ARIMA model selected for each kt time series.
Ii parameter that captures the differences in the pattern of mortality in any region i with respect to Region 1.
formula multi-population mortality formula used to fit the mortality rates.
nPop number of populations provided for the fit.
qxt.real real mortality rates.
qxt.future future mortality rates estimated with the multi-population mortality model.
logit.qxt.future future mortality rates on the logit scale estimated with the multi-population mortality model.
meas_ages measure of forecasting accuracy through the ages of the study.
meas_periodsfut measure of forecasting accuracy in every forecasting period(s) of the study.
meas_pop measure of forecasting accuracy through the populations considered in the study.
meas_total a global measure of forecasting accuracy through the ages, periods and populations of the study.
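For orientation, the components ax, bx, kt and Ii listed above correspond to the terms of the two models as they are commonly written in the cited literature. The notation below is a sketch under that assumption (logit link, population index i, identification constraints omitted); the `formula` component of the returned object is the authoritative statement of the fitted model.

```latex
% Additive multi-population model (Debon et al., 2011), as commonly written
\operatorname{logit}(q_{x,t,i}) = a_x + b_x k_t + I_i

% Multiplicative multi-population model (Russolillo et al., 2011), as commonly written
\operatorname{logit}(q_{x,t,i}) = a_x + b_x k_t I_i
```

Here x indexes age, t the period and i the population. With a single population the population term drops out and both expressions reduce to a Lee-Carter type structure.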

References

Atance, D., Debon, A., and Navarro, E. (2020). A comparison of forecasting mortality models using resampling methods. Mathematics 8(9): 1550.

Bergmeir, C. & Benitez, J.M. (2012). On the use of cross-validation for time series predictor evaluation. Information Sciences, 191, 192–213.

Debon, A., & Atance, D. (2022). Two multi-population mortality models: A comparison of the forecasting accuracy with resampling methods. In Contributions to Risk Analysis: Risk 2022. Fundacion Mapfre.

Debon, A., Montes, F., & Martinez-Ruiz, F. (2011). Statistical methods to compare mortality for a group with non-divergent populations: an application to Spanish regions. European Actuarial Journal, 1, 291-308.

Hyndman, R.J. & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 26, 1–22.

Lee, R.D. & Carter, L.R. (1992). Modeling and forecasting US mortality. Journal of the American Statistical Association, 87(419), 659–671.

Russolillo, M., Giordano, G., & Haberman, S. (2011). Extending the Lee–Carter model: a three-way decomposition. Scandinavian Actuarial Journal, 96-117.

Santolino, M. (2023). Should Selection of the Optimum Stochastic Mortality Model Be Based on the Original or the Logarithmic Scale of the Mortality Rate? Risks, 11(10), 170.

Examples

# This example takes more than 5 seconds because it involves
# several fitting and forecasting steps, so the whole
# example is included in donttest.

# We apply the leave-one-out cross-validation (LOOCV) method to male mortality in the Spanish regions.
# The aim is to obtain the same results as in the short paper published in Risk Congress 2023.
SpainRegions
ages <- c(0, 1, 5, 10, 15, 20, 25, 30, 35, 40,
         45, 50, 55, 60, 65, 70, 75, 80, 85, 90)
library(CvmortalityMult)  # provides multipopulation_loocv() and SpainRegions
library(gnm)
library(forecast)
# Start with a simple CV run with trainset1 = 10, using SSE as the measure of forecasting accuracy
loocv_Spainmales_addit <- multipopulation_loocv(qxt = SpainRegions$qx_male,
                                         model = c("additive"),
                                         periods =  c(1991:2020), ages = c(ages),
                                         nPop = 18, lxt = SpainRegions$lx_male,
                                         ktmethod = c("Arimapdq"),
                                         kt_include.cte = TRUE,
                                         measures = c("SSE"),
                                         trainset1 = 10)
loocv_Spainmales_addit

# Once the function has run, we can inspect the results in different ways:
loocv_Spainmales_addit$meas_ages
loocv_Spainmales_addit$meas_periodsfut
loocv_Spainmales_addit$meas_pop
loocv_Spainmales_addit$meas_total

[Package CvmortalityMult version 1.0.3 Index]