R: Function to apply cross-validation techniques for testing the...

multipopulation_cv {CvmortalityMult}

R Documentation

Function to apply cross-validation techniques for testing the forecasting accuracy of multi-population mortality models

Description

R function for testing the out-of-sample accuracy of two multi-population mortality models: Additive (Debon et al., 2011) and Multiplicative (Russolillo et al., 2011). The function employs cross-validation techniques for panel time-series data (Atance et al., 2020) to test forecasting accuracy. These techniques split the database into two parts: a training set (to fit the model) and a test set (to check the forecasting accuracy of the model). This procedure is repeated several times to check the forecasting accuracy in different ways.

With this function, users can provide their own mortality rates for different populations. The function splits the database chronologically (Bergmeir and Benitez, 2012) based on nahead, which determines the length of the initial training set and of the test sets (see Arguments). Figure 1 illustrates how the function works.

Figure 1: mai.png

This function was developed to cross-validate forecasting accuracy across several populations. However, if only one population is considered, the function fits and forecasts the Lee-Carter model for that single population.

To test the forecasting accuracy of the selected model, the function offers five options: SSE, MSE, MAE, MAPE, or All. Choose one depending on how you want to assess the forecasting accuracy of the model. The measures are computed from the mortality rates on the original scale rather than the log scale, as recommended by Santolino (2023).
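The accuracy measures named above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of SSE, MSE, MAE, and MAPE; the helper `accuracy_measures()` and the toy rates are hypothetical and not part of CvmortalityMult, whose internal computation (for example, whether MAPE is reported as a proportion or a percentage) may differ.

```r
# Hypothetical helper (not part of CvmortalityMult): standard error measures
# computed on the original (non-log) scale of the mortality rates.
accuracy_measures <- function(actual, predicted) {
  err <- actual - predicted
  c(
    SSE  = sum(err^2),              # sum of squared errors
    MSE  = mean(err^2),             # mean squared error
    MAE  = mean(abs(err)),          # mean absolute error
    MAPE = mean(abs(err / actual))  # mean absolute percentage error, as a proportion
  )
}

# Toy rates, invented purely for illustration
q_real   <- c(0.010, 0.020, 0.040)
q_future <- c(0.012, 0.018, 0.044)

m <- accuracy_measures(q_real, q_future)
print(m)
```

Because SSE and MSE square the errors, they are dominated by the ages with the highest rates, whereas MAPE weights every age by its relative error; this is one reason to compare more than one measure.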

Usage

multipopulation_cv(
  qxt,
  model = c("additive", "multiplicative"),
  periods,
  ages,
  nPop,
  lxt = NULL,
  nahead,
  ktmethod = c("Arimapdq", "arima010"),
  kt_include.cte = TRUE,
  measures = c("SSE", "MSE", "MAE", "MAPE", "All")
)

Arguments

`qxt`	mortality rates used to fit the multi-population mortality models. These rates can be provided as a matrix or a data.frame.
`model`	multi-population mortality model chosen to fit the mortality rates c("`additive`", "`multiplicative`"). If no value is provided, the function applies the "`additive`" option.
`periods`	vector with the periods considered in the fitting, c(`minyear`:`maxyear`).
`ages`	vector with the ages considered in the fitting. If the mortality rates come from abridged life tables, it is necessary to provide a vector with the ages; see the example.
`nPop`	number of populations considered for fitting.
`lxt`	survivor function considered for every population; optional.
`nahead`	a vector specifying the number of periods to block in the blocked CV. The function uses the sum of the periods in `nahead` and three (the minimum number of years required to construct a time series) as the initial training set. This ensures that the first training set has sufficient observations to forecast the initial test set, which will be of length `nahead`.
`ktmethod`	method used to forecast the value of `kt`: Arima(p,d,q) or ARIMA(0,1,0); c("`Arimapdq`", "`arima010`").
`kt_include.cte`	whether `kt` should include a constant in the ARIMA process.
`measures`	the non-penalized measure of forecasting accuracy to use; c("`SSE`", "`MSE`", "`MAE`", "`MAPE`", "`All`"). Check the function. If no value is provided, the function applies "`SSE`" as the measure of forecasting accuracy.
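The role of `nahead` can be sketched as follows. This is an illustration under stated assumptions, not the package's internal code: the helper `blocked_cv_splits()` is hypothetical, the training window is assumed to expand by one block per iteration, and the last test block is assumed to be shortened when fewer than `nahead` periods remain.

```r
# Hypothetical helper (not the package's internal code): list the training and
# test years of each iteration of a chronological, expanding-window split.
blocked_cv_splits <- function(periods, nahead, min_years = 3) {
  first_train <- nahead + min_years   # length of the initial training set
  stopifnot(length(periods) > first_train)
  splits <- list()
  train_end <- first_train
  while (train_end < length(periods)) {
    # Assumption: the final block is truncated to the periods still available
    test_end <- min(train_end + nahead, length(periods))
    splits[[length(splits) + 1]] <- list(
      train = periods[1:train_end],
      test  = periods[(train_end + 1):test_end]
    )
    train_end <- test_end             # the next training set absorbs this test block
  }
  splits
}

splits <- blocked_cv_splits(periods = 1991:2020, nahead = 5)
for (s in splits) {
  cat("train:", min(s$train), "-", max(s$train),
      "| test:", min(s$test), "-", max(s$test), "\n")
}
```

With `periods = 1991:2020` and `nahead = 5`, the initial training set in this sketch covers 5 + 3 = 8 years, and every test block starts immediately after the last training year, so no future observation is used to fit the model.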

Value

An object of class "MultiCv": a list() with the following components of the cross-validation process:

`ax`	parameter that captures the average shape of the mortality curve in all considered populations.
`bx`	parameter that explains the effect of age x with respect to the general trend `kt` in the mortality rates of all considered populations.
`kt.fitted`	fitted values of the trend captured by `kt`.
`kt.future`	future values of `kt` for every iteration of the cross-validation.
`kt.arima`	the ARIMA model selected for each `kt` time series.
`Ii`	parameter that captures the differences in the mortality pattern of any region i with respect to Region 1.
`formula`	multi-population mortality formula used to fit the mortality rates.
`model`	the model selected in every case.
`nPop`	number of populations provided for the fitting.
`qxt.real`	real mortality rates.
`qxt.future`	future mortality rates estimated with the multi-population mortality model.
`logit.qxt.future`	future mortality rates on the logit scale, estimated with the multi-population mortality model.
`meas_ages`	measure of forecasting accuracy across the ages of the study.
`meas_periodsfut`	measure of forecasting accuracy in every forecasting period of the study.
`meas_pop`	measure of forecasting accuracy across the populations considered in the study.
`meas_total`	a global measure of forecasting accuracy across the ages, periods and populations of the study.
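How the parameters ax, bx, kt and Ii combine into mortality rates can be sketched as below. This is a minimal illustration, assuming the logit-scale forms usually attributed to the cited papers: logit(q) = ax + bx·kt + Ii for the additive model and logit(q) = ax + bx·kt·Ii for the multiplicative model, with the reference population (Region 1) taking Ii = 0 and Ii = 1 respectively. The helper `fitted_qxt()` and all parameter values are hypothetical; check the `formula` component of the returned object for the specification actually fitted.

```r
# Illustrative parameter values, invented for this sketch
ax <- c(-6.0, -5.0, -3.5)   # one value per age
bx <- c(0.40, 0.35, 0.25)   # one value per age
kt <- c(1.0, 0.0, -1.0)     # one value per period

inv_logit <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: ages x periods matrix of rates for one population,
# given that population's effect Ii_i (assumed model forms, see lead-in).
fitted_qxt <- function(ax, bx, kt, Ii_i, model = c("additive", "multiplicative")) {
  model <- match.arg(model)
  trend <- outer(bx, kt)    # bx * kt for every age (rows) and period (columns)
  logit_q <- if (model == "additive") {
    ax + trend + Ii_i       # population effect shifts the whole surface
  } else {
    ax + trend * Ii_i       # population effect scales the time trend
  }
  inv_logit(logit_q)
}

q_add_ref  <- fitted_qxt(ax, bx, kt, Ii_i = 0,   model = "additive")
q_add_pop2 <- fitted_qxt(ax, bx, kt, Ii_i = 0.1, model = "additive")
q_mul_ref  <- fitted_qxt(ax, bx, kt, Ii_i = 1,   model = "multiplicative")
print(round(q_add_pop2, 5))
```

Under these assumptions, both models reduce to the same single-population Lee-Carter surface for the reference population, which is why Ii is read as a difference with respect to Region 1.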

References

Atance, D., Debon, A., & Navarro, E. (2020). A comparison of forecasting mortality models using resampling methods. Mathematics, 8(9), 1550.

Bergmeir, C. & Benitez, J.M. (2012). On the use of cross-validation for time series predictor evaluation. Information Sciences, 191, 192–213.

Debon, A., & Atance, D. (2022). Two multi-population mortality models: A comparison of the forecasting accuracy with resampling methods. In Contributions to Risk Analysis: Risk 2022. Fundacion Mapfre.

Debon, A., Montes, F., & Martinez-Ruiz, F. (2011). Statistical methods to compare mortality for a group with non-divergent populations: an application to Spanish regions. European Actuarial Journal, 1, 291-308.

Lee, R.D. & Carter, L.R. (1992). Modeling and forecasting US mortality. Journal of the American Statistical Association, 87(419), 659–671.

Russolillo, M., Giordano, G., & Haberman, S. (2011). Extending the Lee–Carter model: a three-way decomposition. Scandinavian Actuarial Journal, 96-117.

Santolino, M. (2023). Should Selection of the Optimum Stochastic Mortality Model Be Based on the Original or the Logarithmic Scale of the Mortality Rate?. Risks, 11(10), 170.

Examples


# The example takes more than 5 seconds because it includes
# several fitting and forecasting processes; hence the whole
# process is wrapped in donttest.

# We present a cross-validation method for Spanish male regions

ages <- c(0, 1, 5, 10, 15, 20, 25, 30, 35, 40,
         45, 50, 55, 60, 65, 70, 75, 80, 85, 90)
library(CvmortalityMult)  # provides multipopulation_cv() and the SpainRegions data
library(gnm)
library(forecast)
# Let us start with a simple nahead = 5 CV, obtaining the SSE measure of forecasting accuracy
cv_Spainmales_addit <- multipopulation_cv(qxt = SpainRegions$qx_male,
                                         model = c("additive"),
                                         periods =  c(1991:2020), ages = c(ages),
                                         nPop = 18, lxt = SpainRegions$lx_male,
                                         nahead = 5,
                                         ktmethod = c("Arimapdq"),
                                         kt_include.cte = TRUE,
                                         measures = c("SSE"))
cv_Spainmales_addit

# Once we have run the function, we can check the results in different ways:
cv_Spainmales_addit$meas_ages
cv_Spainmales_addit$meas_periodsfut
cv_Spainmales_addit$meas_pop
cv_Spainmales_addit$meas_total

[Package CvmortalityMult version 1.0.3 Index]