multipopulation_cv {CvmortalityMult} | R Documentation |
Function to apply cross-validation techniques for testing the forecasting accuracy of multi-population mortality models
Description
R function for testing the out-of-sample accuracy of different multi-population mortality models: Additive (Debon et al., 2011) and Multiplicative (Russolillo et al., 2011). The function employs cross-validation techniques for panel time-series data (Atance et al., 2020) to test forecasting accuracy. These techniques split the database into two parts: a training set (to fit the model) and a test set (to check the forecasting accuracy of the model). This procedure is repeated several times so that forecasting accuracy is checked on different parts of the data. The user can provide their own mortality rates for different populations. The function splits the database chronologically (Bergmeir and Benitez, 2012) based on nahead, which determines the length of the training set. Figure 1 is included to illustrate how the function works.
The function is designed for cross-validating the forecasting accuracy of several populations. However, if only one population is considered, the function will forecast with the Lee-Carter model for that single population.
To test the forecasting accuracy of the selected model, the function offers five options for measures: SSE, MSE, MAE, MAPE or All. Select the one that matches how you want to assess forecasting accuracy. The measures are computed using the mortality rates on the original scale rather than the log scale, as recommended by Santolino (2023).
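The accuracy measures named in the description can be illustrated with a minimal base-R sketch. It assumes the textbook definitions of SSE, MSE, MAE and MAPE applied to rates on the original scale; `accuracy_measures` and the toy vectors are hypothetical and not part of the package, whose exact formulas (for example, whether MAPE is reported as a percentage) may differ.

```r
# Hypothetical helper, NOT part of CvmortalityMult: textbook definitions
# of the four measures, computed on rates in the original (not log) scale.
accuracy_measures <- function(actual, forecast) {
  err <- actual - forecast
  c(
    SSE  = sum(err^2),             # sum of squared errors
    MSE  = mean(err^2),            # mean squared error
    MAE  = mean(abs(err)),         # mean absolute error
    MAPE = mean(abs(err / actual)) # mean absolute percentage error (as a proportion)
  )
}

# Toy observed and forecast mortality rates (made-up values)
q_real <- c(0.010, 0.020, 0.040)
q_fut  <- c(0.012, 0.018, 0.040)

m <- accuracy_measures(q_real, q_fut)
print(m)
```

Because the errors are taken on the original scale, ages with high mortality rates dominate SSE, MSE and MAE, whereas MAPE weights every age by its relative error.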
Usage
multipopulation_cv(
  qxt,
  model = c("additive", "multiplicative"),
  periods,
  ages,
  nPop,
  lxt = NULL,
  nahead,
  ktmethod = c("Arimapdq", "arima010"),
  kt_include.cte = TRUE,
  measures = c("SSE", "MSE", "MAE", "MAPE", "All")
)
Arguments
qxt |
mortality rates used to fit the multi-population mortality models. These rates can be provided as a matrix or as a data.frame. |
model |
multi-population mortality model chosen to fit the mortality rates c(" |
periods |
periods considered in the fitting in a vector way c( |
ages |
vector with the ages considered in the fitting. If the mortality rates come from abridged life tables, a vector with the ages must be provided; see the example. |
nPop |
number of populations considered for fitting. |
lxt |
survivor function considered for every population; providing it is optional. |
nahead |
a vector specifying the number of periods to block in the blocked CV. The function uses the sum of the periods in nahead and three (the minimum number of years required to construct a time series) as the length of the initial training set. This ensures that the first training set has sufficient observations to forecast the initial test set, which will be of length
ktmethod |
method used to forecast the value of |
kt_include.cte |
if you want that |
measures |
choose the non-penalized measure of forecasting accuracy that you want to use; c(" |
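The chronological splitting controlled by nahead can be sketched in base R. This is only an illustration under stated assumptions: the initial training set has nahead + 3 periods (as described for the nahead argument), each test block is assumed to contain the next nahead periods, the training window is assumed to grow by absorbing the previous test block, and the final block is assumed to be cut at the end of the sample. `blocked_splits` is a hypothetical helper, not a function of the package, and the package's internal scheme may differ.

```r
# Hypothetical helper, NOT part of CvmortalityMult: enumerate chronological
# train/test splits under the assumptions stated in the lead-in.
blocked_splits <- function(periods, nahead, min_train = 3) {
  n <- length(periods)
  train_end <- nahead + min_train        # size of the initial training set
  stopifnot(train_end < n)
  splits <- list()
  while (train_end < n) {
    test_end <- min(train_end + nahead, n)  # assumed: last block may be shorter
    splits[[length(splits) + 1]] <- list(
      train = periods[1:train_end],
      test  = periods[(train_end + 1):test_end]
    )
    train_end <- test_end                # assumed: training window expands
  }
  splits
}

# Same periods and nahead as in the Examples section
splits <- blocked_splits(1991:2020, nahead = 5)
for (s in splits) {
  cat("train:", min(s$train), "-", max(s$train),
      "| test:", min(s$test), "-", max(s$test), "\n")
}
```

Every test block lies strictly after its training window, so no future observation is used to fit the model that forecasts it.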
Value
An object of the class "MultiCv" including a list() with different components of the cross-validation process:
- ax: parameter that captures the average shape of the mortality curve in all considered populations.
- bx: parameter that explains the age effect x with respect to the general trend kt in the mortality rates of all considered populations.
- kt.fitted: fitted values of the trend captured by kt.
- kt.future: future values of kt for every iteration in the cross-validation.
- kt.arima: the ARIMA model selected for each kt time series.
- Ii: parameter that captures the differences in the pattern of mortality in any region i with respect to Region 1.
- formula: multi-population mortality formula used to fit the mortality rates.
- model: the model selected in every case.
- nPop: the number of populations provided for the fitting.
- qxt.real: real mortality rates.
- qxt.future: future mortality rates estimated with the multi-population mortality model.
- logit.qxt.future: future mortality rates on the logit scale estimated with the multi-population mortality model.
- meas_ages: measure of forecasting accuracy across the ages of the study.
- meas_periodsfut: measure of forecasting accuracy in every forecasting period of the study.
- meas_pop: measure of forecasting accuracy across the populations considered in the study.
- meas_total: a global measure of forecasting accuracy across the ages, periods and populations of the study.
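For orientation, the way these parameters combine can be written out. The equations below assume the usual formulation of the additive and multiplicative models in the cited papers, with q_{x,t,i} denoting the mortality rate at age x, period t and population i; the specification actually used in a run is the one stored in the formula component.

```latex
\begin{align*}
\text{Additive:} \quad & \operatorname{logit}(q_{x,t,i}) = a_x + b_x k_t + I_i \\
\text{Multiplicative:} \quad & \operatorname{logit}(q_{x,t,i}) = a_x + b_x k_t \, I_i
\end{align*}
```

Under this reading, qxt.future is obtained from logit.qxt.future through the inverse logit transformation, q = e^{\eta} / (1 + e^{\eta}).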
References
Atance, D., Debon, A., & Navarro, E. (2020). A comparison of forecasting mortality models using resampling methods. Mathematics, 8(9), 1550.
Bergmeir, C. & Benitez, J.M. (2012). On the use of cross-validation for time series predictor evaluation. Information Sciences, 191, 192–213.
Debon, A., & Atance, D. (2022). Two multi-population mortality models: A comparison of the forecasting accuracy with resampling methods. In Contributions to Risk Analysis: Risk 2022. Fundacion Mapfre.
Debon, A., Montes, F., & Martinez-Ruiz, F. (2011). Statistical methods to compare mortality for a group with non-divergent populations: an application to Spanish regions. European Actuarial Journal, 1, 291-308.
Lee, R.D. & Carter, L.R. (1992). Modeling and forecasting US mortality. Journal of the American Statistical Association, 87(419), 659–671.
Russolillo, M., Giordano, G., & Haberman, S. (2011). Extending the Lee–Carter model: a three-way decomposition. Scandinavian Actuarial Journal, 96-117.
Santolino, M. (2023). Should Selection of the Optimum Stochastic Mortality Model Be Based on the Original or the Logarithmic Scale of the Mortality Rate?. Risks, 11(10), 170.
See Also
multipopulation_loocv, fitLCmulti, forecast.fitLCmulti, plot.fitLCmulti, plot.forLCmulti, MeasureAccuracy.
Examples
#The example takes more than 5 seconds because it includes
#several fitting and forecasting processes; hence the whole
#example is wrapped in donttest
#We present a cross-validation method for Spanish male regions
ages <- c(0, 1, 5, 10, 15, 20, 25, 30, 35, 40,
          45, 50, 55, 60, 65, 70, 75, 80, 85, 90)
library(gnm)
library(forecast)
#Let us start with a simple nahead = 5 CV method, obtaining the SSE measure of forecasting accuracy
cv_Spainmales_addit <- multipopulation_cv(qxt = SpainRegions$qx_male,
                                          model = c("additive"),
                                          periods = c(1991:2020), ages = c(ages),
                                          nPop = 18, lxt = SpainRegions$lx_male,
                                          nahead = 5,
                                          ktmethod = c("Arimapdq"),
                                          kt_include.cte = TRUE,
                                          measures = c("SSE"))
cv_Spainmales_addit
#Once we have run the function, we can check the results in different ways:
cv_Spainmales_addit$meas_ages
cv_Spainmales_addit$meas_periodsfut
cv_Spainmales_addit$meas_pop
cv_Spainmales_addit$meas_total