get.search.metrics {ldt}
Get Options for Measuring Performance
Description
Use this function to specify the options for measuring model performance in the search.? functions of the ldt package.
Usage
get.search.metrics(
typesIn = c("aic"),
typesOut = NULL,
simFixSize = 2,
trainRatio = 0.75,
trainFixSize = 0,
seed = 0,
horizons = c(1L),
weightedEval = FALSE,
minMetrics = list(aic = 0)
)
Arguments
typesIn
A list of evaluation metrics to compute when the model is estimated using all available data. It can include names such as "aic" or "sic" (see Details). Use NULL to skip in-sample evaluation.
typesOut
A list of evaluation metrics to compute in an out-of-sample simulation. It can include names such as "rmse", "mae", or "crps" (see Details). Use NULL to skip out-of-sample evaluation.
simFixSize
An integer that determines the number of out-of-sample simulations. Use zero to disable the simulation.
trainRatio
A number representing the size of the training sample relative to the available size, in the out-of-sample simulation. It is effective only if trainFixSize is zero.
trainFixSize
An integer representing the number of data points in the training sample in the out-of-sample simulation. If zero, trainRatio is used instead.
seed
A seed for the random number generator. Use zero for a random value. It can be negative to get reproducible results between related search and estimation calls.
horizons
An array of integers representing the prediction horizons to be used in out-of-sample simulations, if the model supports time-series prediction. If NULL, the default c(1L) is used.
weightedEval
If TRUE, weights are used in evaluating discrete-choice models.
minMetrics
A list of minimum values for adjusting the weights when applying the AIC weight formula. Its members are named after the metrics, e.g., aic as in the default list(aic = 0).
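For illustration, a minimal sketch of a call (assuming the ldt package is installed; the typesOut name is an assumption based on the Details section below):
library(ldt)
metrics <- get.search.metrics(
  typesIn = c("aic", "sic"),   # in-sample metrics
  typesOut = c("rmse"),        # out-of-sample metric (name assumed, see Details)
  simFixSize = 10,             # number of out-of-sample simulations
  trainRatio = 0.75,           # relative training size (used because trainFixSize = 0)
  seed = 340                   # fixed seed for reproducible simulations
)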
Details
An important aspect of ldt is model evaluation during the screening process. This involves considering both in-sample and out-of-sample evaluation metrics. In-sample metrics are computed using data that was used in the estimation process, while out-of-sample metrics are computed using new data. These metrics are well documented in the literature, and I will provide an overview of the main computational aspects and relevant references.
Value
A list with the given options.
AIC and SIC
According to Burnham and Anderson (2002) or Greene (2020), AIC and SIC are two commonly used metrics for comparing and choosing among different models with the same endogenous variable(s). Given L^* as the maximum value of the likelihood function in a regression analysis with k estimated parameters and N observations, AIC is calculated by 2k - 2\ln L^* and SIC is calculated by k\ln N - 2\ln L^*. SIC includes a stronger penalty for increasing the number of estimated parameters in the model.
These metrics can be converted into weights using the formula w = \exp(-0.5x), where x is the value of the metric. When divided by the sum of all weights, w can be interpreted as the probability that a given model is the best model among all members of the model set (see section 2.9 in Burnham and Anderson (2002)). Unlike the Burnham and Anderson (2002) discussion, the minimum-AIC value is not subtracted from x in the screening process: since the transformation f(x) = \exp(-0.5x) turns this translation into a common scale factor, the normalized weights are unchanged. This is an important property because it enables the use of running statistics and parallel computation.
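A small numeric sketch of this weighting in base R, with hypothetical AIC values:
aic <- c(102.3, 104.1, 110.8)        # hypothetical AIC values for three models
w <- exp(-0.5 * aic)
w / sum(w)                           # probability that each model is the best
# Subtracting the minimum AIC first only rescales all weights by a common
# factor, so the normalized weights are identical:
d <- exp(-0.5 * (aic - min(aic)))
d / sum(d)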
MAE, MAPE, RMSE, and RMSPE
According to Hyndman and Athanasopoulos (2018), MAE and RMSE are two commonly used scale-dependent metrics, while MAPE is a commonly used unit-free metric. ldt also calculates the less common RMSPE metric. If there are n predictions and e_i = y_i - \hat{y}_i for i = 1,\ldots,n is the prediction error, i.e., the distance between the actual values (y_i) and the predictions (\hat{y}_i), these metrics can be expressed analytically by the following formulas:
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|e_i|
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{e_i}{y_i}\right|\times 100
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(e_i)^2}
\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{e_i}{y_i}\right)^2}\times 100
Note three things. First, MAPE and RMSPE are not defined if any y_i is zero, and they may not be meaningful or useful when y_i is near zero or negative. Second, although these metrics cannot be directly interpreted as weights, they are treated in a manner similar to AIC in the ldt package. Third, caution is required when target variables are transformed, for example to a logarithmic scale; ldt provides an option to transform the data back when calculating these metrics.
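As a reference implementation of the formulas above (a sketch in base R, not the package's internal code):
mae   <- function(y, yhat) mean(abs(y - yhat))
mape  <- function(y, yhat) 100 * mean(abs((y - yhat) / y))
rmse  <- function(y, yhat) sqrt(mean((y - yhat)^2))
rmspe <- function(y, yhat) 100 * sqrt(mean(((y - yhat) / y)^2))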
Brier
The Brier score measures the accuracy of probabilistic predictions for binary outcomes. It is calculated as the mean squared difference between the actual values (y_i) and the predicted probabilities (p_i), where p_i is the predicted probability that the i-th observation is positive. Assuming that there are n predictions, its formula is given by:
\mathrm{Brier} = \frac{1}{n}\sum_{i=1}^{n}(y_i-p_i)^2
The value of this metric ranges from 0 to 1, with lower values indicating better predictions. In the screening process in ldt, both in-sample and out-of-sample observations can be used to calculate this metric. Although this metric cannot be directly interpreted as a weight, it is treated in a manner similar to AIC.
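In base R, the formula above is a one-liner (a sketch, not the package's code):
brier <- function(y, p) mean((y - p)^2)
brier(y = c(1, 0, 1, 1), p = c(0.9, 0.2, 0.6, 0.4))  # 0.1425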
AUC
As described by Fawcett (2006), the receiver operating characteristic (ROC) curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at different classification thresholds. The area under this curve is known as the AUC. Its value ranges from 0 to 1, with higher values indicating that the model is better at distinguishing between the two classes. In the screening process in ldt, both in-sample and out-of-sample observations can be used to calculate this metric, and there is an option to calculate the pessimistic or the instance-varying-costs version (Fawcett 2006). Although the AUC does not have a direct interpretation as a weight, ldt uses its value as the weight of a model.
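For reference, the AUC can be computed in base R via the rank-sum (Mann-Whitney) identity; this is a standard sketch and not necessarily the algorithm used inside ldt:
auc <- function(y, p) {
  # y: 0/1 outcomes; p: predicted scores or probabilities
  r <- rank(p)                       # midranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(y = c(0, 0, 1, 1), p = c(0.1, 0.4, 0.35, 0.8))   # 0.75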
CRPS
According to Gneiting et al. (2005), the continuous ranked probability score (CRPS) is a metric used to measure the accuracy of probabilistic predictions. Unlike MAE, RMSE, etc., CRPS takes into account the entire distribution of the prediction, rather than focusing on a specific point of the probability distribution. For n normally distributed predictions with mean \hat{y}_i and variance \mathrm{var}(\hat{y}_i), this metric can be expressed analytically as:
\mathrm{CRPS} = \sum_{i=1}^{n} \sigma_i \left( z_i\,(2\Phi(z_i)-1) + 2\phi(z_i) - \frac{1}{\sqrt{\pi}} \right),
where \sigma_i = \sqrt{\mathrm{var}(\hat{y}_i)}, z_i = (y_i-\hat{y}_i)/\sigma_i, and \Phi and \phi are the CDF and density functions of the standard normal distribution. Although this metric cannot be directly interpreted as a weight, it is treated in a manner similar to AIC in the ldt package.
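The formula translates directly into base R (a sketch under the normality assumption above):
crps_normal <- function(y, mean, sd) {
  z <- (y - mean) / sd
  sum(sd * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi)))
}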
Other metrics
There are some other metrics in ldt. One is "directional prediction accuracy", which is calculated as the proportion of predictions that correctly predict the direction of change relative to the previous observation. Its value ranges from 0 to 1, with higher values indicating better performance, and it is used as the weight of the model. Note that it is applicable only to time-series data.
A similar metric is "sign prediction accuracy", which reports the proportion of predictions that have the same sign as the actual values, i.e., the number of correct sign predictions divided by the total number of predictions. It also ranges from 0 to 1, with higher values indicating better performance, and it is likewise used as the weight of the model.
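Sketches of both measures in base R (the exact conventions used by ldt, e.g., the treatment of zero changes, are assumptions here):
direction_accuracy <- function(y, yhat) {
  # compare the predicted change against the actual change,
  # both measured relative to the previous actual observation
  mean(sign(yhat[-1] - y[-length(y)]) == sign(diff(y)))
}
sign_accuracy <- function(y, yhat) mean(sign(yhat) == sign(y))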
References
Burnham KP, Anderson DR (2002).
Model selection and multimodel inference.
Springer, New York.
ISBN 0387953647, doi:10.1007/b97636.
Fawcett T (2006).
“An introduction to ROC analysis.”
Pattern Recognition Letters, 27(8), 861–874.
doi:10.1016/j.patrec.2005.10.010.
Fawcett T (2006).
“ROC graphs with instance-varying costs.”
Pattern Recognition Letters, 27(8), 882–891.
doi:10.1016/j.patrec.2005.10.012.
Gneiting T, Raftery AE, Westveld AH, Goldman T (2005).
“Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation.”
Monthly Weather Review, 133(5), 1098–1118.
doi:10.1175/mwr2904.1.
Greene WH (2020).
Econometric analysis, 8th edition.
Pearson Education Limited, New York.
ISBN 9781292231136.
Hyndman RJ, Athanasopoulos G (2018).
Forecasting: Principles and practice.
OTexts.
https://otexts.com/fpp2/.