recalibrate {recalibratiNN}    R Documentation

Obtain recalibrated samples of the predictive distribution.

Description

This function currently offers one recalibration technique, based on the method of Torres R. et al. (2024). It provides two approaches (local and global) for obtaining samples and the mean of a recalibrated predictive distribution for any Gaussian regression model fitted with Mean Squared Error (MSE) as the loss function.

Usage

recalibrate(
  yhat_new,
  pit_values,
  mse,
  space_cal = NULL,
  space_new = NULL,
  type = c("local", "global"),
  p_neighbours = 0.1,
  epsilon = 0
)

Arguments

yhat_new

Predicted values of the new (test) set.

pit_values

Global Probability Integral Transform (PIT) values calculated on the calibration set.

mse

Mean Squared Error of the calibration/validation set.

space_cal

Used in local recalibration. The covariates/features of the calibration/validation set or any representation of those covariates, such as an intermediate layer or an output layer of a neural network.

space_new

Used in local recalibration. A new set of covariates or other representation of those covariates, provided they are in the same space as the ones in space_cal.

type

Choose between "local" or "global" recalibration. Defaults to "local".

p_neighbours

Double in (0, 1] giving the proportion of space_cal to be used as the number of neighbours in the KNN search. If p_neighbours = 1, the whole calibration set is used, weighted by distance. Default is 0.1.

epsilon

Approximation parameter for the K-nearest neighbours (KNN) search. The default, epsilon = 0, returns exact distances. Only used for local calibration.

Details

The method implemented here is designed to generate recalibrated samples from regression models fitted by least squares. Note that least squares only admits a probabilistic interpretation when the output being modeled follows a normal distribution, and that assumption underlies this implementation.

The currently available method draws inspiration from Approximate Bayesian Computation and inverse transform sampling. The calibration can be applied either locally or globally. When type = "global", the calibration uses a uniform kernel.
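As a rough illustration of the inverse-transform idea (a hedged sketch, not the package's internal code): PIT values computed on the calibration set can be pushed back through the Gaussian predictive CDF of a new point to produce recalibrated draws. All variable names and values below are illustrative.

```r
# Hedged sketch of the inverse-transform step (illustrative only):
# push calibration PIT values through the Gaussian predictive CDF
# of a single new prediction.
yhat_new <- 12.0              # predicted mean for one new point
mse_cal  <- 4.0               # MSE from the calibration set
pit_cal  <- c(0.1, 0.5, 0.9)  # PIT values from the calibration set

# Recalibrated draws: quantiles of N(yhat_new, sqrt(mse_cal)) at the PITs.
draws <- qnorm(pit_cal, mean = yhat_new, sd = sqrt(mse_cal))
```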

Alternatively, one can choose "local" calibration with p_neighbours = 1. In this case the calibration uses the whole calibration set (that is, globally), but with an Epanechnikov kernel instead of a uniform one.
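The Epanechnikov weighting over KNN distances mentioned above can be sketched as follows. Note that epanechnikov_weights is a hypothetical helper written for illustration, not an exported function of recalibratiNN.

```r
# Illustrative Epanechnikov weighting over KNN distances
# (a sketch, not the package's internal implementation).
epanechnikov_weights <- function(d) {
  u <- d / max(d)           # scale distances into [0, 1]
  w <- 0.75 * (1 - u^2)     # Epanechnikov kernel
  w / sum(w)                # normalize so the weights sum to 1
}

w <- epanechnikov_weights(c(0.1, 0.5, 1.0))
```

Under this weighting the farthest neighbour receives weight zero, so nearby calibration points dominate the recalibrated distribution.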

Value

A list containing the calibrated predicted mean/variance, along with samples from the recalibrated predictive distribution and their respective weights. Weights are calculated with an Epanechnikov kernel over the distances obtained from KNN.

References

Torres R, Nott DJ, Sisson SA, Rodrigues T, Reis JG, Rodrigues GS (2024). “Model-Free Local Recalibration of Neural Networks.” arXiv preprint arXiv:2403.05756. doi:10.48550/arXiv.2403.05756.

Examples


n <- 1000
split <- 0.8

# Auxiliary functions
mu <- function(x1){
  10 + 5*x1^2
}

sigma_v <- function(x1){
  30*x1
}

# Generating heteroscedastic data.
x <- runif(n, 1, 10)
y <- rnorm(n, mu(x), sigma_v(x))

# Train set
x_train <- x[1:(n*split)]
y_train <- y[1:(n*split)]

# Calibration/Validation set.
x_cal <- x[(n*split+1):n]
y_cal <- y[(n*split+1):n]

# New observations or the test set.
x_new <- runif(n/5, 1, 10)

# Fitting a simple linear regression, which will not capture the heteroscedasticity
model <- lm(y_train ~ x_train)

y_hat_cal <- predict(model, newdata=data.frame(x_train=x_cal))
MSE_cal <- mean((y_hat_cal - y_cal)^2)

y_hat_new <- predict(model, newdata=data.frame(x_train=x_new))

pit <- PIT_global(ycal = y_cal, yhat = y_hat_cal, mse = MSE_cal)

recalibrate(
  yhat_new = y_hat_new,
  pit_values = pit,
  mse = MSE_cal,
  space_cal = x_cal,
  space_new = x_new,
  type = "local")
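
For comparison, a sketch of the global variant under the same setup: space_cal and space_new default to NULL, so only the predictions, PIT values, and MSE are needed.

```r
# Global recalibration of the same example: drop the covariate
# spaces and set type = "global". Uses y_hat_new, pit, and MSE_cal
# from the example above.
recalibrate(
  yhat_new = y_hat_new,
  pit_values = pit,
  mse = MSE_cal,
  type = "global")
```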


[Package recalibratiNN version 0.2.0 Index]