get_pit {SEI}    R Documentation

Calculate probability integral transform values

Description

Estimates the cumulative distribution function (CDF) from a set of observations and returns the corresponding probability integral transform (PIT) values.

Usage

get_pit(ref_data, new_data, dist = "empirical", return_fit = FALSE)

Arguments

ref_data

numeric vector from which to estimate the CDF.

new_data

numeric vector of values at which the estimated CDF is evaluated to obtain the PIT values.

dist

string; distribution used to estimate the CDF.

return_fit

logical; whether to return the estimated parameters and goodness-of-fit statistics.

Details

dist specifies the distribution used to estimate the cumulative distribution function of the observations. By default, dist = "empirical", in which case the CDF is estimated empirically from the values in ref_data. This is only recommended if there are at least 100 values in ref_data; otherwise, a warning is issued.
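The empirical case can be illustrated with base R alone. The following is a minimal sketch using stats::ecdf, not necessarily the computation that get_pit performs internally (for example, the package may rescale the values so that they lie strictly between 0 and 1). The simulated gamma data mirror the Examples section.

```r
set.seed(1)
x_ref <- rgamma(1000, shape = 3, rate = 2)  # reference sample used to estimate the CDF
x_new <- rgamma(1000, shape = 3, rate = 2)  # values to transform

# ecdf() returns a step function: the proportion of x_ref values <= x
F_emp <- ecdf(x_ref)

# PIT values are the estimated CDF evaluated at the new data
pit_emp <- F_emp(x_new)
hist(pit_emp)  # roughly flat if x_new follows the same distribution as x_ref
```

With few reference values the step function is coarse, which is one reason the empirical option is only recommended for larger samples.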

Parametric distributions are more appropriate when there is relatively little data and there is good reason to expect that the data follow a particular distribution. To check that the chosen parametric distribution is appropriate, the argument return_fit can be used to return the estimated parameters of the distribution, as well as Kolmogorov-Smirnov goodness-of-fit test statistics.
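As a rough illustration of what a parametric fit with a goodness-of-fit check involves, the sketch below fits a gamma distribution by maximum likelihood with stats::optim and then applies stats::ks.test. This is an assumption about the general approach, written with base R only; it is not the package's own fitting code, and the helper name nll is hypothetical.

```r
set.seed(42)
x_ref <- rgamma(1000, shape = 3, rate = 2)
x_new <- rgamma(1000, shape = 3, rate = 2)

# Negative log-likelihood of the gamma distribution (hypothetical helper).
# Parameters are optimised on the log scale so that they stay positive.
nll <- function(par) {
  -sum(dgamma(x_ref, shape = exp(par[1]), rate = exp(par[2]), log = TRUE))
}
fit <- optim(c(0, 0), nll)
est <- c(shape = exp(fit$par[1]), rate = exp(fit$par[2]))

# Kolmogorov-Smirnov test of the reference data against the fitted CDF
ks <- ks.test(x_ref, "pgamma", shape = est[["shape"]], rate = est[["rate"]])
ks$statistic
ks$p.value

# PIT values from the fitted parametric CDF
pit_gamma <- pgamma(x_new, shape = est[["shape"]], rate = est[["rate"]])
```

Because the parameters are estimated from the same data that are tested, the p-value here should be read as a rough diagnostic rather than an exact test.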

A flexible compromise between using empirical methods and parametric distributions is to use kernel density estimation, dist = "kde".
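To show how a kernel density estimate yields a CDF, the sketch below assumes a Gaussian kernel, for which the estimated CDF is the average of normal CDFs centred at the reference values. The bandwidth rule (stats::bw.nrd0) and the helper name F_kde are assumptions made for illustration; the kernel and bandwidth used by dist = "kde" may differ.

```r
set.seed(1)
x_ref <- rgamma(1000, shape = 3, rate = 2)
x_new <- rgamma(1000, shape = 3, rate = 2)

# Bandwidth from the rule of thumb that density() uses by default
h <- bw.nrd0(x_ref)

# Gaussian-kernel CDF estimate (hypothetical helper):
# F(x) = mean over i of pnorm((x - x_ref[i]) / h)
F_kde <- function(x) {
  vapply(x, function(xi) mean(pnorm((xi - x_ref) / h)), numeric(1))
}

pit_kde <- F_kde(x_new)
hist(pit_kde)
```

Unlike the empirical step function, this estimate is smooth, and it does not require choosing a parametric family.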

dist must be one of: 'empirical' (the empirical distribution of the data), 'kde' (kernel density estimation), 'norm', 'lnorm', 'logis', 'llogis', 'exp', 'gamma', or 'weibull'. For the parametric distributions, parameters are estimated using maximum likelihood estimation.

Value

A vector of PIT values if return_fit = FALSE, or, if return_fit = TRUE, a list containing the estimated CDF (F_x), the corresponding parameters (params), and properties of the fit (fit_props).

Author(s)

Sam Allen, Noelia Otero

Examples

N <- 1000
shape <- 3
rate <- 2

x_ref <- rgamma(N, shape, rate)
x_new <- rgamma(N, shape, rate)

# empirical distribution
pit <- get_pit(x_ref, x_new)
hist(pit)

# gamma distribution
pit <- get_pit(x_ref, x_new, dist = "gamma", return_fit = TRUE)
hist(pit$pit)

hist(x_ref, breaks = 30, probability = TRUE)
lines(seq(0, 10, 0.01), dgamma(seq(0, 10, 0.01), pit$params[1], pit$params[2]), col = "blue")


# weibull distribution
pit <- get_pit(x_ref, x_new, dist = "weibull", return_fit = TRUE)
hist(pit$pit)

hist(x_ref, breaks = 30, probability = TRUE)
lines(seq(0, 10, 0.01), dweibull(seq(0, 10, 0.01), pit$params[1], pit$params[2]), col = "blue")


# exponential distribution
pit <- get_pit(x_ref, x_new, dist = "exp", return_fit = TRUE)
hist(pit$pit)

hist(x_ref, breaks = 30, probability = TRUE)
lines(seq(0, 10, 0.01), dexp(seq(0, 10, 0.01), pit$params[1]), col = "blue")



[Package SEI version 0.1.1 Index]