get_pit {SEI} | R Documentation |
Calculate probability integral transform values
Description
Function to estimate the cumulative distribution function (CDF) from a set of observations, and return the corresponding probability integral transform (PIT) values.
Usage
get_pit(ref_data, new_data, dist = "empirical", return_fit = FALSE)
Arguments
ref_data | numeric vector from which to estimate the CDF.
new_data | numeric vector from which to calculate the PIT values.
dist | string; distribution used to estimate the CDF.
return_fit | logical; return parameters and goodness-of-fit statistics.
Details
dist specifies the distribution used to estimate the cumulative distribution function of the observations. By default, dist = "empirical", in which case the CDF is estimated empirically from the values in ref_data. This is only recommended if there are at least 100 values in ref_data; a warning is issued otherwise.
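To make the empirical option concrete, the following is a minimal sketch of an empirical PIT computed with base R's ecdf(). It is an illustration only and is not taken from the SEI source; get_pit may treat ties or the boundary values 0 and 1 differently. The object names F_emp and pit_emp are hypothetical.

```r
# Sketch only: empirical PIT via stats::ecdf(), not SEI's internal code.
set.seed(1)
x_ref <- rgamma(1000, shape = 3, rate = 2)  # reference sample
x_new <- rgamma(1000, shape = 3, rate = 2)  # values to transform

F_emp <- ecdf(x_ref)     # step-function estimate of the CDF
pit_emp <- F_emp(x_new)  # proportion of x_ref values <= each x_new value

range(pit_emp)           # PIT values always lie in [0, 1]
```

With few reference values the step function is coarse, so pit_emp can take only a handful of distinct values, which is one reason for the 100-value recommendation.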
Parametric distributions are more appropriate when there is relatively little data and there is good reason to expect that the data follow a particular distribution. To check that the chosen parametric distribution is appropriate, the argument return_fit can be used to return the estimated parameters of the distribution, as well as Kolmogorov-Smirnov goodness-of-fit test statistics.
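The parametric workflow can be sketched with base R alone: fit a gamma distribution by maximum likelihood, transform new values with the fitted CDF, and run a Kolmogorov-Smirnov test on the reference data. This is an assumption-laden illustration rather than SEI's implementation; the helper nll and the objects params, pit_gamma and ks are hypothetical names, and get_pit may use a different optimiser.

```r
# Sketch only: gamma fit by maximum likelihood plus a KS check (stats only).
set.seed(1)
x_ref <- rgamma(200, shape = 3, rate = 2)
x_new <- rgamma(200, shape = 3, rate = 2)

# Negative log-likelihood, parameterised on the log scale so that
# shape and rate stay positive during optimisation.
nll <- function(log_par, x) {
  -sum(dgamma(x, shape = exp(log_par[1]), rate = exp(log_par[2]), log = TRUE))
}

# Method-of-moments estimates as starting values
start <- log(c(mean(x_ref)^2 / var(x_ref), mean(x_ref) / var(x_ref)))
fit <- optim(start, nll, x = x_ref)
params <- setNames(exp(fit$par), c("shape", "rate"))

# PIT values under the fitted gamma CDF
pit_gamma <- pgamma(x_new, shape = params[["shape"]], rate = params[["rate"]])

# Kolmogorov-Smirnov test of the reference data against the fitted CDF
ks <- ks.test(x_ref, "pgamma", shape = params[["shape"]], rate = params[["rate"]])
ks$statistic
ks$p.value
```

Because the parameters are estimated from the same data that is tested, the KS p-value tends to be conservative; it is best read as a rough diagnostic rather than an exact test.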
A flexible compromise between using empirical methods and parametric distributions is to use kernel density estimation, dist = "kde".
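One way to picture the kernel density option is through the CDF implied by a Gaussian kernel density estimate, which is the average of normal CDFs centred at the reference observations. The sketch below assumes a Gaussian kernel and the bw.nrd0() bandwidth rule; whether get_pit uses the same kernel or bandwidth is not stated here, and F_kde is a hypothetical helper.

```r
# Sketch only: CDF implied by a Gaussian kernel density estimate.
# The kernel and bandwidth used by dist = "kde" in SEI may differ.
set.seed(1)
x_ref <- rgamma(200, shape = 3, rate = 2)
x_new <- rgamma(200, shape = 3, rate = 2)

bw <- bw.nrd0(x_ref)  # default bandwidth rule of stats::density()

# Integrating a Gaussian KDE gives the mean of normal CDFs
# centred at the reference observations.
F_kde <- function(q) {
  vapply(q, function(qi) mean(pnorm(qi, mean = x_ref, sd = bw)), numeric(1))
}

pit_kde <- F_kde(x_new)
range(pit_kde)
```

Unlike the empirical step function, F_kde is smooth and strictly increasing, so distinct new values receive distinct PIT values even when ref_data is short.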
dist must be one of: 'empirical' (the empirical distribution of ref_data), 'kde' (kernel density estimation), 'norm', 'lnorm', 'logis', 'llogis', 'exp', 'gamma', and 'weibull'.
For the parametric distributions, parameters are estimated using maximum likelihood estimation.
Value
A vector of PIT values if return_fit = FALSE, or, if return_fit = TRUE, a list containing the estimated CDF (F_x), the corresponding parameters (params), and properties of the fit (fit_props). In the examples, the PIT values are accessed from this list as its pit element.
Author(s)
Sam Allen, Noelia Otero
Examples
N <- 1000
shape <- 3
rate <- 2
x_ref <- rgamma(N, shape, rate)
x_new <- rgamma(N, shape, rate)
# empirical distribution
pit <- get_pit(x_ref, x_new)
hist(pit)
# gamma distribution
pit <- get_pit(x_ref, x_new, dist = "gamma", return_fit = TRUE)
hist(pit$pit)
hist(x_ref, breaks = 30, probability = TRUE)
lines(seq(0, 10, 0.01), dgamma(seq(0, 10, 0.01), pit$params[1], pit$params[2]), col = "blue")
# weibull distribution
pit <- get_pit(x_ref, x_new, dist = "weibull", return_fit = TRUE)
hist(pit$pit)
hist(x_ref, breaks = 30, probability = TRUE)
lines(seq(0, 10, 0.01), dweibull(seq(0, 10, 0.01), pit$params[1], pit$params[2]), col = "blue")
# exponential distribution
pit <- get_pit(x_ref, x_new, dist = "exp", return_fit = TRUE)
hist(pit$pit)
hist(x_ref, breaks = 30, probability = TRUE)
lines(seq(0, 10, 0.01), dexp(seq(0, 10, 0.01), pit$params[1]), col = "blue")