fit_dist {SEI}    R Documentation
Fit a distribution to data
Description
Fits a specified distribution to a vector of data. Returns the estimated distribution and relevant goodness-of-fit statistics.
Usage
fit_dist(data, dist, n_thres = 20)
Arguments
data: vector of data
dist: character string specifying the distribution, see details
n_thres: minimum number of data points required to estimate the distribution
Details
This has been adapted from code available at https://github.com/WillemMaetens/standaRdized.
data is a numeric vector of data from which the distribution is to be estimated.
dist is the specified distribution to be fit to data. This must be one of 'empirical' (the empirical distribution given data), 'kde' (kernel density estimation), 'norm', 'lnorm', 'logis', 'llogis', 'exp', 'gamma', and 'weibull'. By default, dist = "empirical", in which case the distribution is estimated empirically from data. This is only recommended if there are at least 100 values in data, and a warning is issued otherwise.
n_thres is the minimum number of observations required to fit the distribution. The default is n_thres = 20. If the number of values in data is smaller than n_thres, an error is raised. This guards against over-fitting, which can result in distributions that do not generalise well out-of-sample.
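The two sample-size checks described above can be sketched as follows. This is a minimal illustration and not the package's own code: the helper name check_sample_size is hypothetical, and ignoring missing values when counting is an assumption.

```r
# Hypothetical helper illustrating the sample-size checks described above.
# It is not part of SEI.
check_sample_size <- function(data, dist, n_thres = 20) {
  n <- sum(!is.na(data))  # assumption: missing values are not counted
  if (n < n_thres) {
    stop("fewer than n_thres values in data")
  }
  if (dist == "empirical" && n < 100) {
    warning("fewer than 100 values; the empirical distribution may be unreliable")
  }
  invisible(n)
}

set.seed(1)
x <- rnorm(50)

# 50 values and a parametric distribution: passes silently
n_ok <- check_sample_size(x, dist = "norm")

# 50 values with the empirical distribution: triggers the warning
res_warn <- tryCatch(check_sample_size(x, dist = "empirical"),
                     warning = function(w) "warning")

# 10 values with the default n_thres = 20: triggers the error
res_err <- tryCatch(check_sample_size(x[1:10], dist = "norm"),
                    error = function(e) "error")
```
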
Where relevant, parameter estimation is performed using maximum likelihood estimation.
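As an illustration of maximum likelihood estimation for one of the parametric options, the sketch below fits a gamma distribution by minimising the negative log-likelihood with base R's optim. It only shows the general technique; the optimiser, starting values, and log-scale parametrisation are assumptions made here and may differ from what fit_dist does internally.

```r
set.seed(1)
x <- rgamma(1000, shape = 3, rate = 2)

# Negative log-likelihood of the gamma distribution. The parameters are
# optimised on the log scale so that shape and rate stay positive.
nll <- function(log_par) {
  -sum(dgamma(x, shape = exp(log_par[1]), rate = exp(log_par[2]), log = TRUE))
}

# Nelder-Mead search starting from shape = rate = 1 (log scale: 0, 0)
fit <- optim(c(0, 0), nll)

# Back-transform to the original parameter scale
mle <- exp(fit$par)
names(mle) <- c("shape", "rate")
print(mle)
```

With 1000 observations the estimates should lie close to the values used to simulate the data (shape 3, rate 2).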
Value
A list containing the estimated distribution function, its parameters, and Kolmogorov-Smirnov goodness-of-fit statistics.
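For reference, the Kolmogorov-Smirnov statistic is the largest absolute difference between the empirical distribution function of the data and the fitted distribution function. The sketch below computes it with stats::ks.test and by hand for a gamma distribution with known parameters. It does not show the structure of the list returned by fit_dist, which is not reproduced here.

```r
set.seed(1)
x <- rgamma(1000, shape = 3, rate = 2)

# Kolmogorov-Smirnov test against a fully specified gamma distribution
ks <- ks.test(x, "pgamma", shape = 3, rate = 2)
print(ks$statistic)
print(ks$p.value)

# The same statistic computed by hand: the largest gap between the
# empirical CDF (just before and at each sorted point) and the gamma CDF
n <- length(x)
Fx <- pgamma(sort(x), shape = 3, rate = 2)
D <- max(pmax((1:n) / n - Fx, Fx - (0:(n - 1)) / n))
```

Note that the p-value from ks.test assumes the reference distribution was specified in advance. When the parameters are estimated from the same data, the p-value tends to be too large, so it is best read as a rough guide.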
Examples
N <- 1000
shape <- 3
rate <- 2
x <- seq(0, 10, 0.01)  # grid on which the fitted densities are drawn

# Gamma distribution: overlay the fitted density on a histogram of the data
data <- rgamma(N, shape, rate)
out <- fit_dist(data, dist = "gamma")
hist(data, breaks = 30, probability = TRUE)
lines(x, dgamma(x, out$params[1], out$params[2]), col = "blue")

# Weibull distribution: overlay the fitted density on a histogram of the data
data <- rweibull(N, shape, 1/rate)
out <- fit_dist(data, dist = "weibull")
hist(data, breaks = 30, probability = TRUE)
lines(x, dweibull(x, out$params[1], out$params[2]), col = "blue")