mgedist {fitdistrplus} | R Documentation |
Maximum goodness-of-fit fit of univariate continuous distributions
Description
Fit of a univariate continuous distribution by maximizing goodness-of-fit (or minimizing a goodness-of-fit distance) for non-censored data.
Usage
mgedist(data, distr, gof = "CvM", start = NULL, fix.arg = NULL, optim.method = "default",
lower = -Inf, upper = Inf, custom.optim = NULL, silent = TRUE, gradient = NULL,
checkstartfix=FALSE, calcvcov=FALSE, ...)
Arguments
data |
A numeric vector for non-censored data. |
distr |
A character string |
gof |
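A character string coding for the name of the goodness-of-fit distance used: "CvM" for the Cramer-von Mises distance, "KS" for the Kolmogorov-Smirnov distance, "AD" for the Anderson-Darling distance, and "ADR", "ADL", "AD2R", "AD2L" or "AD2" for the variants of the Anderson-Darling distance described by Luceno (2006). |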
start |
A named list giving the initial values of parameters of the named distribution
or a function of data computing initial values and returning a named list.
This argument may be omitted (default) for some distributions for which reasonable
starting values are computed (see the 'details' section of |
fix.arg |
An optional named list giving the values of fixed parameters of the named distribution or a function of data computing (fixed) parameter values and returning a named list. Parameters with fixed value are thus NOT estimated. |
optim.method |
|
lower |
Left bounds on the parameters for the |
upper |
Right bounds on the parameters for the |
custom.optim |
a function carrying out the optimization. |
silent |
A logical to remove or show warnings when bootstrapping. |
gradient |
A function to return the gradient of the gof distance for the |
checkstartfix |
A logical to test starting and fixed values. Do not change it. |
calcvcov |
A logical indicating whether the (asymptotic) covariance matrix is required (currently ignored). |
... |
further arguments passed to the |
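To illustrate what holding a parameter fixed means, the following base-R sketch estimates only the Weibull shape by minimizing a Cramer-von Mises distance while the scale is held at a known value, which is roughly what fix.arg = list(scale = 5) expresses. The helper cvm_weibull(), the fixed value and the search interval are assumptions made for this sketch; it does not call fitdistrplus and is not its implementation.

```r
# Hypothetical helper (not part of fitdistrplus): Cramer-von Mises distance
# between the sample x and a Weibull(shape, scale) distribution.
cvm_weibull <- function(shape, scale, x) {
  n <- length(x)
  u <- pweibull(sort(x), shape = shape, scale = scale)  # fitted CDF at order statistics
  1 / (12 * n) + sum((u - (2 * seq_len(n) - 1) / (2 * n))^2)
}

set.seed(1234)
x <- rweibull(150, shape = 2, scale = 5)

# Analogue of fix.arg = list(scale = 5): the scale is held fixed,
# so only the shape is estimated (a one-dimensional search).
fixed_scale <- 5
fit <- optimize(function(shape) cvm_weibull(shape, fixed_scale, x),
                interval = c(0.1, 10))
print(fit$minimum)    # estimated shape
print(fit$objective)  # minimal distance reached
```

Because the fixed parameter is simply closed over by the objective, the optimizer never sees it, which is why parameters given in fix.arg are not estimated.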
Details
The mgedist function numerically maximizes goodness-of-fit, or equivalently minimizes a goodness-of-fit distance coded by the argument gof. One may use one of the classical distances defined in Stephens (1986): the Cramer-von Mises distance ("CvM"), the Kolmogorov-Smirnov distance ("KS") or the Anderson-Darling distance ("AD"), which gives more weight to the tails of the distribution. One may also use one of the variants of this last distance proposed by Luceno (2006). The right-tail AD ("ADR") gives more weight only to the right tail, and the left-tail AD ("ADL") gives more weight only to the left tail. Either tail, or both, can receive even larger weights by using the second-order Anderson-Darling statistics ("AD2R", "AD2L" or "AD2").
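The distances listed above can be written in terms of the fitted distribution function evaluated at the sorted observations, u_i = F(x_(i)), i = 1, ..., n. The following base-R sketch, built around a hypothetical helper gof_distance(), follows the usual EDF formulas of Stephens (1986) and the tail-weighted variants as we read them from Luceno (2006); it is an illustration only, not the code used inside mgedist.

```r
# Hypothetical helper: EDF-based distances between a sample x and a fitted
# distribution whose CDF is given by the function `cdf`.
gof_distance <- function(x, cdf, gof = c("CvM", "KS", "AD", "ADR", "ADL",
                                          "AD2R", "AD2L", "AD2")) {
  gof <- match.arg(gof)
  n <- length(x)
  u <- cdf(sort(x))   # fitted CDF at the order statistics
  i <- seq_len(n)
  w <- 2 * i - 1      # weights (2i - 1) shared by the AD family
  switch(gof,
    # Cramer-von Mises
    CvM  = 1 / (12 * n) + sum((u - w / (2 * n))^2),
    # Kolmogorov-Smirnov: largest vertical gap between the EDF and the CDF
    KS   = max(i / n - u, u - (i - 1) / n),
    # Anderson-Darling (both tails)
    AD   = -n - sum(w * (log(u) + log(1 - rev(u)))) / n,
    # right-tail and left-tail variants
    ADR  = n / 2 - 2 * sum(u) - sum(w * log(1 - rev(u))) / n,
    ADL  = -3 * n / 2 + 2 * sum(u) - sum(w * log(u)) / n,
    # second-order variants, putting even more weight on the tails
    AD2R = 2 * sum(log(1 - u)) + sum(w / (1 - rev(u))) / n,
    AD2L = 2 * sum(log(u)) + sum(w / u) / n,
    AD2  = 2 * sum(log(u) + log(1 - u)) + sum(w / u + w / (1 - rev(u))) / n
  )
}

# Usage: distances between a simulated sample and a normal(3, 1) CDF
set.seed(1234)
x <- rnorm(50, mean = 3, sd = 1)
cdf <- function(q) pnorm(q, mean = 3, sd = 1)
print(sapply(c("CvM", "KS", "AD", "ADR", "ADL"), function(g) gof_distance(x, cdf, g)))
```

With these formulas, ADR + ADL equals AD and AD2R + AD2L equals AD2, which is a convenient consistency check.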
The optimization process is the same as in mledist; see the 'details' section of that function.
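As a rough sketch of the principle (not of mgedist's internals), maximum goodness-of-fit estimation can be reproduced with optim() by minimizing a distance over the parameters. The helper cvm_normal() is hypothetical, and optimizing the standard deviation on the log scale is a choice made only for this sketch, to keep it positive without bounds.

```r
# Hypothetical helper: Cramer-von Mises distance to a normal distribution,
# with par = c(mean, log(sd)).
cvm_normal <- function(par, x) {
  n <- length(x)
  u <- pnorm(sort(x), mean = par[1], sd = exp(par[2]))  # sd kept positive via exp()
  1 / (12 * n) + sum((u - (2 * seq_len(n) - 1) / (2 * n))^2)
}

set.seed(1234)
x <- rnorm(200, mean = 10, sd = 2)

# Start from the moment estimates, then minimize the distance with optim()
start <- c(mean(x), log(sd(x)))
fit <- optim(start, cvm_normal, x = x, method = "Nelder-Mead")

estimate <- c(mean = fit$par[1], sd = exp(fit$par[2]))
print(estimate)
print(fit$convergence)  # 0 indicates successful convergence
```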
This function is not intended to be called directly but is called internally by fitdist and bootdist.
This function is intended to be used only with continuous distributions, and weighted maximum goodness-of-fit estimation is not allowed.
NB: if your data values are particularly small or large, rescaling may be needed before the optimization process. See example (4).
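The following base-R sketch shows why rescaling is legitimate for a location-scale family such as the Cauchy: the Cramer-von Mises distance is unchanged when the data, the location and the scale are all multiplied by the same constant, so the fit can be carried out on rescaled data and the estimates divided back. The helper cvm_cauchy(), the factor k = 1e4 and the starting values are assumptions for this illustration; for other families the back-transformation depends on the parameterization.

```r
# Hypothetical helper: Cramer-von Mises distance to a Cauchy(location, scale)
cvm_cauchy <- function(par, x) {
  n <- length(x)
  u <- pcauchy(sort(x), location = par[1], scale = par[2])
  1 / (12 * n) + sum((u - (2 * seq_len(n) - 1) / (2 * n))^2)
}

set.seed(1234)
x <- rnorm(100, 1e-4, 2e-4)  # particularly small values, as in example (4)
k <- 1e4                     # assumed scaling factor bringing values near 1

# The distance is invariant to a joint rescaling of data, location and scale
stopifnot(isTRUE(all.equal(cvm_cauchy(c(1e-4, 1e-4), x),
                           cvm_cauchy(c(1e-4, 1e-4) * k, x * k))))

# Fit on the rescaled data, keeping the scale strictly positive
xs <- x * k
start <- c(median(xs), IQR(xs) / 2)
fit <- optim(start, cvm_cauchy, x = xs, method = "L-BFGS-B",
             lower = c(-Inf, 1e-8))

# Map the estimates back to the original units
estimate <- c(location = fit$par[1], scale = fit$par[2]) / k
print(estimate)
```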
Value
mgedist returns a list with the following components:
estimate |
the parameter estimates. |
convergence |
an integer code for the convergence of |
value |
the minimal value reached for the criterion to minimize. |
hessian |
a symmetric matrix computed by |
optim.function |
the name of the optimization function used to minimize the goodness-of-fit distance. |
optim.method |
when |
fix.arg |
the named list giving the values of parameters of the named distribution
that must be kept fixed rather than estimated by maximum goodness-of-fit, or |
fix.arg.fun |
the function used to set the value of |
weights |
the vector of weights used in the estimation process or |
counts |
A two-element integer vector giving the number of calls
to the goodness-of-fit distance function and its gradient respectively.
This excludes those calls needed to compute the Hessian, if requested,
and any calls to the distance function to compute a finite-difference
approximation to the gradient. |
optim.message |
A character string giving any additional information
returned by the optimizer, or |
loglik |
the log-likelihood value. |
gof |
the code of the goodness-of-fit distance minimized. |
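Several of the components listed above correspond to elements returned by optim(). The following sketch assembles, by hand, a list with the same component names from an optim() call minimizing a Cramer-von Mises distance, with loglik evaluated at the minimum-distance estimates. The helper cvm_normal() and the use of "L-BFGS-B" with a lower bound on sd are assumptions for this illustration; the list actually built by mgedist may contain further components, and details may differ.

```r
# Hypothetical helper: Cramer-von Mises distance to a normal(mean, sd)
cvm_normal <- function(par, x) {
  n <- length(x)
  u <- pnorm(sort(x), mean = par[1], sd = par[2])
  1 / (12 * n) + sum((u - (2 * seq_len(n) - 1) / (2 * n))^2)
}

set.seed(1234)
x <- rnorm(100, mean = 0, sd = 1)

# Minimize the distance; the lower bound keeps sd strictly positive
opt <- optim(c(mean = mean(x), sd = sd(x)), cvm_normal, x = x,
             method = "L-BFGS-B", lower = c(-Inf, 1e-6), hessian = TRUE)

# Assemble a result using the component names described above
res <- list(
  estimate      = opt$par,          # parameter estimates
  convergence   = opt$convergence,  # 0 indicates successful convergence
  value         = opt$value,        # minimal distance reached
  hessian       = opt$hessian,      # numerically differentiated Hessian
  counts        = opt$counts,       # calls to the distance function and its gradient
  optim.message = opt$message,      # extra information from the optimizer
  loglik        = sum(dnorm(x, opt$par[["mean"]], opt$par[["sd"]], log = TRUE)),
  gof           = "CvM"
)
str(res)
```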
Author(s)
Marie-Laure Delignette-Muller and Christophe Dutang.
References
Luceno A (2006), Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators. Computational Statistics and Data Analysis, 51, 904-917, doi:10.1016/j.csda.2005.09.011.
Stephens MA (1986), Tests based on edf statistics. In Goodness-of-fit techniques (D'Agostino RB and Stephens MA, eds), Marcel Dekker, New York, pp. 97-194.
Delignette-Muller ML and Dutang C (2015), fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, 64(4), 1-34, doi:10.18637/jss.v064.i04.
See Also
mmedist, mledist, qmedist, fitdist for other estimation methods.
Examples
# (1) Fit of a Weibull distribution to serving size data by maximum
# goodness-of-fit estimation using all the distances available
#
library(fitdistrplus)
data(groundbeef)
serving <- groundbeef$serving
mgedist(serving, "weibull", gof="CvM")
mgedist(serving, "weibull", gof="KS")
mgedist(serving, "weibull", gof="AD")
mgedist(serving, "weibull", gof="ADR")
mgedist(serving, "weibull", gof="ADL")
mgedist(serving, "weibull", gof="AD2R")
mgedist(serving, "weibull", gof="AD2L")
mgedist(serving, "weibull", gof="AD2")
# (2) Fit of a uniform distribution using Cramer-von Mises or
# Kolmogorov-Smirnov distance
#
set.seed(1234)
u <- runif(100,min=5,max=10)
mgedist(u,"unif",gof="CvM")
mgedist(u,"unif",gof="KS")
# (3) Fit of a triangular distribution using Cramer-von Mises or
# Kolmogorov-Smirnov distance
#
require(mc2d)
set.seed(1234)
tri <- rtriang(100,min=5,mode=6,max=10)
mgedist(tri,"triang",start = list(min=4, mode=6,max=9),gof="CvM")
mgedist(tri,"triang",start = list(min=4, mode=6,max=9),gof="KS")
# (4) scaling problem
# the simulated dataset (below) has particularly small values, hence without scaling (10^0),
# the optimization raises an error. The for loop shows how scaling by 10^i
# for i=1,...,6 makes the fitting procedure work correctly.
set.seed(1234)
x2 <- rnorm(100, 1e-4, 2e-4)
for(i in 6:0)
  cat(i, try(mgedist(x2*10^i,"cauchy")$estimate, silent=TRUE), "\n")