fitDist {gamlss} | R Documentation |
Fitting Different Parametric gamlss.family Distributions
Description
The function fitDist() uses the function gamlssML() to fit all relevant parametric gamlss.family distributions, specified by the argument type, to a single data vector (with no explanatory variables). The final marginal distribution is the one selected by the generalised Akaike information criterion (GAIC) with penalty k. The default is k=2, i.e. the AIC.
The function fitDistPred() uses the function gamlssMLpred() to fit all relevant (marginal) parametric gamlss.family distributions to a single data vector (as fitDist() does), but the final model is the one with the minimum prediction global deviance. The user has to specify the training and validation/test samples.
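The idea of selecting by prediction global deviance can be sketched in base R without gamlss. The example below is only an illustration under stated assumptions: the data are simulated, the helper pred_deviance() is hypothetical, and only two candidates with closed-form maximum-likelihood estimates are compared. Parameters are estimated on the training sample, and minus twice the log-likelihood is then evaluated on the validation sample.

```r
# Prediction (validation) global deviance: -2 * log-likelihood of the
# validation data under parameters estimated from the training data only.
# pred_deviance() is a hypothetical helper, not a gamlss function.
pred_deviance <- function(logdens) -2 * sum(logdens)

set.seed(42)
y_train <- rlnorm(500, meanlog = 0, sdlog = 0.5)  # simulated training sample
y_valid <- rlnorm(500, meanlog = 0, sdlog = 0.5)  # simulated validation sample

# Exponential: the ML estimate of the rate is 1 / mean
rate_hat <- 1 / mean(y_train)

# Log-normal: ML estimates are the mean and (divisor n) sd of log(y)
m_hat <- mean(log(y_train))
s_hat <- sqrt(mean((log(y_train) - m_hat)^2))

vdev <- c(
  EXP   = pred_deviance(dexp(y_valid, rate = rate_hat, log = TRUE)),
  LOGNO = pred_deviance(dlnorm(y_valid, m_hat, s_hat, log = TRUE))
)

sort(vdev)                     # smallest prediction deviance first
best <- names(which.min(vdev)) # candidate that predicts the new data best
```

Because the validation data are never used for estimation, this criterion needs no explicit penalty for the number of parameters.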
The function chooseDist() uses the function update.gamlss() to fit all relevant parametric (conditional) gamlss.family distributions to a given fitted gamlss model. The output of the function is a matrix whose rows are the different distributions (from the argument type) and whose columns are the different GAICs, one per value of k. The default values of k are 2 for the AIC, 3.84 for the Chi-square criterion, and log(n) for the BIC. Unlike fitDist(), the function does not return a final model. The function getOrder() can be used to rank the distributions by a chosen column of the resulting table (matrix).
The final model can be refitted using update(); see the examples.
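To make the role of the penalty k concrete, the following base R sketch computes GAIC = -2 * logLik + k * df by hand for two candidates and the three default penalties mentioned above. It does not call gamlss; the helper gaic_by_hand() and the simulated data are assumptions made for the illustration.

```r
# GAIC by hand: global deviance (-2 * log-likelihood) plus k times the
# number of fitted parameters. gaic_by_hand() is a hypothetical helper,
# not part of gamlss.
gaic_by_hand <- function(loglik, df, k = 2) -2 * loglik + k * df

set.seed(1)
y <- rexp(200, rate = 2)   # simulated positive data (assumption)
n <- length(y)

# Normal: ML estimates are the mean and the divisor-n standard deviation
mu_hat    <- mean(y)
sigma_hat <- sqrt(mean((y - mu_hat)^2))
ll_norm   <- sum(dnorm(y, mu_hat, sigma_hat, log = TRUE))

# Exponential: the ML estimate of the rate is 1 / mean
ll_exp <- sum(dexp(y, rate = 1 / mean(y), log = TRUE))

# Penalties for AIC, Chi-square and BIC
k <- c(2, 3.84, round(log(n), 2))

# Rows are candidate distributions, columns are the penalties
tab <- rbind(
  NO  = gaic_by_hand(ll_norm, df = 2, k = k),
  EXP = gaic_by_hand(ll_exp,  df = 1, k = k)
)
colnames(tab) <- as.character(k)
tab
```

The layout mirrors the description of the chooseDist() output: one row per distribution and one column per penalty, so ranking by a different column can change the preferred distribution when candidates differ in their number of parameters.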
Usage
fitDist(y, k = 2,
type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"),
try.gamlss = FALSE, extra = NULL, data = NULL, trace = FALSE, ...)
fitDistPred(y,
type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"),
try.gamlss = FALSE, extra = NULL, data = NULL, rand = NULL,
newdata = NULL, trace = FALSE, ...)
chooseDist(object, k = c(2, 3.84, round(log(length(object$y)), 2)), type =
c("realAll", "realline", "realplus", "real0to1", "counts", "binom","extra"),
extra = NULL, trace = FALSE,
parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, ...)
chooseDistPred(object, type = c("realAll", "realline", "realplus",
"real0to1", "counts", "binom", "extra"), extra = NULL,
trace = FALSE, parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, newdata = NULL, rand = NULL, ...)
getOrder(obj, column = 1)
Arguments
y |
the data vector |
object , obj |
a GAMLSS fitted model |
k |
the penalty for the GAIC with default values |
type |
the type of distribution to be tried; see Details |
try.gamlss |
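this applies to the functions fitDist() and fitDistPred(): if gamlssML() fails to fit a distribution, gamlss() is tried instead |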
extra |
whether extra distributions should be tried, which are not in the type list |
data |
the data frame where y can be found |
rand |
For |
newdata |
The prediction data set (validation or test). |
trace |
whether to print during fitting. Note that when |
parallel |
The type of parallel operation to be used (if any). If missing, the default is "no". |
ncpus |
integer: number of processes to be used in parallel operation: typically one would choose this to be the number of available CPUs. |
cl |
This is useful for snow clusters, i.e. |
column |
the column of the output matrix by which the distributions are ordered, best GAIC first |
... |
extra arguments to be passed to gamlssML() or to gamlss() |
Details
The following are the different options for the type argument:
realAll: All the gamlss.family continuous distributions defined on the real line, i.e. realline, and on the positive real line, i.e. realplus.
realline: The gamlss.family continuous distributions on the real line: "NO", "GU", "RG", "LO", "NET", "TF", "TF2", "PE", "PE2", "SN1", "SN2", "exGAUS", "SHASH", "SHASHo", "SHASHo2", "EGB2", "JSU", "JSUo", "SEP1", "SEP2", "SEP3", "SEP4", "ST1", "ST2", "ST3", "ST4", "ST5", "SST", "GT".
realplus: The gamlss.family continuous distributions on the positive real line: "EXP", "GA", "IG", "LOGNO", "LOGNO2", "WEI", "WEI2", "WEI3", "IGAMMA", "PARETO2", "PARETO2o", "GP", "BCCG", "BCCGo", "exGAUS", "GG", "GIG", "LNO", "BCTo", "BCT", "BCPEo", "BCPE", "GB2".
real0to1: The gamlss.family continuous distributions from 0 to 1: "BE", "BEo", "BEINF0", "BEINF1", "BEOI", "BEZI", "BEINF", "GB1".
counts: The gamlss.family distributions for counts: "PO", "GEOM", "GEOMo", "LG", "YULE", "ZIPF", "WARING", "GPO", "DPO", "BNB", "NBF", "NBI", "NBII", "PIG", "ZIP", "ZIP2", "ZAP", "ZALG", "DEL", "ZAZIPF", "SI", "SICHEL", "ZANBI", "ZAPIG", "ZINBI", "ZIPIG", "ZINBF", "ZABNB", "ZASICHEL", "ZINBF", "ZIBNB", "ZISICHEL".
binom: The gamlss.family distributions for binomial type data: "BI", "BB", "DB", "ZIBI", "ZIBB", "ZABI", "ZABB".
The function fitDist() uses the function gamlssML() to fit the different models, the function fitDistPred() uses gamlssMLpred(), and the function chooseDist() uses update.gamlss().
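The fit-and-compare loop described above can be sketched by hand. This is a simplified illustration, not the package's actual implementation: it assumes gamlss is installed, uses simulated data, tries only a hand-picked subset of the realplus families, and relies on gamlssML() accepting the family as a character string. The whole block is guarded so that it does nothing where gamlss is unavailable.

```r
# Guarded so the script still runs where gamlss is not installed.
if (requireNamespace("gamlss", quietly = TRUE)) {
  suppressPackageStartupMessages(library(gamlss))

  set.seed(123)
  y <- rgamma(200, shape = 2, rate = 1)   # simulated positive data (assumption)

  # A small, hand-picked subset of the "realplus" families
  candidates <- c("EXP", "GA", "LOGNO", "WEI")

  gaic <- rep(NA_real_, length(candidates))
  names(gaic) <- candidates
  for (fam in candidates) {
    # try() keeps the loop going if one family fails to fit
    fit <- try(gamlssML(y, family = fam), silent = TRUE)
    if (!inherits(fit, "try-error")) gaic[fam] <- GAIC(fit, k = 2)
  }

  print(sort(gaic))                # smallest GAIC first; failed fits (NA) dropped
  best <- names(which.min(gaic))   # lowest AIC among these candidates
}
```

Families whose fit fails are left as NA here, which is one simple way to keep track of failures alongside the successful fits.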
Value
For the functions fitDist() and fitDistPred() a gamlssML object is returned (the one which minimised the GAIC or VDEV respectively) with two extra components:
fits |
a list of the fitted distributions, ordered according to their GAIC |
failed |
the distributions for which the fit failed |
For the function chooseDist() a matrix is returned, with rows the different distributions and columns the different GAICs set by k.
Author(s)
Mikis Stasinopoulos, Bob Rigby, Vlasis Voudouris and Majid Djennad.
References
Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion), Appl. Statist., 54, part 3, pp 507-554.
Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found at https://www.gamlss.com/.
Stasinopoulos, D. M. and Rigby, R. A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.
Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.
(see also https://www.gamlss.com/).
See Also
Examples
y <- rt(100, df=1)
m1<-fitDist(y, type="realline")
m1$fits
m1$failed
## Not run:
#---------------------------------------
# Example of using the argument extra
library(gamlss.tr)
data(tensile)
gen.trun(par=1,family="GA", type="right")
gen.trun(par=1,"LOGNO", type="right")
gen.trun(par=c(0,1),"TF", type="both")
ma <- fitDist(str, type="real0to1", trace=TRUE,
extra=c("GAtr", "LOGNOtr", "TFtr"),
data=tensile)
ma$fits
ma$failed
#-------------------------------------
# selecting model using the prediction global deviance
# Using fitDistPred
# creating training data
y <- rt(1000, df=2)
m1 <- fitDist(y, type="realline")
m1$fits
m1$failed
# create validation data
yn <- rt(1000, df=2)
# choose distribution which fits the new data best
p1 <- fitDistPred(y, type="realline", newdata=yn)
p1$fits
p1$failed
#---------------------------------------
# using the function chooseDist()
# fitting normal distribution model
m1 <- gamlss(y~pb(x), sigma.fo=~pb(x), family=NO, data=abdom)
# choose a distribution on the real line
# and save the GAIC for the default k = c(2, 3.84, log(n)), i.e. AIC, Chi-square and BIC.
t1 <- chooseDist(m1, type="realline", parallel="snow", ncpus=4)
# the GAIC's
t1
# the distributions which failed are with NA's
# ordering according to BIC
getOrder(t1,3)
fm<-update(m1, family=names(getOrder(t1,3)[1]))
## End(Not run)