gamlssVGD {gamlss}    R Documentation
A Set of Functions for Selecting Models Using Validation or Test Data Sets and Cross Validation
Description
This is a set of functions useful for selecting appropriate models.
The functions gamlssVGD, VGD, getTGD and TGD can be used when a subset of the data is used for validation or testing.
The function stepTGD() is a stepwise procedure for selecting an appropriate model for any one of the distribution parameters by minimising the test global deviance. The function stepTGDAll.A() can select a model for all the parameters using strategy A.
The functions gamlssCV and CV can be used for k-fold cross validation.
Usage
gamlssVGD(formula = NULL, sigma.formula = ~1, nu.formula = ~1,
tau.formula = ~1, data = NULL, family = NO,
control = gamlss.control(trace = FALSE),
rand = NULL, newdata = NULL, ...)
VGD(object, ...)
getTGD(object, newdata = NULL, ...)
TGD(object, ...)
gamlssCV(formula = NULL, sigma.formula = ~1, nu.formula = ~1,
tau.formula = ~1, data = NULL, family = NO,
control = gamlss.control(trace = FALSE),
K.fold = 10, set.seed = 123, rand = NULL,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
CV(object, ...)
drop1TGD(object, scope, newdata, parameter = c("mu", "sigma", "nu", "tau"),
sorted = FALSE, trace = FALSE,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
add1TGD(object, scope, newdata, parameter = c("mu", "sigma", "nu", "tau"),
sorted = FALSE, trace = FALSE,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
stepTGD(object, scope, newdata,
direction = c("both", "backward", "forward"),
trace = TRUE, keep = NULL, steps = 1000,
parameter = c("mu", "sigma", "nu", "tau"),
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
stepTGDAll.A(object, scope = NULL, newdata = NULL,
steps = 1000, sigma.scope = NULL, nu.scope = NULL,
tau.scope = NULL, mu.try = TRUE, sigma.try = TRUE,
nu.try = TRUE, tau.try = TRUE,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
Arguments
formula | A formula for mu.
sigma.formula | Formula for sigma.
nu.formula | Formula for nu.
tau.formula | Formula for tau.
data | The data frame required for the fit.
family | The gamlss.family distribution to be fitted; the default is NO.
control | The control for fitting the gamlss model.
rand | For gamlssVGD(), a vector with value 1 for observations in the training set and 2 for observations in the validation set. For gamlssCV(), a vector with values 1 to K assigning each observation to one of the K cross-validation subsets.
newdata | The new data set (validation or test) for prediction.
object | A relevant R object.
scope | Defines the range of models examined in the stepwise selection, similar to stepGAIC().
sigma.scope | Defines the range of models examined in the stepwise selection for sigma.
nu.scope | Defines the range of models examined in the stepwise selection for nu.
tau.scope | Defines the range of models examined in the stepwise selection for tau.
mu.try | Whether to try fitting models for mu.
sigma.try | Whether to try fitting models for sigma.
nu.try | Whether to try fitting models for nu.
tau.try | Whether to try fitting models for tau.
parameter | Which distribution parameter is required; the default is "mu".
sorted | Whether the results should be sorted by the value of TGD.
trace | If TRUE, additional information is printed on the fits as they are processed.
direction | The mode of stepwise search; one of "both", "backward" or "forward", with default "both".
keep | See stepGAIC().
steps | The maximum number of steps to be considered. The default is 1000.
K.fold | The number of subsets of the data used in the cross validation.
set.seed | The seed to be used in creating the K.fold random subsets of the data.
parallel | The type of parallel operation to be used (if any). If missing, the default is "no".
ncpus | Integer: the number of processes to be used in parallel operation; typically one would set this to the number of available CPUs.
cl | An optional parallel or snow cluster for use if parallel = "snow".
... | Further arguments to be passed to the gamlss fit.
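The parallel, ncpus and cl arguments choose between sequential evaluation, forked workers and a socket ("snow") cluster. The sketch below shows one common way such a switch is written in R with the base parallel package. run_folds() is a hypothetical helper, not a gamlss function, and it is not claimed to mirror the package's internal code.

```r
library(parallel)

# Hypothetical helper (not part of gamlss): apply FUN to fold indices 1..K
# using one of the three modes offered by the 'parallel' argument.
run_folds <- function(FUN, K, parallel = c("no", "multicore", "snow"),
                      ncpus = 1L, cl = NULL) {
  parallel <- match.arg(parallel)
  idx <- seq_len(K)
  if (parallel == "multicore" && .Platform$OS.type == "windows") {
    parallel <- "no"                 # forking (mclapply) is unavailable on Windows
  }
  if (parallel == "no") {
    return(lapply(idx, FUN))         # plain sequential evaluation
  }
  if (parallel == "multicore") {
    return(mclapply(idx, FUN, mc.cores = ncpus))  # forked worker processes
  }
  # parallel == "snow": use the supplied cluster, or create a temporary one
  if (is.null(cl)) {
    cl <- makeCluster(ncpus)         # PSOCK cluster on the local machine
    on.exit(stopCluster(cl), add = TRUE)
  }
  parLapply(cl, idx, FUN)
}
```

For example, run_folds(function(k) k^2, K = 3, parallel = "multicore", ncpus = 2) returns list(1, 4, 9). A cluster passed in through cl is left running so that it can be reused across calls; only a cluster created inside the helper is stopped on exit.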
Details
The function gamlssVGD() fits a gamlss model to the training data set determined by the argument rand or newdata. The result is a gamlssVGD object, which contains the gamlss fit to the training data plus three extra components: i) VGD, the global deviance evaluated on the validation data set; ii) predictError, which is VGD divided by the number of observations in the validation data set; and iii) residVal, the residuals for the validation data set.
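The three extra components can be illustrated without gamlss for the simplest case: a normal model with a linear mean and constant standard deviation, fitted by maximum likelihood. This is a minimal sketch on simulated data, assuming that the global deviance is -2 times the log-likelihood and that residVal holds normalized quantile residuals (as gamlss residuals do); lm() stands in for a gamlss fit with family NO.

```r
# Sketch only (not gamlss code): VGD, predictError and residVal for a normal
# model with linear mu and constant sigma; lm() stands in for gamlss(family = NO).
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
dat <- data.frame(x = x, y = 2 + 0.5 * x + rnorm(n, sd = 1))  # simulated data

# rand = 1 marks training rows, rand = 2 marks validation rows
rand <- sample(2, n, replace = TRUE, prob = c(0.6, 0.4))
train <- dat[rand == 1, ]
valid <- dat[rand == 2, ]

# Fit to the training rows; sigma is the ML estimate (divides by n, not n - p)
fit <- lm(y ~ x, data = train)
sigma_hat <- sqrt(mean(residuals(fit)^2))

# VGD: -2 times the log-likelihood of the validation rows under the training fit
mu_val <- predict(fit, newdata = valid)
VGD <- -2 * sum(dnorm(valid$y, mean = mu_val, sd = sigma_hat, log = TRUE))

# predictError: VGD divided by the number of validation observations
predictError <- VGD / nrow(valid)

# residVal: normalized quantile residuals qnorm(F(y)); for the normal family
# these equal the standardized residuals (y - mu) / sigma
residVal <- qnorm(pnorm(valid$y, mean = mu_val, sd = sigma_hat))

c(VGD = VGD, predictError = predictError)
```

Dividing by the validation sample size makes predictError comparable across validation sets of different sizes, which VGD on its own is not.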
The function VGD() extracts the validation global deviance from one or more fitted gamlssVGD objects and can be used for model comparison.
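Model comparison with validation global deviances amounts to preferring the fit with the smallest value. In the sketch below, vgd_normal() is a hypothetical helper (not part of gamlss) that fits a normal linear model by lm() to the training rows and returns its validation global deviance; the data are simulated with a quadratic mean, so the quadratic model is expected to win.

```r
# Hypothetical helper (not part of gamlss): validation global deviance of a
# normal linear model fitted by lm() to the training rows.
vgd_normal <- function(formula, train, valid) {
  fit <- lm(formula, data = train)
  sigma_hat <- sqrt(mean(residuals(fit)^2))   # ML estimate of sigma
  response <- all.vars(formula)[1]            # assumes a plain response name
  mu_val <- predict(fit, newdata = valid)
  -2 * sum(dnorm(valid[[response]], mean = mu_val, sd = sigma_hat, log = TRUE))
}

# Simulated data with a quadratic mean
set.seed(2)
n <- 300
x <- runif(n, -2, 2)
dat <- data.frame(x = x, y = 1 + x^2 + rnorm(n, sd = 0.5))
rand <- sample(2, n, replace = TRUE, prob = c(0.6, 0.4))
train <- dat[rand == 1, ]
valid <- dat[rand == 2, ]

# Compare candidate models: a smaller validation global deviance is better
vgds <- c(linear    = vgd_normal(y ~ x, train, valid),
          quadratic = vgd_normal(y ~ x + I(x^2), train, valid))
vgds
names(which.min(vgds))
```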
The function getTGD() operates differently from gamlssVGD(): it assumes that the user has already fitted a model using gamlss() and now wants to evaluate its global deviance on a new (validation or test) data set.
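The fit-first, evaluate-later workflow of getTGD() can be sketched for an lm() fit treated as a normal model. get_tgd_lm() is a hypothetical helper, not part of gamlss. As a consistency check, evaluating it on the training data itself reproduces -2 * logLik(fit), because logLik() for an lm object uses the maximum likelihood estimate of sigma.

```r
# Hypothetical helper (not part of gamlss): test global deviance of an
# already fitted lm() object, treated as a normal model, on new data.
# Assumes newdata contains the response and has no missing values.
get_tgd_lm <- function(object, newdata) {
  sigma_hat <- sqrt(mean(residuals(object)^2))   # ML estimate from the training fit
  y_new <- model.response(model.frame(formula(object), data = newdata))
  mu_new <- predict(object, newdata = newdata)
  -2 * sum(dnorm(y_new, mean = mu_new, sd = sigma_hat, log = TRUE))
}

# Fit first on a training subset of the built-in cars data ...
train <- cars[1:35, ]
test  <- cars[36:50, ]
fit <- lm(dist ~ speed, data = train)
# ... then evaluate the global deviance on the held-out rows
get_tgd_lm(fit, newdata = test)
```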
The function TGD() extracts the validation/test global deviance from one or more fitted gamlssTGD objects and can be used to compare models.
The function gamlssCV() performs k-fold cross validation on a gamlss model.
The function CV() extracts the cross-validated global deviance from one or more fitted gamlssCV objects and can be used to compare models.
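The k-fold procedure can be sketched as follows: each observation receives a fold label, each fold is held out in turn while the model is refitted on the remaining folds, and the held-out global deviances are combined. cv_deviance() is a hypothetical helper for a normal linear model, not part of gamlss; summing the fold deviances into one criterion is an assumption about how the overall value is formed.

```r
# Hypothetical helper (not part of gamlss): K-fold cross-validated global
# deviance of a normal linear model fitted with lm().
cv_deviance <- function(formula, data, K = 10, rand = NULL, seed = 123) {
  if (is.null(rand)) {
    set.seed(seed)                                 # note: resets the global RNG
    rand <- sample(K, nrow(data), replace = TRUE)  # random fold label per row
  }
  response <- all.vars(formula)[1]                 # assumes a plain response name
  fold_gd <- vapply(sort(unique(rand)), function(k) {
    train <- data[rand != k, ]                     # fit on all other folds
    valid <- data[rand == k, ]                     # evaluate on fold k
    fit <- lm(formula, data = train)
    sigma_hat <- sqrt(mean(residuals(fit)^2))      # ML estimate of sigma
    mu_val <- predict(fit, newdata = valid)
    -2 * sum(dnorm(valid[[response]], mean = mu_val, sd = sigma_hat, log = TRUE))
  }, numeric(1))
  sum(fold_gd)                                     # assumed: total over all folds
}

# Usage with the built-in cars data and 5 folds
cv_deviance(dist ~ speed, data = cars, K = 5)
```

Fixing the seed makes the fold assignment, and therefore the criterion, reproducible, so that competing models are compared on the same partition of the data.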
The functions add1TGD(), drop1TGD() and stepTGD() behave similarly to the add1(), drop1() and stepGAIC() functions respectively, but they use the validation or test deviance as the selection criterion rather than the GAIC.
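The drop1-style idea (refit without each term in turn and score every reduced model by its test global deviance instead of the GAIC) can be sketched for normal linear models. drop1_tgd_lm() is a hypothetical helper, not part of gamlss; the refits use the training data passed in explicitly, and newdata is assumed to contain the response with no missing values.

```r
# Hypothetical helper (not part of gamlss): drop1-style table using the
# test global deviance of normal linear models fitted with lm().
drop1_tgd_lm <- function(object, data, newdata) {
  # Test global deviance: -2 * log-likelihood of newdata under the fit,
  # with sigma estimated by maximum likelihood on the training data
  tgd <- function(fit) {
    sigma_hat <- sqrt(mean(residuals(fit)^2))
    y_new <- model.response(model.frame(formula(fit), data = newdata))
    mu_new <- predict(fit, newdata = newdata)
    -2 * sum(dnorm(y_new, mean = mu_new, sd = sigma_hat, log = TRUE))
  }
  out <- c("<none>" = tgd(object))
  for (term in attr(terms(object), "term.labels")) {
    # Remove one term and refit on the same training data
    reduced_formula <- update(formula(object), as.formula(paste(". ~ . -", term)))
    out[term] <- tgd(lm(reduced_formula, data = data))
  }
  out
}

# Example on simulated data: x1 matters, x2 is pure noise
set.seed(3)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + rnorm(n)
train <- sim[1:120, ]
test  <- sim[121:200, ]
full <- lm(y ~ x1 + x2, data = train)
drop1_tgd_lm(full, data = train, newdata = test)
```

A term whose removal lowers the test global deviance below the "<none>" value is a candidate for dropping; a stepwise search repeats comparisons of this kind, in the add and drop directions, on the current model.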
Value
A fitted model or a set of global deviances, depending on the function.
Author(s)
Mikis Stasinopoulos
References
Chambers, J. M. and Hastie, T. J. (1991). Statistical Models in S. Chapman and Hall, London.
Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Appl. Statist., 54, part 3, pp. 507-554.
Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z. and De Bastiani, F. (2019). Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R. Chapman and Hall/CRC. An older version can be found at https://www.gamlss.com/.
Stasinopoulos, D. M. and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.
Stasinopoulos, D. M., Rigby, R. A., Heller, G., Voudouris, V. and De Bastiani, F. (2017). Flexible Regression and Smoothing: Using GAMLSS in R. Chapman and Hall/CRC (see also https://www.gamlss.com/).
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
library(gamlss)
data(abdom)
# generate the random split of the data
rand <- sample(2, 610, replace=TRUE, prob=c(0.6,0.4))
# the proportions in the sample
table(rand)/610
olddata<-abdom[rand==1,] # training data
newdata<-abdom[rand==2,] # validation data
#------------------------------------------------------------------------------
# gamlssVGD
#-------------------------------------------------------------------------------
# Using rand
v1 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=NO,
rand=rand)
v2 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=LO,
rand=rand)
v3 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=TF,
rand=rand)
VGD(v1,v2,v3)
#-------------------------------------------------------------------------------
## Not run:
#-------------------------------------------------------------------------------
# using two data sets
v11 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata,
family=NO, newdata=newdata)
v12 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata,
family=LO, newdata=newdata)
v13 <- gamlssVGD(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata,
family=TF, newdata=newdata)
VGD(v11,v12,v13)
#-------------------------------------------------------------------------------
# function getTGD
#-------------------------------------------------------------------------------
# fit gamlss models first
g1 <- gamlss(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata, family=NO)
g2 <- gamlss(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata, family=LO)
g3 <- gamlss(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=olddata, family=TF)
# and then use
gg1 <-getTGD(g1, newdata=newdata)
gg2 <-getTGD(g2, newdata=newdata)
gg3 <-getTGD(g3, newdata=newdata)
TGD(gg1,gg2,gg3)
#-------------------------------------------------------------------------------
#-------------------------------------------------------------------------------
# function gamlssCV
#-------------------------------------------------------------------------------
set.seed(123)
rand1 <- sample(10, 610, replace=TRUE)
g1 <- gamlssCV(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=NO,
rand=rand1)
g2 <- gamlssCV(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=LO,
rand=rand1)
g3 <- gamlssCV(y~pb(x,df=2),sigma.formula=~pb(x,df=1), data=abdom, family=TF,
rand=rand1)
CV(g1,g2,g3)
CV(g1)
# using parallel
set.seed(123)
rand1 <- sample(10, 610, replace=TRUE)
library(parallel) # provides detectCores()
nC <- detectCores()
system.time(g21 <- gamlssCV(y~pb(x,df=2), sigma.formula=~pb(x,df=1), data=abdom,
family=NO, rand=rand1,parallel = "no", ncpus = nC ))
system.time(g22 <- gamlssCV(y~pb(x,df=2), sigma.formula=~pb(x,df=1), data=abdom,
family=LO, rand=rand1,parallel = "multicore", ncpus = nC ))
system.time(g23 <- gamlssCV(y~pb(x,df=2), sigma.formula=~pb(x,df=1), data=abdom,
family=TF, rand=rand1,parallel = "snow", ncpus = nC ))
CV(g21,g22,g23)
#-------------------------------------------------------------------------------
# functions add1TGD() drop1TGD() and stepTGD()
#-------------------------------------------------------------------------------
# the data
data(rent)
rand <- sample(2, dim(rent)[1], replace=TRUE, prob=c(0.6,0.4))
# the proportions in the sample
table(rand)/dim(rent)[1]
oldrent<-rent[rand==1,] # training set
newrent<-rent[rand==2,] # validation set
# null model
v0 <- gamlss(R~1, data=oldrent, family=GA)
# complete model
v1 <- gamlss(R~pb(Fl)+pb(A)+H+loc, sigma.formula=~pb(Fl)+pb(A)+H+loc,
data=oldrent, family=GA)
# drop1TGD
system.time(v3<- drop1TGD(v1, newdata=newrent, parallel="no"))
system.time(v4<- drop1TGD(v1, newdata=newrent, parallel="multicore",
ncpus=nC) )
system.time(v5<- drop1TGD(v1, newdata=newrent, parallel="snow", ncpus=nC))
cbind(v3,v4,v5)
# add1TGD
system.time(d3<- add1TGD(v0,scope=~pb(Fl)+pb(A)+H+loc, newdata=newrent,
parallel="no"))
system.time(d4<- add1TGD(v0,scope=~pb(Fl)+pb(A)+H+loc, newdata=newrent,
parallel="multicore", ncpus=nC) )
system.time(d5<- add1TGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent,
parallel="snow", ncpus=nC))
# stepTGD
system.time(d6<- stepTGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent))
system.time(d7<- stepTGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent,
parallel="multicore", ncpus=nC))
system.time(d8<- stepTGD(v0, scope=~pb(Fl)+pb(A)+H+loc,newdata=newrent,
parallel="snow", ncpus=nC))
## End(Not run)