stepGAIC {gamlss} | R Documentation |
Choose a model by GAIC in a Stepwise Algorithm
Description
The function stepGAIC()
performs stepwise model selection using a Generalized Akaike Information Criterion (GAIC). It is based on the function stepAIC()
given in the library MASS of Venables and Ripley (2002). The function has been changed recently to allow parallel computation. The parallel computations are similar to the ones performed in the function boot()
of the boot package. Note that since version 4.3-5 of gamlss the stepGAIC()
do not have the option of using the function stepGAIC.CH
through the argument additive
.
Note that stepGAIC()
is relying to the dropterm()
and addterm()
methods applied to gamlss objects. drop1()
and add1()
are equivalent methods to the dropterm()
and addterm()
respectively but with different default arguments (see the examples).
The function stepGAIC.VR()
is the old version of stepGAIC()
with no parallel computations.
The function stepGAIC.CH
is based on the S function step.gam()
(see Chambers and Hastie (1991)) and it is more suited for model with smoothing additive terms when the degrees of freedom for smoothing are fixed in advance. This is something which rarely used these days, as most of the smoothing functions allow the calculations of the smoothing parameter, see for example the additive function pb()
).
The functions stepGAIC.VR()
and stepGAIC.CH()
have been adapted to work with gamlss objects and the main difference is the scope
argument, see below.
While the functions stepGAIC()
is used to build models for individual parameters of the distribution of the response variable, the functions stepGAICAll.A()
and stepGAICAll.A()
are building models for all the parameters.
The functions stepGAICAll.A()
and stepGAICAll.B()
are based on the stepGAIC()
function but use different strategies for selecting a appropriate final model.
stepGAICAll.A()
has the following strategy:
Strategy A:
i) build a model for mu
using a forward approach.
ii) given the model for mu
build a model for sigma
(forward)
iii) given the models for mu
and sigma
build a model for nu
(forward)
iv) given the models for mu
, sigma
and nu
build a model for tau
(forward)
v) given the models for mu
, sigma
, nu
and tau
check whether the terms for nu
are needed using backward elimination.
vi) given the models for mu
, sigma
, nu
and tau
check whether the terms for sigma
are needed (backward).
vii) given the models for mu
, sigma
, nu
and tau
check whether the terms for mu
are needed (backward).
Note for this strategy to work the scope
argument should be set appropriately.
stepGAICAll.B()
uses the same procedure as the function stepGAIC()
but each term in the scope is fitted to all the parameters of the distribution, rather than the one specified by the argument what
of stepGAIC()
.
The stepGAICAll.B()
relies on the add1All()
and drop1All()
functions for the selection of variables.
Usage
stepGAIC(object, scope, direction = c("both", "backward", "forward"),
trace = TRUE, keep = NULL, steps = 1000, scale = 0,
what = c("mu", "sigma", "nu", "tau"), parameter= NULL, k = 2,
parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL,
...)
stepGAIC.VR(object, scope, direction = c("both", "backward", "forward"),
trace = TRUE, keep = NULL, steps = 1000, scale = 0,
what = c("mu", "sigma", "nu", "tau"), parameter= NULL, k = 2,
...)
stepGAIC.CH(object, scope = gamlss.scope(model.frame(object)),
direction = c("both", "backward", "forward"), trace = TRUE,
keep = NULL, steps = 1000, what = c("mu", "sigma", "nu", "tau"),
parameter= NULL, k = 2, ...)
stepGAICAll.A(object, scope = NULL, sigma.scope = NULL, nu.scope = NULL,
tau.scope = NULL, mu.try = TRUE, sigma.try = TRUE,
nu.try = TRUE, tau.try = TRUE, direction = NULL,
parallel = c("no", "multicore", "snow"), ncpus = 1L,
cl = NULL, ...)
stepGAICAll.B(object, scope, direction = c("both", "backward", "forward"),
trace = T, keep = NULL, steps = 1000, scale = 0, k = 2,
parallel = c("no", "multicore", "snow"), ncpus = 1L,
cl = NULL, ...)
drop1All(object, scope, test = c("Chisq", "none"), k = 2, sorted = FALSE,
trace = FALSE, parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
add1All(object, scope, test = c("Chisq", "none"), k = 2, sorted = FALSE,
trace = FALSE, parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, ...)
Arguments
object |
an gamlss object. This is used as the initial model in the stepwise search. |
scope |
defines the range of models examined in the stepwise search.
For the function |
direction |
the mode of stepwise search, can be one of |
.
trace |
if positive, information is printed during the running of
|
keep |
a filter function whose input is a fitted model object and the associated 'AIC' statistic, and whose output is arbitrary. Typically 'keep' will select a subset of the components of the object and return them. The default is not to keep anything. |
steps |
the maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early. |
scale |
scale is nor used in gamlss |
what |
which distribution parameter is required, default |
parameter |
equivalent to |
k |
the multiple of the number of degrees of freedom used for the penalty. Only 'k = 2' gives the genuine AIC: 'k = log(n)' is sometimes referred to as BIC or SBC. |
parallel |
The type of parallel operation to be used (if any). If missing, the default is "no". |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. |
cl |
An optional parallel or snow cluster for use if |
sigma.scope |
scope for |
nu.scope |
scope for |
tau.scope |
scope for |
mu.try |
The default value is is TRUE, set to FALSE if no model for |
sigma.try |
The default value is TRUE, set to FALSE if no model for |
nu.try |
The default value is TRUE, set to FALSE if no model for |
tau.try |
The default value is TRUE, set to FALSE if no model for |
test |
whether to print the chi-square test or not |
sorted |
whether to sort the results |
... |
any additional arguments to 'extractAIC'. (None are currently used.) |
Details
The set of models searched is determined by the scope
argument.
For the function stepGAIC.VR()
the right-hand-side of its lower
component is always included in the model, and right-hand-side of the model is included in the upper
component. If scope
is a single formula, it specifies the upper
component,
and the lower
model is empty. If scope
is missing, the initial model
is used as the upper
model.
Models specified by scope
can be templates to update object
as
used by update.formula
.
For the function stepGAIC.CH()
each of the formulas in scope specifies a
"regimen" of candidate forms in which the particular term may enter the model.
For example, a term formula might be
~ x1 + log(x1) + cs(x1, df=3)
This means that x1 could either appear linearly, linearly in its logarithm, or as a smooth function estimated non-parametrically. Every term in the model is described by such a term formula, and the final model is built up by selecting a component from each formula.
The function gamlss.scope
similar to the S gam.scope()
in Chambers and Hastie (1991) can be used to create automatically
term formulae from specified data or model frames.
The supplied model object is used as the starting model, and hence there is the requirement that one term from each of the term formulas of the parameters be present in the formula of the distribution parameter. This also implies that any terms in formula of the distribution parameter not contained in any of the term formulas will be forced to be present in every model considered.
When the smoother used in gamlss
modelling belongs to the new generation of smoothers allowing the determination of the smoothing parameters
automatically (i.e. pb()
, cy()
) then the function stepGAIC.VR()
can be used for model selection (see example below).
Value
the stepwise-selected model is returned, with up to two additional components. There is an '"anova"' component corresponding to the steps taken in the search, as well as a '"keep"' component if the 'keep=' argument was supplied in the call. The '"Resid. Dev"' column of the analysis of deviance table refers to a constant minus twice the maximized log likelihood
The function stepGAICAll.A()
returns with a component "anovaAll" containing all the different anova tables used in the process.
Author(s)
Mikis Stasinopoulos based on functions in MASS library and in Statistical Models in S
References
Chambers, J. M. and Hastie, T. J. (1991). Statistical Models in S, Chapman and Hall, London.
Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.
Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.
Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.
Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.
(see also https://www.gamlss.com/).
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
See Also
Examples
## Not run:
data(usair)
# -----------------------------------------------------------------------------
# null model
mod0<-gamlss(y~1, data=usair, family=GA)
# all the explanatotory variables x1:x6 fitted linearly
mod1<-gamlss(y~., data=usair, family=GA)
#-------------------------------------------------------------------------------
# droping terms
dropterm(mod1)
# with chi-square information
drop1(mod1)
# for parallel computations use something like
nC <- detectCores()
drop1(mod1, parallel="snow", ncpus=nC)
drop1(mod1, parallel="multicore", ncpus=nC)
#------------------------------------------------------------------------------
# adding terms
addterm(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
collapse="+"),sep="")))
# with chi-square information
add1(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
collapse="+"),sep="")))
# for parallel computations
nC <- detectCores()
add1(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
collapse="+"),sep="")), parallel="snow", ncpus=nC)
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# stepGAIC
# find the best subset for the mu
mod2 <- stepGAIC(mod1)
mod2$anova
#--------------------------------------------------------------
# for parallel computations
mod21 <- stepGAIC(mod1, , parallel="snow", ncpus=nC)
#--------------------------------------------------------------
# find the best subset for sigma
mod3<-stepGAIC(mod2, what="sigma", scope=~x1+x2+x3+x4+x5+x6)
mod3$anova
#--------------------------------------------------------------
# find the best model using pb() smoother
#only three variables are used here for simplicity
mod20<-stepGAIC(mod0, scope=list(lower=~1, upper=~pb(x1)+pb(x2)+pb(x5)))
edf(mod20)
# note that x1 and x2 enter linearly
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# the stepGAIC.CH function (no parallel here)
# creating a scope from the usair model frame
gs<-gamlss.scope(model.frame(y~x1+x2+x3+x4+x5+x6, data=usair))
gs
mod5<-stepGAIC.CH(mod0,gs)
mod5$anova
#-----------------------------------------------------------------------------=-
#------------------------------------------------------------------------------
# now stepGAICAll.A
mod7<-stepGAICAll.A(mod0, scope=list(lower=~1,upper=~x1+x2+x3+x4+x5+x6))
#-----------------------------------------------------------------------------=-
#------------------------------------------------------------------------------
# now stepGAICAll.B
drop1All(mod1, parallel="snow", ncpus=nC)
add1All(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
collapse="+"))), parallel="snow", ncpus=nC)
mod8<-stepGAICAll.B(mod0, scope=list(lower=~1,upper=~x1+x2+x3+x4+x5+x6))
#-----------------------------------------------------------------------------=-
#------------------------------------------------------------------------------
## End(Not run)