R: Automated model selection and multimodel inference with...

glmulti {glmulti}

R Documentation

Automated model selection and multimodel inference with (G)LMs

Description

glmulti finds what are the n best models (the confidence set of models) among all possible models (the candidate set, as specified by the user). Models are fitted with the specified fitting function (default is glm) and are ranked with the specified Information Criterion (default is aicc). The best models are found either through exhaustive screening of the candidates, or using a genetic algorithm, which allows very large candidate sets to be adressed. The output can be used for model selection, variable selection, and multimodel inference.

Usage

#  glmulti S4 generic 
glmulti(y, xr, data, exclude = c(), name = "glmulti.analysis", 
		intercept = TRUE, marginality = FALSE,
		bunch=30, chunk = 1, chunks = 1, level = 2,
		minsize = 0, maxsize = -1, minK = 0, maxK = -1,
		method = "h", crit = "aic", confsetsize = 100, popsize = 100, 
		mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE,
		report = TRUE, deltaM = 0.05, deltaB = 0.05, conseq = 5,
		fitfunction = "glm", resumefile = "id", includeobjects=TRUE, ...)

Arguments

`y`	A formula, character string, or fitted model (of class lm or glm) specifying the response variable and the terms (main effects and/or interactions) to be used in the candidate models (e.g. height~age*sex+mass). Alternatively, a character string naming the variable to be used as response (e.g. "height") (in which case the names of the predictors must be passed through the xr argument) Alternatively, a custom list of (fitted) model objects can also be passed (can be convenient for small candidate sets).
`xr`	An optional character array specifying the variables (categorical or quantitative) to be used as predictors, e.g. c("age", "height" , "mass")
`exclude`	Optional character vector naming terms (main effects or interactions) to be excluded from the candidate models, e.g. c("mass:height")
`intercept`	Whether to include an intercept in the candidate models or not.
`level`	If 1, only main effects (terms of order 1) are used to build the candidate set. If 2, pairwise interactions are also used (higher order interactions are currently ignored)
`data`	A data.frame containing the data. If not specified, `glmulti` will try to find the data in the environment of the formula, from the fitted model passed as y argument, or from the global environment.
`name`	The name of this `glmulti` analysis. Optional.
`marginality`	Whether to apply the marginality rule or not. If TRUE, only marginal models will be considered.
`minsize`	This sets a constraint on candidate models. Minimal number of TERMS (main effects or interactions) to be included in candidate models (negative = no constraint)
`maxsize`	This sets a constraint on candidate models. Maximal number of TERMS to be included in candidate models (negative = no constraint)
`minK`	This sets a constraint on candidate models. Minimal complexity of candidate models (negative = no constraint)
`maxK`	This sets a constraint on candidate models. Maximal complexity of candidate models (negative = no constraint)
`method`	The method to be used to explore the candidate set of models. If "h" an exhaustive screening is undertaken. If "g" the genetic algorithm is employed (recommended for large candidate sets). If "l", a very fast exhaustive branch-and-bound algorithm is used. Package leaps must then be loaded, and this can only be applied to linear models with covariates and no interactions. If "d", a simple summary of the candidate set is printed, including the number of candidate models.
`crit`	The Information Criterion to be used. This should be a function that accepts a fitted model as first argument. Default is the original Akaike IC (aic). Other provided functions are the Bayes IC (`bic`), the small-sample corrected AIC (`aicc`) and QAIC/QAICc (`qaic` and `qaicc`).
`fitfunction`	The fitting function to be used. Any function similar to `glm` can be used. See Examples
`confsetsize`	The number of models to be looked for, i.e. the size of the returned confidence set.
`plotty`	Whether to plot the progress of the IC profile when running.
`report`	Whether to report about the progress at run time.
`bunch`	The number of model formulas to be returned (to be fitted) at each call to the enumerator. Exhaustive screening only.
`chunk`	When using an exhaustive screening approach, it can be splitted in several parts to take advantage of multiple CPUs. chunk is an integer specifying which part the current call should perform.
`chunks`	When splitting an exhaustive screening, the total number of parts the task should be divided into. For example, with a quad-core processor, 4 may be useful. Use consensus to bring back the pieces into a single object.
`popsize`	The population size for the genetic algorithm
`mutrate`	The per locus (i.e. per term) mutation rate for genetic algorithm, between 0 and 1
`sexrate`	The rate of sexual reproduction for the genetic algorithm, between 0 and 1
`imm`	The rate of immigration for the genetic algorithm, between 0 and 1
`deltaM`	The target change in mean IC (defines the stop rules for the genetic algorithm)
`deltaB`	The target change in best IC (defines the stop rules for the genetic algorithm)
`conseq`	The target successive number of times with no improvement (i.e. target changes have been attained) (defines the stop rule for the GA). The greater it is, the more stringent the stop rule.
`resumefile`	When resuming an analysis (method="r"), the name of the files from which to resume. Default uses the same as name
`includeobjects`	Whether or not to include fiited models as objects. This makes `coef` and `predict` faster and is very convenient, but can be turned off in case fitted models are very large or are not to be used after.
`...`	Further arguments to be passed to the fitting function. E.g. maxit=50 or family=binomial

Details

glmulti is defined as a S4 function. It acts as a frontend that calls background compiled functions (contained if archive glmulti.jar). Running the function therefore requires a Java Running Environment, and package rJava. A thorough description of this function and package can be found in the article by Calcagno and de Mazancourt (see References). print.glmulti and summary.glmulti are S3 methods which provide a synthesis of glmulti analyses.

NOTE: When calling glmulti with a model object as y, only the formula will be extracted from the object. This means that optional arguments to the fitting function (e.g. family or maxit) will NOT be extracted. These arguments should be passed to glmulti through the ....

Value

An object of class glmulti is returned. It is a S4 object with several slots containing relevant data for model selection and beyond.

Several standard S3 functions are provided to help access the content of this object.

Several glmulti objects can be shrunk to one using the function consensus. This is useful to get the best of several replicates (of the genetic algorithm) or to bring together the different parts of a splitted exhaustive screening. When running a genetic algorithm, two tiny java files (serialized objects) are also written to the disk at regular intervals. They can be used to resume the calculation (method="r") if it was interrupted for any reason. This can also be used to continue a GA with modified parameters (e.g. mutation rate).

Author(s)

Vincent Calcagno, McGill University, Canada

References

Buckland (1997) Model Selection: an Integral Part of Inference. Biometrics 10:41 Burnham & Anderson (2002) Model Selection and Multimodel Inference: an Information Theoretic Approach Calcagno \& de Mazancourt 2010 J. Stat. Soft. v34 i12. See http://www.jstatsoft.org/v34/i12

Examples

# See the document "glmulti.pdf" included in the package.
# It explains the general approach and shows how to use 
# glmulti with mixed models from the lme4 package.
# Other examples:
# A. This shows how to do the same for zero-inflated poisson models
# we load the required package
library(pscl)
# a random vector of count data
round(runif(100, 0,20)*round(runif(100)))-> vy2
# dummy predictors
va = runif(100)
vb = runif(100)
# 1. The wrapper function
zeroinfl.glmulti=function(formula, data, inflate = "|1",...)  {
    zeroinfl(as.formula(paste(deparse(formula), inflate)),data=data,...)
} 
# The default getfit and aicc method will work for zeroinfl objects,
# so no need to redefine them
# we can proceed directly
glmulti(vy2~va*vb,fitfunc=zeroinfl.glmulti,inflate="|1")->bab


# B. This shows how to include some terms in ALL the models
# As above, we just prepare a wrapper of the fitting function
glm.redefined = function(formula, data, always="", ...) {
glm(as.formula(paste(deparse(formula), always)), data=data, ...)
}
# we then use this fitting function in glmulti
glmulti(vy2~va,level=1,fitfunc=glm.redefined,always="+vb")-> bab
# va will be shuffled but vb is always included in the models

# this procedure allows support of arbitrarily any fitting function, 
# or the use of sophisticated constraints on the model structure

[Package glmulti version 1.0.8 Index]