bestSubset {glmtoolbox}    R Documentation

Best Subset Selection

Description

Best subset selection by exhaustive search in generalized linear models.

Usage

bestSubset(
  object,
  nvmax = 8,
  nbest = 1,
  force.in = NULL,
  force.out = NULL,
  verbose = TRUE,
  digits = max(3, getOption("digits") - 2)
)

Arguments

object

an object of class glm, which is assumed to be the full model.

nvmax

an (optional) positive integer value indicating the maximum size of subsets to examine. As default, nvmax is set to 8.

nbest

an (optional) positive integer value indicating the number of subsets of each size to record. As default, nbest is set to 1.

force.in

an (optional) vector of positive integers indicating the indices of the columns of the model matrix that should be included in all models. As default, force.in is set to NULL.

force.out

an (optional) vector of positive integers indicating the indices of the columns of the model matrix that should be excluded from all models. As default, force.out is set to NULL.

verbose

an (optional) logical indicating whether the report of results should be printed. As default, verbose is set to TRUE.

digits

an (optional) integer value indicating the number of decimal places to be used. As default, digits is set to max(3, getOption("digits") - 2).
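
Because force.in and force.out refer to columns of the model matrix rather than to term labels, it can help to list those columns with their positions before calling bestSubset. The following is a minimal sketch in base R on the built-in mtcars data; the model and the names idx_wt and idx_inter are illustrative assumptions, not part of glmtoolbox. Whether the intercept column is counted in the indices that bestSubset expects is not stated on this page, so the commented-out call should be checked against the package before use.

```r
# Illustrative full model (an assumption made for this sketch).
fit <- glm(mpg ~ factor(cyl) + hp * wt, data = mtcars)

# Columns of the model matrix, in order; a factor expands into dummy columns.
cols <- colnames(model.matrix(fit))
print(data.frame(index = seq_along(cols), column = cols))

# Look up column positions by name instead of hard-coding them.
idx_wt    <- which(cols == "wt")
idx_inter <- match("hp:wt", cols)

# Hypothetical use (requires glmtoolbox; the indexing convention is assumed):
# out <- glmtoolbox::bestSubset(fit, force.in = idx_wt, force.out = idx_inter)
```

Looking indices up by name keeps the call valid if terms are later added to or removed from the full model.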

Details

To apply "best subset" selection, an exhaustive search is conducted separately for every model size from 1 to nvmax to identify the model with the smallest deviance. Consequently, if, for a fixed model size, the model selection criteria of interest reduce to monotone functions of the deviance, so that they differ only in the way models of different sizes are compared, then the results of "best subset" selection within each size do not depend on the trade-off between goodness-of-fit and complexity on which those criteria are based.
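
The argument above can be checked with a small brute-force search. The sketch below is written in base R and is not the glmtoolbox implementation: it enumerates subsets of model terms (rather than columns of the model matrix) with combn, refits each candidate with glm, and compares the model chosen by deviance with the one chosen by AIC within each size. The data set (mtcars), the gaussian family with identity link, and the helper name search_size are assumptions made for illustration.

```r
# Full model on a built-in data set (gaussian family, identity link).
full <- glm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)
terms_all <- attr(terms(full), "term.labels")

# Fit every subset of k terms and record the deviance and AIC of each fit.
search_size <- function(k) {
  subsets <- combn(terms_all, k, simplify = FALSE)
  fits <- lapply(subsets, function(v) {
    glm(reformulate(v, response = "mpg"), data = mtcars)
  })
  data.frame(
    size     = k,
    model    = vapply(subsets, paste, character(1), collapse = " + "),
    deviance = vapply(fits, deviance, numeric(1)),
    AIC      = vapply(fits, AIC, numeric(1))
  )
}

# One table of candidates per model size, from 1 to the number of terms.
results <- lapply(seq_along(terms_all), search_size)

# Within each size, the winner by deviance and the winner by AIC.
best_by_deviance <- vapply(results, function(r) r$model[which.min(r$deviance)], character(1))
best_by_aic      <- vapply(results, function(r) r$model[which.min(r$AIC)], character(1))

print(data.frame(size = seq_along(terms_all), best_by_deviance, best_by_aic))
```

For a fixed number of parameters, the gaussian AIC is an increasing function of the deviance, so the two columns agree at every size. The criteria only differ when models of different sizes are compared against each other.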

Examples

###### Example 1: Fuel consumption of automobiles
Auto <- ISLR::Auto
Auto2 <- within(Auto, origin <- factor(origin))
mod <- mpg ~ cylinders + displacement + acceleration + origin + horsepower*weight
fit1 <- glm(mod, family=inverse.gaussian(log), data=Auto2)
out1 <- bestSubset(fit1)
out1

###### Example 2: Patients with burn injuries
burn1000 <- aplore3::burn1000
burn1000 <- within(burn1000, death <- factor(death, levels=c("Dead","Alive")))
mod <- death ~ gender + race + flame + age*tbsa*inh_inj
fit2 <- glm(mod, family=binomial(logit), data=burn1000)
out2 <- bestSubset(fit2)
out2

###### Example 3: Advertising
data(advertising)
fit3 <- glm(sales ~ log(TV)*radio*newspaper, family=gaussian(log), data=advertising)
out3 <- bestSubset(fit3)
out3


[Package glmtoolbox version 0.1.11 Index]