bestSubset {glmtoolbox}

R Documentation

Best Subset Selection

Description

Best subset selection by exhaustive search in generalized linear models.

Usage

bestSubset(
  object,
  nvmax = 8,
  nbest = 1,
  force.in = NULL,
  force.out = NULL,
  verbose = TRUE,
  digits = max(3, getOption("digits") - 2)
)

Arguments

`object`	an object of class glm, which is assumed to be the full model.
`nvmax`	an (optional) positive integer indicating the maximum size of the subsets to examine. By default, `nvmax` is set to 8.
`nbest`	an (optional) positive integer indicating the number of subsets of each size to record. By default, `nbest` is set to 1.
`force.in`	an (optional) vector of positive integers giving the indices of the model matrix columns that must be included in every model.
`force.out`	an (optional) vector of positive integers giving the indices of the model matrix columns that must be excluded from every model.
`verbose`	an (optional) logical indicating whether the report of results should be printed. By default, `verbose` is set to TRUE.
`digits`	an (optional) integer indicating the number of decimal places to be used. By default, `digits` is set to `max(3, getOption("digits") - 2)`.
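
Because `force.in` and `force.out` refer to columns of the model matrix by position, it can help to list those columns before choosing indices. The following is a minimal sketch in base R only; the `mtcars` model and the names `cols` and `idx_wt` are illustrative assumptions, not part of the package. This page does not state whether the intercept column counts towards the index, so that should be checked before passing the value to `bestSubset`.

```r
# Illustrative full model on a built-in data set (assumption, not from the package).
fit <- glm(mpg ~ wt + hp + factor(cyl), family = gaussian(), data = mtcars)

# Column names of the model matrix, in the order glm() uses them.
# A factor contributes one column per non-reference level.
cols <- colnames(model.matrix(fit))
print(data.frame(index = seq_along(cols), column = cols))

# Position of the column for 'wt' within the model matrix.
# Whether bestSubset() counts the intercept in this index is an assumption to verify.
idx_wt <- match("wt", cols)
```

A factor with three levels occupies two columns here, so forcing a whole factor in or out would presumably require listing each of its columns.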

Details

To perform "best subset" selection, an exhaustive search is conducted separately for every model size from 1 to nvmax to identify the model with the smallest deviance. Consequently, if the model selection criteria of interest reduce, for a fixed model size, to monotone functions of the deviance, and so differ only in how they compare models of different sizes, then the best subset found for each size does not depend on which criterion (that is, which trade-off between goodness-of-fit and complexity) is used.
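
The claim that, for a fixed size, deviance-based criteria agree on the best subset can be checked by brute force. The sketch below is not the implementation used by `bestSubset`; it uses only base R, and the `mtcars` data, the predictor set `vars`, and the helper `fit_size` are illustrative assumptions. It fits every subset of each size and compares the subset with the smallest deviance to the subsets with the smallest AIC and BIC.

```r
# Candidate predictors from the built-in mtcars data (illustrative choice).
vars <- c("wt", "hp", "qsec", "drat")

# Fit every subset of size k and record its deviance, AIC and BIC.
fit_size <- function(k) {
  subsets <- combn(vars, k, simplify = FALSE)
  rows <- lapply(subsets, function(s) {
    fit <- glm(reformulate(s, response = "mpg"),
               family = gaussian(), data = mtcars)
    data.frame(size = k,
               terms = paste(s, collapse = " + "),
               deviance = deviance(fit),
               AIC = AIC(fit),
               BIC = BIC(fit))
  })
  do.call(rbind, rows)
}

# Exhaustive search over all sizes from 1 to the number of predictors.
results <- do.call(rbind, lapply(seq_along(vars), fit_size))

# Within each size, check that the minimum-deviance subset
# is also the minimum-AIC and minimum-BIC subset.
same_winner <- sapply(split(results, results$size), function(d) {
  which.min(d$deviance) == which.min(d$AIC) &&
    which.min(d$deviance) == which.min(d$BIC)
})
print(same_winner)
```

The criteria only disagree when models of different sizes are compared, which is why the per-size winners can be found from the deviance alone and the choice of criterion deferred to the final comparison across sizes.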

Examples

###### Example 1: Fuel consumption of automobiles
Auto <- ISLR::Auto
Auto2 <- within(Auto, origin <- factor(origin))
mod <- mpg ~ cylinders + displacement + acceleration + origin + horsepower*weight
fit1 <- glm(mod, family=inverse.gaussian(log), data=Auto2)
out1 <- bestSubset(fit1)
out1

###### Example 2: Patients with burn injuries
burn1000 <- aplore3::burn1000
burn1000 <- within(burn1000, death <- factor(death, levels=c("Dead","Alive")))
mod <- death ~ gender + race + flame + age*tbsa*inh_inj
fit2 <- glm(mod, family=binomial(logit), data=burn1000)
out2 <- bestSubset(fit2)
out2

###### Example 3: Advertising
data(advertising)
fit3 <- glm(sales ~ log(TV)*radio*newspaper, family=gaussian(log), data=advertising)
out3 <- bestSubset(fit3)
out3

[Package glmtoolbox version 0.1.12 Index]