stepCriterion.glm {glmtoolbox}    R Documentation

Variable Selection in Generalized Linear Models

Description

Performs variable selection in generalized linear models using hybrid versions of forward stepwise and backward stepwise.

Usage

## S3 method for class 'glm'
stepCriterion(
  model,
  criterion = c("adjr2", "bic", "aic", "p-value", "qicu"),
  test = c("wald", "lr", "score", "gradient"),
  direction = c("forward", "backward"),
  levels = c(0.05, 0.05),
  trace = TRUE,
  scope,
  force.in,
  force.out,
  ...
)

Arguments

model

an object of the class glm.

criterion

an (optional) character string indicating the criterion that should be used to compare the candidate models. The available options are: AIC ("aic"), BIC ("bic"), adjusted deviance-based R-squared ("adjr2"), and the p-value of the test specified in the argument test ("p-value"). By default, criterion is set to "adjr2".

test

an (optional) character string indicating the statistical test that should be used to compare nested models. The available options are: Wald ("wald"), Rao's score ("score"), likelihood-ratio ("lr") and gradient ("gradient") tests. By default, test is set to "wald".

direction

an (optional) character string indicating the type of procedure that should be used. The available options are: hybrid backward stepwise ("backward") and hybrid forward stepwise ("forward"). By default, direction is set to "forward".

levels

an (optional) vector of length two with values in the interval (0,1) indicating the levels at which the variables should enter and leave the model. This is only appropriate if criterion="p-value". By default, levels is set to c(0.05,0.05).

trace

an (optional) logical switch indicating whether the stepwise reports should be printed. By default, trace is set to TRUE.

scope

an (optional) list containing the components lower and upper, both formula-type objects, indicating the range of models that should be examined in the stepwise search. By default, lower is a model with no predictors and upper is the linear predictor of the model in model.

force.in

an (optional) formula-type object indicating the effects that should be included in all models.

force.out

an (optional) formula-type object indicating the effects that should be excluded from all models.

...

further arguments passed to or from other methods. For example, k, the magnitude of the penalty in the AIC/QICu, which is set to 2 by default.
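The criteria and tests named in the arguments criterion and test can be illustrated with base R alone. The following is a minimal sketch, not the code used by stepCriterion: the warpbreaks data, the helper crit, and the exact form of the adjusted deviance-based R-squared are assumptions made for illustration. The gradient test is not shown.

```r
# Two nested Poisson models fitted to a dataset that ships with R.
fit0 <- glm(breaks ~ wool, family = poisson("log"), data = warpbreaks)
fit1 <- update(fit0, . ~ . + tension)

# Hypothetical helper: the three non-test criteria for a fitted glm.
crit <- function(fit) {
  c(aic = AIC(fit),   # smaller is better
    bic = BIC(fit),   # smaller is better
    # Assumed form of the adjusted deviance-based R-squared (larger is better).
    adjr2 = 1 - (deviance(fit) / df.residual(fit)) /
                (fit$null.deviance / fit$df.null))
}
print(rbind(fit0 = crit(fit0), fit1 = crit(fit1)))

# Likelihood-ratio and Rao's score tests for the added effect (tension).
print(anova(fit0, fit1, test = "LRT"))
print(anova(fit0, fit1, test = "Rao"))

# Wald test for the same effect, built from the coefficients of the larger model.
idx <- grep("^tension", names(coef(fit1)))
b <- coef(fit1)[idx]
V <- vcov(fit1)[idx, idx]
W <- drop(t(b) %*% solve(V) %*% b)
p_wald <- pchisq(W, df = length(b), lower.tail = FALSE)
print(p_wald)
```

With criterion="p-value", a p-value of this kind would be compared against the entries of levels to decide whether an effect enters or leaves the model.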

Details

The "hybrid forward stepwise" algorithm starts with the simplest model, which may be chosen through the argument scope and, by default, is the model whose parameters in the linear predictor, except the intercept (if any), are all set to 0. Candidate models are then built by hierarchically including effects in the linear predictor. The "relevance" and/or "importance" of each effect in the model fit is assessed by comparing nested models (that is, the models with and without the added effect) using the previously specified criterion. Whenever an effect is added to the equation, this strategy may also remove any effect that, according to the same criterion, no longer improves the model fit. The process continues until no more effects are included or excluded. The "hybrid backward stepwise" algorithm works similarly.
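The hybrid forward search described here can be sketched with base R functions only. This is an illustration under stated assumptions, not the implementation in glmtoolbox: the helper hybrid_forward is hypothetical, BIC is the only criterion handled, and hierarchy is delegated to add.scope and drop1 from the stats package.

```r
# Hypothetical helper: hybrid forward stepwise search driven by BIC.
# 'fit' is the starting glm and 'upper' a one-sided formula with the largest model.
hybrid_forward <- function(fit, upper, k = log(nobs(fit))) {
  repeat {
    # Effects that can be added without violating hierarchy.
    cand <- add.scope(formula(fit), upper)
    if (length(cand) == 0) break
    adds <- add1(fit, scope = cand, k = k)   # with k = log(n) the AIC column is BIC
    best <- rownames(adds)[which.min(adds$AIC)]
    if (best == "<none>") break              # no addition improves the criterion
    fit <- update(fit, as.formula(paste(". ~ . +", best)))
    # After each addition, remove effects that no longer improve the fit.
    repeat {
      drops <- drop1(fit, k = k)
      worst <- rownames(drops)[which.min(drops$AIC)]
      if (worst == "<none>") break
      fit <- update(fit, as.formula(paste(". ~ . -", worst)))
    }
  }
  fit
}

# Start from the intercept-only model and search up to wool * tension.
start <- glm(breaks ~ 1, family = poisson("log"), data = warpbreaks)
final <- hybrid_forward(start, upper = ~ wool * tension)
print(formula(final))
```

A backward variant would start from the largest model, call drop1 first, and check for possible re-inclusions with add1 after each removal.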

Value

a list with components including

initial a character string indicating the linear predictor of the "initial model",
direction a character string indicating the type of procedure which was used,
criterion a character string indicating the criterion used to compare the candidate models,
final a character string indicating the linear predictor of the "final model",
final.fit an object of class glm with the results of the fit to the data of the "final model".
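The components listed above can be read from the returned list in the usual way. The following is a small sketch that assumes glmtoolbox is installed; the warpbreaks data and the object names fit and out are chosen for illustration only.

```r
# Run only when glmtoolbox is available.
if (requireNamespace("glmtoolbox", quietly = TRUE)) {
  fit <- glm(breaks ~ wool + tension, family = poisson("log"), data = warpbreaks)
  out <- glmtoolbox::stepCriterion(fit, direction = "backward",
                                   criterion = "bic", trace = FALSE)
  print(out$initial)             # linear predictor of the initial model
  print(out$final)               # linear predictor of the final model
  print(summary(out$final.fit))  # glm object for the final model
}
```

Because final.fit is a regular glm object, the usual extractor functions (coef, predict, residuals, and so on) can be applied to it.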

References

James G., Witten D., Hastie T., Tibshirani R. (2013, page 210) An Introduction to Statistical Learning with Applications in R, Springer, New York.

See Also

bestSubset, stepCriterion.lm, stepCriterion.overglm, stepCriterion.glmgee

Examples

###### Example 1: Fuel consumption of automobiles
Auto <- ISLR::Auto
Auto2 <- within(Auto, origin <- factor(origin))
mod <- mpg ~ cylinders + displacement + acceleration + origin + horsepower*weight
fit1 <- glm(mod, family=inverse.gaussian("log"), data=Auto2)
stepCriterion(fit1, direction="forward", criterion="p-value", test="lr")
stepCriterion(fit1, direction="backward", criterion="bic", force.in=~cylinders)

###### Example 2: Patients with burn injuries
burn1000 <- aplore3::burn1000
burn1000 <- within(burn1000, death <- factor(death, levels=c("Dead","Alive")))
upper <- ~ age + gender + race + tbsa + inh_inj + flame + age*inh_inj + tbsa*inh_inj
fit2 <- glm(death ~ age + gender + race + tbsa + inh_inj, family=binomial("logit"), data=burn1000)
stepCriterion(fit2, direction="backward", criterion="bic", scope=list(upper=upper), force.in=~tbsa)
stepCriterion(fit2, direction="forward", criterion="p-value", test="score")

###### Example 3: Skin cancer in women
data(skincancer)
upper <- cases ~ city + age + city*age
fit3 <- glm(upper, family=poisson("log"), offset=log(population), data=skincancer)
stepCriterion(fit3, direction="backward", criterion="aic", scope=list(lower=~1, upper=upper))
stepCriterion(fit3, direction="forward", criterion="p-value", test="lr")

[Package glmtoolbox version 0.1.12 Index]