AirPollution {lmSubsets}    R Documentation

Air pollution and mortality

Description

Data relating air pollution to mortality, frequently used to illustrate ridge regression and related methods.

Usage

data(AirPollution)

Format

A data frame containing 60 observations on 16 variables.

precipitation

average annual precipitation in inches

temperature1

average January temperature in degrees Fahrenheit

temperature7

average July temperature in degrees Fahrenheit

age

percentage of the 1960 SMSA (Standard Metropolitan Statistical Area) population aged 65 or older

household

average household size

education

median school years completed by those over 22

housing

percentage of housing units that are sound and have all facilities

population

population per square mile in urbanized areas, 1960

noncauc

percentage of non-Caucasian population in urbanized areas, 1960

whitecollar

percentage employed in white collar occupations

income

percentage of families with income < USD 3000

hydrocarbon

relative hydrocarbon pollution potential

nox

relative pollution potential of nitric oxides

so2

relative sulphur dioxide pollution potential

humidity

annual average relative humidity (in percent) at 13:00

mortality

total age-adjusted mortality rate per 100,000
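
The layout described above can be checked once the data are loaded. The following is a minimal sketch, assuming the lmSubsets package is installed; the object name `pot` is only illustrative and not part of the package.

```r
## load the data without attaching the package
data("AirPollution", package = "lmSubsets")

## dimensions and column types of the data frame
dim(AirPollution)
str(AirPollution)

## positions of the three relative pollution potentials;
## 'pot' is a hypothetical helper name used for illustration only
pot <- c("hydrocarbon", "nox", "so2")
match(pot, names(AirPollution))
```

Looking the columns up by name rather than by position guards against mistakes if the column order is ever changed.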

Source

http://lib.stat.cmu.edu/datasets/pollution

References

McDonald GC, Schwing RC (1973). Instabilities of regression estimates relating air pollution to mortality. Technometrics, 15, 463–482.

Miller AJ (2002). Subset selection in regression. New York: Chapman and Hall.

Examples

## attach package and load data (with logs for relative potentials)
library("lmSubsets")
data("AirPollution", package = "lmSubsets")
## columns 12:14 are hydrocarbon, nox and so2
for (i in 12:14)  AirPollution[[i]] <- log(AirPollution[[i]])

## fit subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)

## refit best subset model of size 6
lm6 <- refit(lm_all, size = 6)
summary(lm6)
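
Since the description mentions ridge regression, a ridge fit on the same data may be a useful companion to the subset search. The following is only a sketch: it assumes the recommended package MASS is installed and uses its `lm.ridge()` function, which is not part of lmSubsets. The penalty grid `lambda` and the names `pot`, `rr` and `lambda_gcv` are arbitrary illustrative choices.

```r
## Sketch only: ridge regression via MASS::lm.ridge (not part of lmSubsets)
library("MASS")
data("AirPollution", package = "lmSubsets")

## log-transform the three relative pollution potentials, selected by name
pot <- c("hydrocarbon", "nox", "so2")
AirPollution[pot] <- lapply(AirPollution[pot], log)

## fit ridge regressions over an illustrative grid of penalties
lambda <- seq(0, 20, by = 0.5)
rr <- lm.ridge(mortality ~ ., data = AirPollution, lambda = lambda)

## coefficient paths, and the penalty minimising generalised cross-validation
plot(rr)
lambda_gcv <- lambda[which.min(rr$GCV)]
lambda_gcv
```

The data are reloaded here so that the logarithm is applied exactly once, whether or not the preceding example has been run.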

[Package lmSubsets version 0.5-2]