
AirPollution {lmSubsets}

R Documentation

Air pollution and mortality

Description

Data relating air pollution to mortality in 60 US Standard Metropolitan Statistical Areas (SMSAs). The data are frequently used to illustrate ridge regression and related techniques.

Usage

data(AirPollution)

Format

A data frame containing 60 observations on 16 variables.

precipitation: average annual precipitation in inches
temperature1: average January temperature in degrees Fahrenheit
temperature7: average July temperature in degrees Fahrenheit
age: percentage of the 1960 SMSA population aged 65 or older
household: average household size
education: median school years completed by persons over 22
housing: percentage of housing units that are sound and have all facilities
population: population per square mile in urbanized areas, 1960
noncauc: percentage of non-Caucasian population in urbanized areas, 1960
whitecollar: percentage employed in white-collar occupations
income: percentage of families with income under USD 3000
hydrocarbon: relative hydrocarbon pollution potential
nox: relative nitric oxides pollution potential
so2: relative sulphur dioxide pollution potential
humidity: annual average relative humidity (percent) at 13:00
mortality: total age-adjusted mortality rate per 100,000

Source

http://lib.stat.cmu.edu/datasets/pollution

References

McDonald GC, Schwing RC (1973). Instabilities of regression estimates relating air pollution to mortality. Technometrics, 15, 463–482.

Miller AJ (2002). Subset selection in regression. New York: Chapman and Hall.

Examples

## load data and take logs of the relative pollution potentials
## (hydrocarbon, nox, so2; columns 12 to 14)
data("AirPollution", package = "lmSubsets")
for (v in c("hydrocarbon", "nox", "so2")) AirPollution[[v]] <- log(AirPollution[[v]])

## all-subsets regression: best subset of regressors for each subset size
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)

## refit the best subset of size 6 and summarize it
lm6 <- refit(lm_all, size = 6)
summary(lm6)

Package lmSubsets version 0.5-2