AirPollution {lmSubsets}    R Documentation
Air pollution and mortality
Description
Data relating air pollution to mortality, frequently used to illustrate ridge regression and related tasks.
Usage
data(AirPollution)
Format
A data frame containing 60 observations on 16 variables.
- precipitation
average annual precipitation in inches
- temperature1
average January temperature in degrees Fahrenheit
- temperature7
average July temperature in degrees Fahrenheit
- age
percentage of 1960 SMSA population aged 65 or older
- household
average household size
- education
median school years completed by those over 22
- housing
percentage of housing units which are sound and with all facilities
- population
population per square mile in urbanized areas, 1960
- noncauc
percentage of non-Caucasian population in urbanized areas, 1960
- whitecollar
percentage employed in white collar occupations
- income
percentage of families with income < USD 3000
- hydrocarbon
relative hydrocarbon pollution potential
- nox
relative nitric oxides pollution potential
- so2
relative sulphur dioxide pollution potential
- humidity
annual average percentage of relative humidity at 13:00
- mortality
total age-adjusted mortality rate per 100,000
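The layout described above can be checked against the loaded data. The following is a minimal sketch, assuming the column names match the variable names listed in this section; `expected_vars` is a helper name introduced here for illustration and is not part of the package.

```r
# Variable names as listed in the Format section (assumed to match the data)
expected_vars <- c("precipitation", "temperature1", "temperature7", "age",
                   "household", "education", "housing", "population",
                   "noncauc", "whitecollar", "income", "hydrocarbon",
                   "nox", "so2", "humidity", "mortality")

# Only inspect the data when lmSubsets is installed
if (requireNamespace("lmSubsets", quietly = TRUE)) {
  data("AirPollution", package = "lmSubsets")
  # number of observations and variables
  print(dim(AirPollution))
  # documented names that are missing from the data (empty if all match)
  print(setdiff(expected_vars, names(AirPollution)))
  # compact overview of column types and leading values
  str(AirPollution)
}
```

Comparing names with `setdiff()` rather than by position keeps the check independent of column order.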
Source
http://lib.stat.cmu.edu/datasets/pollution
References
McDonald GC, Schwing RC (1973). Instabilities of regression estimates relating air pollution to mortality. Technometrics, 15, 463–482.
Miller AJ (2002). Subset selection in regression. New York: Chapman and Hall.
Examples
## load package and data (with logs for relative potentials)
library("lmSubsets")
data("AirPollution", package = "lmSubsets")
## columns 12:14 are hydrocarbon, nox, and so2
for (i in 12:14) AirPollution[[i]] <- log(AirPollution[[i]])
## fit subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)
## refit best model with size = 6
lm6 <- refit(lm_all, size = 6)
summary(lm6)
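
Since the Description mentions ridge regression, a ridge fit may also be a useful illustration. The following is only a sketch, not part of the lmSubsets examples: it uses `lm.ridge()` from the MASS package, assumes the column names given under Format, and the penalty grid `lambdas` is an arbitrary choice made here.

```r
# Grid of ridge penalties; the range is an arbitrary choice for illustration
lambdas <- seq(0, 10, by = 0.5)

if (requireNamespace("lmSubsets", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  data("AirPollution", package = "lmSubsets")
  # log-transform the relative pollution potentials, selected by name
  pollutants <- c("hydrocarbon", "nox", "so2")
  AirPollution[pollutants] <- lapply(AirPollution[pollutants], log)

  # ridge fits over the whole penalty grid
  ridge <- MASS::lm.ridge(mortality ~ ., data = AirPollution, lambda = lambdas)
  # print the HKB, L-W, and GCV choices of the penalty
  MASS::select(ridge)
  # coefficient paths as a function of the penalty
  plot(ridge)
}
```

Selecting the pollutant columns by name avoids relying on their positions in the data frame.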