AirPollution {lmSubsets} | R Documentation
Air pollution and mortality
Data relating air pollution to mortality, frequently used to illustrate ridge regression and related techniques.
A data frame containing 60 observations on 16 variables.
- precipitation
average annual precipitation in inches
- temperature1
average January temperature in degrees Fahrenheit
- temperature7
average July temperature in degrees Fahrenheit
- age
percentage of the 1960 SMSA (Standard Metropolitan Statistical Area) population aged 65 or older
- household
average household size
- education
median school years completed by those over 22
- housing
percentage of housing units that are sound and have all facilities
- population
population per square mile in urbanized areas, 1960
- noncauc
percentage of non-Caucasian population in urbanized areas, 1960
- whitecollar
percentage employed in white collar occupations
- income
percentage of families with income < USD 3000
- hydrocarbon
relative hydrocarbon pollution potential
- nox
relative nitric oxides pollution potential
- so2
relative sulphur dioxide pollution potential
- humidity
annual average relative humidity (percent) at 13:00
- mortality
total age-adjusted mortality rate per 100,000 population
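The three relative pollution potentials (hydrocarbon, nox, so2) are the variables that the example code log-transforms. A minimal sketch for checking the data frame against the format description and comparing those three columns on the raw and log scales, assuming the lmSubsets package is installed and the columns appear in the order listed above:

```r
## load the data set shipped with lmSubsets
library("lmSubsets")
data("AirPollution", package = "lmSubsets")

## the format description lists 60 observations on 16 variables
stopifnot(identical(dim(AirPollution), c(60L, 16L)))

## the three relative pollution potentials (assumed to be columns 12:14)
potentials <- c("hydrocarbon", "nox", "so2")

## compare their distributions on the raw scale and on the log scale
summary(AirPollution[potentials])
summary(log(AirPollution[potentials]))
```

Comparing the two summaries shows how the log transform changes the spread of each potential before it enters the regression.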
McDonald GC, Schwing RC (1973). Instabilities of regression estimates relating air pollution to mortality. Technometrics, 15, 463–482.
Miller AJ (2002). Subset selection in regression. New York: Chapman and Hall.
## load package and data (with logs for the relative pollution potentials)
library("lmSubsets")
data("AirPollution", package = "lmSubsets")
for (v in c("hydrocarbon", "nox", "so2")) AirPollution[[v]] <- log(AirPollution[[v]])
## fit all subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
## refit best model of size 6
lm6 <- refit(lm_all, size = 6)
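To make the idea behind all-subsets regression concrete, the sketch below performs a naive exhaustive search in base R: for a fixed number k of predictors it fits every candidate linear model with lm() and keeps the one with the smallest residual sum of squares. This is not how lmSubsets works internally (it uses a far more efficient exact algorithm), and the helper name best_subset_rss() is hypothetical. Whether its k corresponds to the size argument of refit() depends on how lmSubsets counts the intercept, which is not assumed here.

```r
## Hypothetical helper: naive exhaustive best-subset search by residual sum of squares.
## Fits every model with exactly k of the candidate predictors (plus an intercept)
## and returns the predictor names and RSS of the best-fitting one.
best_subset_rss <- function(data, response, k) {
  predictors <- setdiff(names(data), response)
  combos <- combn(predictors, k, simplify = FALSE)  # all k-element predictor sets
  rss <- vapply(combos, function(vars) {
    f <- reformulate(vars, response = response)
    deviance(lm(f, data = data))  # RSS of the fitted linear model
  }, numeric(1))
  list(variables = combos[[which.min(rss)]], rss = min(rss))
}

## load the data and log-transform the relative pollution potentials
library("lmSubsets")
data("AirPollution", package = "lmSubsets")
for (v in c("hydrocarbon", "nox", "so2")) AirPollution[[v]] <- log(AirPollution[[v]])

## 15 candidate predictors: choose(15, 5) = 3003 models are fitted
best5 <- best_subset_rss(AirPollution, "mortality", k = 5)
best5$variables
```

The number of fits grows as choose(p, k) over p candidate predictors, which is why brute force quickly becomes impractical and dedicated subset-selection algorithms are used instead.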