AirlineArrival {fastR2} | R Documentation
Airline On-Time Arrival Data
Description
Flights categorized by destination city, airline, and whether or not the flight was on time.
Format
A data frame with 11000 observations on the following 3 variables.
- airport
a factor with levels LosAngeles, Phoenix, SanDiego, SanFrancisco, Seattle
- result
a factor with levels Delayed, OnTime
- airline
a factor with levels Alaska, AmericaWest
Source
Barnett, Arnold. 1994. “How numbers can trick you.” Technology Review, vol. 97, no. 7, pp. 38–45.
References
These and similar data appear in many textbooks as an illustration of Simpson's paradox.
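As a minimal sketch of what the paradox would look like in these data, the base R code below compares each airline's delay rate overall with its delay rate at each airport. It assumes the fastR2 package is installed so that the data set can be loaded with data(); the object names overall, counts, and by_airport are illustrative only and are not part of the package.

```r
# Assumption: fastR2 is installed; everything else is base R.
data(AirlineArrival, package = "fastR2")

# Proportion of delayed flights for each airline, airports pooled.
overall <- prop.table(
  xtabs(~ airline + result, data = AirlineArrival),
  margin = 1
)[, "Delayed"]

# Proportion of delayed flights for each airline within each airport.
counts <- xtabs(~ airline + airport + result, data = AirlineArrival)
by_airport <- prop.table(counts, margin = c(1, 2))[, , "Delayed"]

# Delay rates as percentages.
round(100 * overall, 1)
round(100 * by_airport, 1)

# Simpson's paradox is present if one airline has the lower delay rate
# at every airport but the higher delay rate overall.
all(by_airport["Alaska", ] < by_airport["AmericaWest", ])
overall["Alaska"] > overall["AmericaWest"]
```

If both comparisons return TRUE, the airline that looks worse overall looks better at every individual airport, which can happen when the two airlines distribute their flights very differently across airports.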
Examples
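# The examples use tally() from mosaic, the dplyr verbs and pipe,
# and the gf_*() plotting functions from ggformula.
library(fastR2)
library(mosaic)
library(dplyr)
library(ggformula)

# Percentage of flights belonging to each airline, within each result category
tally(
  airline ~ result, data = AirlineArrival,
  format = "perc", margins = TRUE)

# Percentage delayed and on time, within each airline and airport combination
tally(
  result ~ airline + airport,
  data = AirlineArrival, format = "perc", margins = TRUE)

# Percentage of delayed flights for each airline at each airport
AirlineArrival2 <-
  AirlineArrival %>%
  group_by(airport, airline, result) %>%
  summarise(count = n()) %>%
  group_by(airport, airline) %>%
  mutate(total = sum(count), percent = count/total * 100) %>%
  filter(result == "Delayed")

# Percentage of delayed flights for each airline, all airports combined
AirlineArrival3 <-
  AirlineArrival %>%
  group_by(airline, result) %>%
  summarise(count = n()) %>%
  group_by(airline) %>%
  mutate(total = sum(count), percent = count/total * 100) %>%
  filter(result == "Delayed")

# Per-airport delay rates (lines and points sized by number of flights),
# with each airline's overall delay rate drawn as a dashed horizontal line
gf_line(percent ~ airport, color = ~ airline, group = ~ airline,
        data = AirlineArrival2) %>%
  gf_point(percent ~ airport, color = ~ airline, size = ~ total,
           data = AirlineArrival2) %>%
  gf_hline(yintercept = ~ percent, color = ~ airline,
           data = AirlineArrival3, linetype = "dashed") %>%
  gf_labs(y = "percent delayed")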
[Package fastR2 version 1.2.4 Index]