data_taxi {modeldatatoo}    R Documentation
Chicago taxi data set
Description
A data set containing information on a subset of taxi trips in the city of Chicago in 2022.
Usage
data_taxi(...)
Arguments
...
Arguments passed to
Details
The source data are described on the City of Chicago data portal (see Source). The data exported here are a pre-processed subset, motivated by the modeling problem of predicting whether a rider will tip.
- tip
Whether the rider left a tip. A factor with levels "yes" and "no".
- distance
The trip distance, in odometer miles.
- company
The taxi company, as a factor. Companies that occurred infrequently were binned as "other".
- local
Whether the trip ended in the same community area in which it began, as a factor. See the source data for community area values.
- dow
The day of the week in which the trip began, as a factor.
- month
The month in which the trip began, as a factor.
- hour
The hour of the day at which the trip began, as an integer.
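Because these data were prepared for predicting whether a rider tips, one natural starting point is a logistic regression of tip on the six other columns. The sketch below uses stats::glm(); the choice of model and the object names (taxi, fit, p_tip) are illustrative assumptions, not part of modeldatatoo. Calling data_taxi() downloads the data, so an internet connection is assumed.

```r
library(modeldatatoo)

# Download the current release of the data (requires an internet connection)
taxi <- data_taxi()

# For a factor response, glm() treats the first level as "failure".
# The levels are "yes" then "no", so make "no" the reference level
# in order to model the probability that a rider tips.
taxi$tip <- relevel(taxi$tip, ref = "no")

# Logistic regression of tip on the six predictors
fit <- glm(tip ~ distance + company + local + dow + month + hour,
           data = taxi, family = binomial())

# Fitted probability that each rider leaves a tip
p_tip <- predict(fit, type = "response")
head(p_tip)
```

Releveling matters here: without it, the same model would estimate the probability of "no" instead of "yes".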
Previous releases of this data (with version = "20230630T214846Z-643d0") included additional columns:
- id
A unique identifier for the trip, as a factor.
- duration
The trip duration, in seconds.
- fare
The cost of the trip fare, in USD.
- tolls
The cost of tolls for the trip, in USD.
- extras
The cost of extra charges for the trip, in USD.
- total_cost
The total cost of the trip, in USD. This is the sum of fare, tolls, and extras, plus the tip amount.
- payment_type
Type of payment for the trip. A factor with levels "Credit Card", "Dispute", "Mobile", "No Charge", "Prcard", and "Unknown".
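The earlier release above is identified by its version string. Assuming that a version argument is forwarded through ... to the underlying reader, as the reference to version above suggests, a minimal sketch for retrieving that release and listing the columns it adds might look like this (object names are illustrative):

```r
library(modeldatatoo)

# Current release (downloads the data; requires an internet connection)
current <- data_taxi()

# Earlier release; `version` is assumed to be passed through `...`
previous <- data_taxi(version = "20230630T214846Z-643d0")

# Columns that appear only in the earlier release
extra_cols <- setdiff(names(previous), names(current))
extra_cols
```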
Value
A tibble.
tibble print
data_taxi()
#> # A tibble: 10,000 x 7
#>    tip   distance company                      local dow   month  hour
#>    <fct>    <dbl> <fct>                        <fct> <fct> <fct> <int>
#>  1 yes      17.2  Chicago Independents         no    Thu   Feb      16
#>  2 yes       0.88 City Service                 yes   Thu   Mar       8
#>  3 yes      18.1  other                        no    Mon   Feb      18
#>  4 yes      20.7  Chicago Independents         no    Mon   Apr       8
#>  5 yes      12.2  Chicago Independents         no    Sun   Mar      21
#>  6 yes       0.94 Sun Taxi                     yes   Sat   Apr      23
#>  7 yes      17.5  Flash Cab                    no    Fri   Mar      12
#>  8 yes      17.7  other                        no    Sun   Jan       6
#>  9 yes       1.85 Taxicab Insurance Agency Llc no    Fri   Apr      12
#> 10 yes       1.47 City Service                 no    Tue   Mar      14
#> # i 9,990 more rows
glimpse()
tibble::glimpse(data_taxi())
#> Rows: 10,000
#> Columns: 7
#> $ tip      <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, yes, y~
#> $ distance <dbl> 17.19, 0.88, 18.11, 20.70, 12.23, 0.94, 17.47, 17.67, 1.85, 1~
#> $ company  <fct> Chicago Independents, City Service, other, Chicago Independen~
#> $ local    <fct> no, yes, no, no, no, yes, no, no, no, no, no, no, no, yes, no~
#> $ dow      <fct> Thu, Thu, Mon, Mon, Sun, Sat, Fri, Sun, Fri, Tue, Tue, Sun, W~
#> $ month    <fct> Feb, Mar, Feb, Apr, Mar, Apr, Mar, Jan, Apr, Mar, Mar, Apr, A~
#> $ hour     <int> 16, 8, 18, 8, 21, 23, 12, 6, 12, 14, 18, 11, 12, 19, 17, 13, ~
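Since the data are motivated by predicting tip, it can help to check how balanced the outcome is before modeling; a skewed outcome affects which performance metrics are informative. The following is a minimal base R sketch (object names are illustrative; data_taxi() requires an internet connection):

```r
library(modeldatatoo)

taxi <- data_taxi()

# Number of trips in each outcome class (NA values are excluded by table())
tip_counts <- table(taxi$tip)
tip_counts

# The same counts as proportions of all non-missing trips
tip_props <- prop.table(tip_counts)
round(tip_props, 3)
```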
Source
https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew
Examples
data_taxi()