R: Cluster nearby stops within a group

cluster_stops {tidytransit}

R Documentation

Cluster nearby stops within a group

Description

Finds clusters of stops for each unique value in group_col (e.g. stop_name). This can be used to find distinct groups of stops that share the same name but are located more than max_dist apart. A new column (named by cluster_colname) is added to gtfs_stops; it contains the group_col value and the cluster number (e.g. "86 St [3]").

Usage

cluster_stops(
  gtfs_stops,
  max_dist = 300,
  group_col = "stop_name",
  cluster_colname = "stop_name_cluster"
)

Arguments

`gtfs_stops`	Stops table of a gtfs object. It is also possible to pass a tidygtfs object to enable piping.
`max_dist`	Distance threshold in meters. Only groups of stops in which the largest distance between any two stops exceeds this threshold are clustered.
`group_col`	Clusters are calculated for each set of stops with the same value in this column (default: stop_name).
`cluster_colname`	Name of the new column. Can be the same as group_col to overwrite it.

Details

stats::kmeans() is used for clustering.
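
The note above names stats::kmeans() but does not spell out how the clusters are formed. The following is a minimal sketch of one way such clustering could work, not tidytransit's actual implementation. The helper names haversine_m, max_pairwise_m and sketch_cluster_stops are hypothetical. Two further assumptions are made for illustration: k is increased until no cluster is wider than max_dist, and k-means is run directly on longitude/latitude degrees.

```r
# Great-circle (haversine) distance in meters between lon/lat points.
haversine_m <- function(lon1, lat1, lon2, lat2) {
  r <- 6371000  # mean Earth radius in meters
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Largest pairwise distance (meters) among a set of points.
max_pairwise_m <- function(lon, lat) {
  if (length(lon) < 2) return(0)
  idx <- utils::combn(length(lon), 2)
  max(haversine_m(lon[idx[1, ]], lat[idx[1, ]], lon[idx[2, ]], lat[idx[2, ]]))
}

# Hypothetical re-implementation sketch; expects columns stop_lon and stop_lat.
sketch_cluster_stops <- function(stops, max_dist = 300,
                                 group_col = "stop_name",
                                 cluster_colname = "stop_name_cluster") {
  groups <- as.character(stops[[group_col]])
  labels <- groups
  for (g in unique(groups)) {
    rows <- which(groups == g)
    lon <- stops$stop_lon[rows]
    lat <- stops$stop_lat[rows]
    # Compact groups keep the plain group value as their label.
    if (max_pairwise_m(lon, lat) <= max_dist) next

    xy <- cbind(lon, lat)
    n_distinct <- nrow(unique(xy))
    cl <- NULL
    k <- 2
    # Assumption: raise k until no cluster is wider than max_dist.
    while (k < n_distinct) {
      km <- stats::kmeans(xy, centers = k, nstart = 10)
      widths <- tapply(seq_along(lon), km$cluster,
                       function(i) max_pairwise_m(lon[i], lat[i]))
      if (all(widths <= max_dist)) {
        cl <- km$cluster
        break
      }
      k <- k + 1
    }
    # Fallback: every distinct coordinate becomes its own cluster.
    if (is.null(cl)) {
      key <- paste(lon, lat)
      cl <- match(key, unique(key))
    }
    labels[rows] <- paste0(g, " [", cl, "]")
  }
  stops[[cluster_colname]] <- labels
  stops
}

# Made-up demo data: two "A" stops about 11 m apart, a third "A" stop
# about 11 km away, and two "B" stops close together.
demo_stops <- data.frame(
  stop_id   = c("A1", "A2", "A3", "B1", "B2"),
  stop_name = c("A", "A", "A", "B", "B"),
  stop_lon  = c(0, 0.0001, 0.1, 10, 10.0001),
  stop_lat  = c(0, 0, 0, 50, 50)
)

set.seed(1)  # kmeans() picks random starting centers
res <- sketch_cluster_stops(demo_stops)
print(res[, c("stop_id", "stop_name", "stop_name_cluster")])
```

In this sketch A1 and A2 end up with the same label, A3 gets a different one, and both "B" stops keep the plain label "B". Cluster numbers come from kmeans() and are arbitrary. Euclidean distance in degrees is only a rough stand-in for distance on the ground, which is why the width check uses the haversine distance.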

Value

Returns a stops table with an added cluster column. If gtfs_stops is a tidygtfs object, a modified tidygtfs object is returned.
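
To illustrate the two return types, the following sketch calls cluster_stops() once on a tidygtfs object and once on its stops table. It assumes the NYC subway feed bundled with tidytransit (the same file used in the Examples) and R 4.1 or later for the native pipe. The 500 m threshold is an arbitrary value chosen for illustration.

```r
library(tidytransit)

nyc_path <- system.file("extdata", "google_transit_nyc_subway.zip",
                        package = "tidytransit")
nyc <- read_gtfs(nyc_path)

# tidygtfs object in, tidygtfs object out (enables piping)
nyc_clustered <- nyc |> cluster_stops(max_dist = 500)

# stops table in, stops table out
stops_clustered <- cluster_stops(nyc$stops, max_dist = 500)

# Both results carry the default cluster column
"stop_name_cluster" %in% names(nyc_clustered$stops)
"stop_name_cluster" %in% names(stops_clustered)
```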

Examples


library(tidytransit)
library(dplyr)
nyc_path <- system.file("extdata", "google_transit_nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(nyc_path)
nyc <- cluster_stops(nyc)

# There are 6 groups of stops named "86 St" that are far apart
stops_86_St <- nyc$stops %>%
  filter(stop_name == "86 St")

table(stops_86_St$stop_name_cluster)
#> 86 St [1] 86 St [2] 86 St [3] 86 St [4] 86 St [5] 86 St [6] 
#>         3         3         3         3         3         3 

stops_86_St %>% select(stop_id, stop_name, parent_station, stop_name_cluster) %>% head()
#> # A tibble: 6 × 4
#>   stop_id stop_name parent_station stop_name_cluster
#>   <chr>   <chr>     <chr>          <chr>            
#> 1 121     86 St     ""             86 St [3]        
#> 2 121N    86 St     "121"          86 St [3]        
#> 3 121S    86 St     "121"          86 St [3]        
#> 4 626     86 St     ""             86 St [4]        
#> 5 626N    86 St     "626"          86 St [4]        
#> 6 626S    86 St     "626"          86 St [4]

library(ggplot2)
ggplot(stops_86_St) +
  geom_point(aes(stop_lon, stop_lat, color = stop_name_cluster))

[Package tidytransit version 1.6.1 Index]