Route Pipeline Data Flow
This page explains how route data moves from source files to the imported route browser model.
The route pipeline has three layers:
- Source artifacts generated by the downloader
- Normalized comparison tables generated by the route loader
- Match tables that link equivalent ATLAS and OSM route entities
Source Layer
The downloader stage writes two source-specific route products.
ATLAS side
The ATLAS stops CSV does not contain native line-family or itinerary entities, so the pipeline reconstructs them from GTFS.
The GTFS side uses four files:
| File | Role |
|---|---|
| stops.txt | stop IDs, names, coordinates, and stop metadata |
| stop_times.txt | ordered stop sequence per trip_id |
| trips.txt | assigns each trip_id to a route_id and direction_id |
| routes.txt | line-level metadata such as short name, long name, and route type |
Conceptually:
- route_id becomes the ATLAS line-family key
- raw trips are grouped into itinerary buckets inside each line family
- each emitted itinerary keeps one representative ordered stop sequence
- each stop call keeps the resolved physical stop identity when available
The resulting source tables are:
- atlas_line_families.csv
- atlas_itineraries.csv
- atlas_itinerary_stop_calls.csv
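The grouping described above can be sketched as follows. This is a minimal illustration, not the code in get_atlas_gtfs.py: the function name `build_itineraries`, the dict-based row layout, and the bucket key (route, direction, and exact ordered stop pattern) are all assumptions made for the example. The real downloader may bucket trips differently.

```python
from collections import defaultdict


def build_itineraries(trips, stop_times):
    """Group GTFS trips into itinerary buckets per route_id (illustrative only).

    trips:      iterable of dicts with trip_id, route_id, direction_id
    stop_times: iterable of dicts with trip_id, stop_sequence, stop_id
    """
    # Collect the stop calls of every trip so they can be ordered by stop_sequence.
    calls_by_trip = defaultdict(list)
    for row in stop_times:
        calls_by_trip[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))

    # Assumed bucket key: line family, direction, and the exact ordered stop pattern.
    buckets = defaultdict(list)
    for trip in trips:
        ordered_calls = sorted(calls_by_trip[trip["trip_id"]])
        pattern = tuple(stop_id for _, stop_id in ordered_calls)
        key = (trip["route_id"], trip.get("direction_id", ""), pattern)
        buckets[key].append(trip["trip_id"])

    # Emit one itinerary per bucket; the shared pattern is its representative sequence.
    return [
        {
            "route_id": route_id,
            "direction_id": direction_id,
            "stops": list(pattern),
            "trip_count": len(trip_ids),
        }
        for (route_id, direction_id, pattern), trip_ids in buckets.items()
    ]
```

With this key, two trips on the same route and direction that call at the same stops in the same order collapse into one itinerary, while a trip in the opposite direction produces a separate one.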
OSM side
On the OSM PTv2 side, the route structure is entity-first:
- stop and platform elements
- type=route relations
- optional type=route_master relations above them
The downloader preserves that hierarchy in:
- osm_route_masters.csv
- osm_route_master_tags.csv
- osm_route_master_members.csv
- osm_route_relations.csv
- osm_route_relation_tags.csv
- osm_route_relation_members.csv
- osm_route_relation_stops.csv
In this model, the OSM type=route relation is the itinerary or variant layer, while type=route_master is the family layer when present.
Normalized Comparison Layer
matching_and_import_db/database/route_loader.py converts both source products into a shared comparison model.
The normalized tables are:
- line_families
- itineraries
- stop_calls
line_families
This table stores one comparable family row per source-side family.
- ATLAS rows are created from atlas_line_id
- OSM rows are grouped by route_master_id when present
- if no route master exists, the loader falls back to normalized gtfs_route_id
- if that is still missing, it falls back again to synthetic keys based on ref, operator, network, or the relation itself
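The OSM fallback chain can be sketched as a single key function. This is a hedged illustration rather than the logic in route_loader.py: the function name `osm_family_key`, the dict layout, the `relation_id` field, the key prefixes, and the use of whitespace stripping as "normalization" are assumptions for the example.

```python
def osm_family_key(relation):
    """Pick a line-family grouping key for one OSM route relation (illustrative).

    relation: dict that may carry route_master_id, gtfs_route_id, ref,
              operator, network, and always carries relation_id (assumed layout).
    """
    # 1. Prefer the route master when the relation belongs to one.
    if relation.get("route_master_id"):
        return f"master:{relation['route_master_id']}"

    # 2. Otherwise fall back to the GTFS route ID tag.
    #    Stripping whitespace stands in for whatever normalization the loader applies.
    gtfs_route_id = (relation.get("gtfs_route_id") or "").strip()
    if gtfs_route_id:
        return f"gtfs:{gtfs_route_id}"

    # 3. Otherwise build a synthetic key from ref, operator, and network.
    parts = [(relation.get(tag) or "").strip() for tag in ("ref", "operator", "network")]
    if any(parts):
        return "synthetic:" + "|".join(parts)

    # 4. Last resort: the relation forms its own single-member family.
    return f"relation:{relation['relation_id']}"
```

Prefixing each key with its origin keeps keys from different fallback levels from colliding, which is a design choice of this sketch and not something the source confirms.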
itineraries
This table stores one comparable itinerary row per source-side itinerary.
- ATLAS itineraries come from atlas_itineraries.csv
- OSM itineraries come from osm_route_relations.csv
- both sides carry a direction_id, display label, and stop count when available
stop_calls
This table stores ordered stop membership for each itinerary.
- ATLAS stop calls carry GTFS stop IDs, resolved SLOIDs, platform codes, and stop labels
- OSM stop calls carry resolved node IDs, member roles, canonical stop keys, and stop labels
The importer prefers a shared physical stop identity whenever it can resolve one. On the OSM side that resolution uses the base stop-matching output first, then single-candidate UIC fallbacks, then the raw OSM canonical key.
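The OSM resolution order can be sketched as a three-step lookup. This is only an illustration of the priority described above: the function name `resolve_osm_stop_identity`, its parameters, and the two lookup dictionaries are hypothetical and do not reflect the importer's actual interfaces.

```python
def resolve_osm_stop_identity(node_id, canonical_key, uic_ref, base_matches, uic_candidates):
    """Return (identity, source) for one OSM stop call (illustrative).

    base_matches:   dict mapping OSM node ID -> shared stop identity from stop matching
    uic_candidates: dict mapping UIC reference -> list of candidate stop identities
    """
    # 1. The base stop-matching output wins whenever it covers this node.
    if node_id in base_matches:
        return base_matches[node_id], "base_match"

    # 2. UIC fallback, accepted only when exactly one candidate exists.
    candidates = uic_candidates.get(uic_ref, []) if uic_ref else []
    if len(candidates) == 1:
        return candidates[0], "uic_single_candidate"

    # 3. No shared identity could be resolved: keep the raw OSM canonical key.
    return canonical_key, "osm_canonical_key"
```

Returning the source label next to the identity makes it easy to audit how many stop calls were resolved at each level.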
Match Layer
After normalization, the route loader builds two match tables:
- line_family_matches
- itinerary_matches
The family match table links one ATLAS line family to one OSM family.
The itinerary match table links one ATLAS itinerary to one OSM itinerary inside an already matched family.
The pairing rules are documented in 3.2 Route-Route Matching.
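The family constraint on itinerary matching can be sketched as candidate generation. This example only shows which ATLAS and OSM itinerary pairs are eligible for comparison. It does not score pairs or pick the final one-to-one links, since those rules live in 3.2. The function name `candidate_itinerary_pairs` and the `family_id` and `itinerary_id` fields are assumptions for illustration.

```python
from collections import defaultdict


def candidate_itinerary_pairs(family_matches, atlas_itineraries, osm_itineraries):
    """List itinerary pairs that share an already matched family (illustrative).

    family_matches: list of (atlas_family_id, osm_family_id) tuples
    *_itineraries:  lists of dicts with itinerary_id and family_id
    """
    # Index itineraries by the family they belong to on each side.
    atlas_by_family = defaultdict(list)
    for itinerary in atlas_itineraries:
        atlas_by_family[itinerary["family_id"]].append(itinerary["itinerary_id"])

    osm_by_family = defaultdict(list)
    for itinerary in osm_itineraries:
        osm_by_family[itinerary["family_id"]].append(itinerary["itinerary_id"])

    # Only itineraries under a matched family pair are ever compared.
    pairs = []
    for atlas_family, osm_family in family_matches:
        for atlas_id in atlas_by_family.get(atlas_family, []):
            for osm_id in osm_by_family.get(osm_family, []):
                pairs.append((atlas_id, osm_id))
    return pairs
```

Itineraries whose family has no match never appear in the candidate list, which mirrors the statement that itinerary links exist only inside an already matched family.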
Relationship to Stop Matching
Route data is used in two different places in the system:
- Stop-level route matching inside the matching pipeline, where route tokens help decide whether a nearby ATLAS and OSM stop are the same physical stop
- Route-level comparison during import preparation, where whole line families and itineraries are normalized and paired
That distinction is important:
- 2.3 Stop-stop matching using routes is about stop matching
- this chapter is about the route model and route comparison tables
Main Code Paths
| Module | Role |
|---|---|
| matching_and_import_db/downloader/get_atlas_gtfs.py | Builds ATLAS-side GTFS route artifacts and GTFS identity caches |
| matching_and_import_db/downloader/get_osm_data.py | Builds OSM route-master and route-relation artifacts |
| matching_and_import_db/database/route_loader.py | Normalizes both sides and writes route match payloads |
| matching_and_import_db/database/importer.py | Persists the source tables, normalized tables, and route match tables |