Route-Route Matching
This page explains how route equivalency is built during import preparation in matching_and_import_db/database/route_loader.py.
Important distinction:
- 2.3 Stop-stop matching using routes is stop-level matching (ATLAS stop <-> OSM stop)
- this page is route-level linking (ATLAS line family <-> OSM family, then itinerary <-> itinerary)
Where It Happens In The Pipeline
Route-route matching runs after stop matching has already produced the set of importable ATLAS SLOIDs.
The active route-level linker is not the RouteState singleton used by the stop-matching runtime. It is the normalized route-loader path inside build_route_write_payload().
Inputs
The route loader consumes:
- atlas_line_families.csv
- atlas_itineraries.csv
- atlas_itinerary_stop_calls.csv
- osm_route_masters.csv
- osm_route_master_tags.csv
- osm_route_master_members.csv
- osm_route_relations.csv
- osm_route_relation_tags.csv
- osm_route_relation_members.csv
- osm_route_relation_stops.csv
- base stop-matching output from run_matching()
The base stop-matching output matters because it lets the route loader reuse already matched physical stop identities when comparing OSM and ATLAS stop sequences.
Step 1: Normalize Line Families
The loader first creates comparable family rows in line_families.
ATLAS family rows
- one family row per atlas_line_id
- normalized_route_id comes from the exported GTFS-normalized route ID
- public display metadata comes from GTFS route fields such as route_short_name and route_long_name
OSM family rows
OSM relations are grouped into families using a fallback chain:
- parent route_master_id when present
- normalized gtfs_route_id
- synthetic key based on route, ref, operator, and network
- the relation itself as a last resort
This is why the OSM side can still produce a family row even when no route master exists and no GTFS tag is available.
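The fallback chain can be sketched as a single key function. This is a minimal illustration, not the loader's actual code: the function name osm_family_key, the key prefixes, the trim-and-lowercase normalization, and the rule that a synthetic key needs at least one of the four tags are all assumptions made for the example.

```python
from typing import Optional


def osm_family_key(
    relation_id: int,
    route_master_id: Optional[int] = None,
    gtfs_route_id: Optional[str] = None,
    route: Optional[str] = None,
    ref: Optional[str] = None,
    operator: Optional[str] = None,
    network: Optional[str] = None,
) -> str:
    """Return a family grouping key by walking the fallback chain in order."""
    # 1. Parent route master, when the relation has one.
    if route_master_id is not None:
        return f"master:{route_master_id}"
    # 2. Normalized GTFS route ID (normalization assumed: trim + lowercase).
    if gtfs_route_id and gtfs_route_id.strip():
        return f"gtfs:{gtfs_route_id.strip().lower()}"
    # 3. Synthetic key from descriptive tags (assumed: needs at least one tag).
    tags = [route, ref, operator, network]
    if any(tags):
        return "synthetic:" + "|".join((t or "").strip().lower() for t in tags)
    # 4. Last resort: the relation forms a family of its own.
    return f"relation:{relation_id}"
```

Relations that return the same key would land in the same family row, so a relation with no route master and no tags still forms a one-member family.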
Step 2: Match Line Families
After normalization, the loader scores candidate ATLAS/OSM family pairs.
Scoring rules:
- Exact GTFS route ID → score 1.0, reason exact_gtfs_route_id
- Normalized GTFS route ID → score 0.95, reason normalized_gtfs_route_id
- Display route ID fallback → score 0.9, reason display_route_id_match
OSM families without a GTFS route ID are skipped in this deterministic pairing step.
Candidate pairs are sorted by score and then chosen greedily one-to-one, so any ATLAS family and any OSM family can appear in at most one line_family_matches row.
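A rough sketch of the scoring and greedy selection follows. The Family tuple, its field names, and the tie-break on family IDs are hypothetical; only the scores, the reason strings, and the one-to-one rule come from the description above.

```python
from typing import NamedTuple, Optional


class Family(NamedTuple):
    family_id: str
    gtfs_route_id: Optional[str]        # raw GTFS route ID, if any
    normalized_route_id: Optional[str]  # normalized GTFS route ID, if any
    display_route_id: Optional[str]     # hypothetical display-level route ID


def score_pair(atlas: Family, osm: Family):
    """Return (score, reason) for a candidate pair, or None if no rule applies."""
    if osm.gtfs_route_id is None:
        return None  # OSM families without a GTFS route ID are skipped
    if atlas.gtfs_route_id is not None and atlas.gtfs_route_id == osm.gtfs_route_id:
        return 1.0, "exact_gtfs_route_id"
    if (atlas.normalized_route_id is not None
            and atlas.normalized_route_id == osm.normalized_route_id):
        return 0.95, "normalized_gtfs_route_id"
    if (atlas.display_route_id is not None
            and atlas.display_route_id == osm.display_route_id):
        return 0.9, "display_route_id_match"
    return None


def match_families(atlas_families, osm_families):
    """Greedy one-to-one selection over all scored candidate pairs."""
    candidates = []
    for a in atlas_families:
        for o in osm_families:
            scored = score_pair(a, o)
            if scored is not None:
                candidates.append((scored[0], a.family_id, o.family_id, scored[1]))
    # Highest score first; IDs break ties (an assumed tie-break for stability).
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_atlas, used_osm, matches = set(), set(), []
    for score, a_id, o_id, reason in candidates:
        if a_id in used_atlas or o_id in used_osm:
            continue  # each family may appear in at most one match row
        used_atlas.add(a_id)
        used_osm.add(o_id)
        matches.append((a_id, o_id, score, reason))
    return matches
```

Because candidates are consumed in descending score order, a weaker display-ID match cannot displace a family that an exact or normalized GTFS match has already claimed.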
Step 3: Normalize Itineraries and Stop Calls
On each side, the loader turns source itineraries into shared itineraries and stop_calls rows.
ATLAS itinerary rows
- come from atlas_itineraries.csv
- keep direction_id, representative_headsign, direction_label, trip_count, shape_id, and headsign_or_pattern_hash
OSM itinerary rows
- come from osm_route_relations.csv
- one relation becomes one OSM itinerary row
- direction_id is taken from the stop rows when available
- headsign_or_pattern_hash is built from the ordered stop sequence hash
Shared stop-call identity
For stop-call comparison, the loader prefers shared physical stop identities in this order:
- matched OSM node -> ATLAS sloid from the base stop-matching output
- unique ATLAS sloid for a shared uic_ref
- raw canonical fallback key such as uic:<uic_ref> or osm:<node_id>
This is what allows the itinerary matcher to compare the same physical stop even when the OSM side started from a node ID and the ATLAS side started from a GTFS-derived sloid.
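The preference order for an OSM stop call might look like the sketch below. The function name resolve_stop_key and the two lookup dictionaries are hypothetical stand-ins for whatever structures the loader builds from the base stop-matching output, and the sloid values in the usage are made up.

```python
from typing import Dict, Optional, Set


def resolve_stop_key(
    node_id: int,
    uic_ref: Optional[str],
    node_to_sloid: Dict[int, str],
    uic_to_sloids: Dict[str, Set[str]],
) -> str:
    """Resolve an OSM stop call to the most specific shared identity available."""
    # 1. The OSM node was already matched to an ATLAS sloid by stop matching.
    if node_id in node_to_sloid:
        return node_to_sloid[node_id]
    if uic_ref:
        # 2. The uic_ref points at exactly one ATLAS sloid, so it is unambiguous.
        sloids = uic_to_sloids.get(uic_ref, set())
        if len(sloids) == 1:
            return next(iter(sloids))
        # 3. Ambiguous or unknown uic_ref: fall back to the raw canonical key.
        return f"uic:{uic_ref}"
    # 3. No uic_ref at all: the OSM node ID is the only identity left.
    return f"osm:{node_id}"
```

For example, with node_to_sloid = {101: "ch:1:sloid:7000:1"} and a uic_ref that maps to two sloids, node 101 resolves to its matched sloid while an unmatched node with that uic_ref resolves to the uic: fallback key.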
Step 4: Pair Itineraries
For each matched family, the loader scores every possible ATLAS/OSM itinerary pair.
The rules are:
- direction_id must match exactly
- stop sequences are aligned in order
- stop-call equality uses resolved ATLAS sloid identity first
- if neither side has resolved physical stop identity for a call, matching falls back to shared uic_ref
- the pair is eligible only when
$$
\frac{\text{matched stop count}}{\max(\text{atlas stop count},\ \text{osm stop count})} \ge 0.8
$$
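The eligibility test can be sketched as follows. The page does not say how the in-order alignment is computed, so this example assumes a longest-common-subsequence count; the StopCall tuple and function names are hypothetical, and treating a call pair as unequal when only one side has a resolved sloid is also an assumption.

```python
from typing import List, NamedTuple, Optional


class StopCall(NamedTuple):
    sloid: Optional[str]    # resolved ATLAS sloid, if any
    uic_ref: Optional[str]  # raw uic_ref, if any


def same_stop(a: StopCall, b: StopCall) -> bool:
    """Compare by sloid first; use uic_ref only when neither side has a sloid."""
    if a.sloid is not None and b.sloid is not None:
        return a.sloid == b.sloid
    if a.sloid is None and b.sloid is None:
        return a.uic_ref is not None and a.uic_ref == b.uic_ref
    return False  # assumed: one resolved side and one unresolved side never match


def matched_stop_count(atlas: List[StopCall], osm: List[StopCall]) -> int:
    """Longest common subsequence length, i.e. stops matched without reordering."""
    prev = [0] * (len(osm) + 1)
    for a in atlas:
        cur = [0]
        for j, o in enumerate(osm, start=1):
            if same_stop(a, o):
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_eligible(atlas_dir, osm_dir, atlas, osm, threshold: float = 0.8) -> bool:
    """Apply the direction check and the matched-stop ratio threshold."""
    if atlas_dir != osm_dir or not atlas or not osm:
        return False
    ratio = matched_stop_count(atlas, osm) / max(len(atlas), len(osm))
    return ratio >= threshold
```

Under these assumptions, a five-stop ATLAS itinerary compared with an OSM itinerary that covers four of those stops in the same order gives 4 / 5 = 0.8, which just meets the threshold; with only three shared stops the ratio drops to 0.6 and the pair is rejected.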
Eligible pairs are chosen greedily one-to-one by:
- higher matched-stop count
- higher stop ratio
- stable itinerary ID order
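The three ordering criteria map naturally onto one sort key. In this sketch the tuple layout and the function name pick_itinerary_pairs are hypothetical, and "stable itinerary ID order" is interpreted as ascending ATLAS ID followed by ascending OSM ID.

```python
def pick_itinerary_pairs(eligible):
    """Choose pairs one-to-one from (atlas_id, osm_id, matched_count, stop_ratio) tuples."""
    ordered = sorted(
        eligible,
        # Higher matched count, then higher ratio, then IDs for a reproducible order.
        key=lambda p: (-p[2], -p[3], p[0], p[1]),
    )
    used_atlas, used_osm, chosen = set(), set(), []
    for atlas_id, osm_id, matched, ratio in ordered:
        if atlas_id in used_atlas or osm_id in used_osm:
            continue  # an itinerary can take part in at most one pair
        used_atlas.add(atlas_id)
        used_osm.add(osm_id)
        chosen.append((atlas_id, osm_id, matched, ratio))
    return chosen
```

Sorting on matched count before ratio means a long itinerary with many shared stops wins over a short one with a perfect ratio when both compete for the same counterpart.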
The resulting itinerary_matches rows keep the debugging scores written by the loader: direction_score, stop_score, overall_score, and match_reason.
Output Tables Affected
- line_family_matches: matched ATLAS/OSM family pairs
- itinerary_matches: matched itinerary pairs inside already matched families