Download and Process Data
This chapter explains the source-side preprocessing performed by matching_and_import_db/downloader/.
The downloader layer has two jobs:
- fetch and filter the raw ATLAS, GTFS, and OSM inputs
- materialize the source-side CSV artifacts consumed later by the matching runtime and the database importer
The route comparison logic itself is documented in 3. Routes. This chapter stops at the files written under data/raw/ and data/processed/.
The preprocessing output falls into two artifact families:
- Stop identity artifacts used by stop matching and the GTFS stop_id <-> sloid diagnostic map
- Source-side route artifacts used by matching_and_import_db/database/route_loader.py
The diagrams below show the files produced by each source-processing path.
ATLAS + GTFS Pipeline
flowchart LR
classDef plain fill:#fff,stroke:#ced4da,stroke-width:1px;
classDef script fill:#eef3fb,stroke:#174092,stroke-width:2px;
classDef orch fill:#fdf8ef,stroke:#F0AD4E,stroke-width:2px;
classDef file fill:#f8f9fa,stroke:#6c757d,stroke-width:1px;
classDef transient fill:#f8f9fa,stroke:#6c757d,stroke-width:1px,stroke-dasharray: 5 5;
subgraph StopSrc ["Stop Data"]
AT[ATLAS Stops Data]:::plain
end
subgraph TimeSrc ["Timetable Data"]
direction TB
GT[GTFS Data]:::plain
end
SA["get_atlas_data.py\n(Orchestrator)"]:::orch
subgraph Modules ["Processing Modules"]
direction TB
SG[get_atlas_gtfs.py]:::script
end
ST["swiss_trip_stop_times.csv\n(Transient Stage File)"]:::transient
subgraph Outputs ["Final Output Files"]
direction TB
PA(stops_ATLAS.csv):::file
PR["atlas_line_families.csv<br/>atlas_itineraries.csv<br/>atlas_itinerary_stop_calls.csv"]:::file
PG["gtfs_stops_raw.csv<br/>gtfs_stop_identity_resolution.csv"]:::file
PS["gtfs_atlas_stats.json"]:::file
end
AT --> SA --> PA
GT --> SG
SA -.->|Invokes| SG
SG -->|Writes| ST
ST -.->|Read in chunks| SA
SG -->|Returns Data| SA
SA --> PR
SG --> PG
SG --> PS
click SA "https://github.com/openTdataCH/stop_sync_osm_atlas/blob/main/matching_and_import_db/downloader/get_atlas_data.py"
click SG "https://github.com/openTdataCH/stop_sync_osm_atlas/blob/main/matching_and_import_db/downloader/get_atlas_gtfs.py"
OSM Pipeline
flowchart LR
classDef plain fill:#fff,stroke:#ced4da,stroke-width:1px;
classDef script fill:#eef3fb,stroke:#174092,stroke-width:2px;
classDef file fill:#f8f9fa,stroke:#6c757d,stroke-width:1px;
subgraph Sources ["Data Sources"]
OV[Overpass API]:::plain
end
subgraph Scripts ["Processing Scripts"]
SO[get_osm_data.py]:::script
end
subgraph Outputs ["Output Files"]
direction TB
PX(osm_data.xml):::file
PR["osm_route_masters.csv<br/>osm_route_master_tags.csv<br/>osm_route_master_members.csv<br/>osm_route_relations.csv<br/>osm_route_relation_tags.csv<br/>osm_route_relation_members.csv<br/>osm_route_relation_stops.csv"]:::file
end
Sources ~~~ Scripts ~~~ Outputs
OV --> SO
SO --> PX
SO --> PR
click SO "https://github.com/openTdataCH/stop_sync_osm_atlas/blob/main/matching_and_import_db/downloader/get_osm_data.py"
Data Sources
| Input | Source | Key Filters | Output |
|---|---|---|---|
| ATLAS Traffic Points | OpenTransportData.swiss | UIC 85, CH polygon, valid, BOARDING_PLATFORM | stops_ATLAS.csv |
| GTFS | OpenTransportData.swiss | Extract only stops.txt, stop_times.txt, trips.txt, routes.txt; Swiss stops; single-pass streaming; canonical GTFS stop_id <-> ATLAS sloid resolution | atlas_line_families.csv, atlas_itineraries.csv, atlas_itinerary_stop_calls.csv, gtfs_stops_raw.csv, gtfs_stop_identity_resolution.csv, gtfs_atlas_stats.json |
| OpenStreetMap | Overpass API | Switzerland, public transport nodes, way stops, route relations, route_master relations | osm_data.xml, osm_route_masters.csv, osm_route_master_tags.csv, osm_route_master_members.csv, osm_route_relations.csv, osm_route_relation_tags.csv, osm_route_relation_members.csv, osm_route_relation_stops.csv |
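The "Swiss stops; single-pass streaming" filter in the GTFS row can be illustrated with a small sketch. It is not the code in get_atlas_gtfs.py: the function name `stream_swiss_stop_times` and the sample stop IDs are hypothetical, and only the column names come from the standard GTFS stop_times.txt layout. The point is that rows are read and filtered one at a time, so the full stop_times.txt never has to fit in memory.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Set


def stream_swiss_stop_times(
    lines: Iterable[str], swiss_stop_ids: Set[str]
) -> Iterator[Dict[str, str]]:
    """Yield stop_times rows whose stop_id is in swiss_stop_ids.

    Hypothetical helper: rows are consumed lazily in a single pass,
    so memory use does not grow with the size of the input.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        if row["stop_id"] in swiss_stop_ids:
            yield row


# Made-up sample data using standard GTFS stop_times.txt columns.
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t1,08:00:00,08:00:00,8503000:0:1,1\n"
    "t1,08:30:00,08:31:00,8014008,2\n"
    "t2,09:00:00,09:00:00,8507000:0:3,1\n"
)
swiss_ids = {"8503000:0:1", "8507000:0:3"}  # assumed to come from the filtered stops
kept = list(stream_swiss_stop_times(sample, swiss_ids))
```

In a real run the `lines` argument would be an open file handle on data/raw/gtfs/stop_times.txt, and the surviving rows would be written to the transient stage file rather than collected in a list.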
Source-Side Code Paths
| Module | Responsibility |
|---|---|
| matching_and_import_db/downloader/get_atlas_data.py | ATLAS download, filtering, GTFS orchestration, and high-level preprocessing flow |
| matching_and_import_db/downloader/get_atlas_gtfs.py | GTFS extraction, Swiss stop-time streaming, GTFS stop_id resolution, and ATLAS route CSV generation |
| matching_and_import_db/downloader/get_osm_data.py | Overpass download plus route-master / route-relation CSV generation |
Directory Structure
The pipeline organizes data into the following structure:
data/
├── raw/ # Downloaded source data
│ ├── osm_data.xml # Raw OSM from Overpass API
│ ├── stops_ATLAS.csv # Filtered ATLAS platforms
│ ├── switzerland.geojson # Swiss border polygon
│ ├── gtfs/ # Extracted GTFS subset used by this project
│ │ ├── stops.txt
│ │ ├── stop_times.txt
│ │ ├── trips.txt
│ │ ├── routes.txt
│ │ └── swiss_trip_stop_times.csv
├── processed/ # Transformed data
│ ├── atlas_line_families.csv
│ ├── atlas_itineraries.csv
│ ├── atlas_itinerary_stop_calls.csv
│ ├── gtfs_stops_raw.csv
│ ├── gtfs_stop_identity_resolution.csv
│ ├── osm_route_masters.csv
│ ├── osm_route_master_tags.csv
│ ├── osm_route_master_members.csv
│ ├── osm_route_relations.csv
│ ├── osm_route_relation_tags.csv
│ ├── osm_route_relation_members.csv
│ └── osm_route_relation_stops.csv
├── gtfs_atlas_stats.json # GTFS-to-ATLAS sidecar stats
└── debug/ # Review files
└── org_mismatches_review.txt
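A quick way to confirm that a downloader run produced the layout above is to check for the expected files. The following is a minimal sketch, not part of the project: the helper `missing_artifacts` is hypothetical, and only the file names are taken from the directory listing. The demo runs against a temporary directory so it does not depend on a real data/ folder.

```python
import tempfile
from pathlib import Path
from typing import List

# File names copied from the directory listing; the helper itself is hypothetical.
RAW_FILES = ["osm_data.xml", "stops_ATLAS.csv"]
PROCESSED_FILES = [
    "atlas_line_families.csv",
    "atlas_itineraries.csv",
    "atlas_itinerary_stop_calls.csv",
    "gtfs_stops_raw.csv",
    "gtfs_stop_identity_resolution.csv",
    "osm_route_masters.csv",
    "osm_route_master_tags.csv",
    "osm_route_master_members.csv",
    "osm_route_relations.csv",
    "osm_route_relation_tags.csv",
    "osm_route_relation_members.csv",
    "osm_route_relation_stops.csv",
]


def missing_artifacts(data_dir: Path) -> List[str]:
    """Return the expected artifacts (relative POSIX paths) absent from data_dir."""
    expected = [Path("raw") / name for name in RAW_FILES]
    expected += [Path("processed") / name for name in PROCESSED_FILES]
    return sorted(p.as_posix() for p in expected if not (data_dir / p).is_file())


# Demo against a throwaway directory containing only one of the expected files.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "raw").mkdir()
    (data_dir / "processed").mkdir()
    (data_dir / "raw" / "stops_ATLAS.csv").touch()
    missing = missing_artifacts(data_dir)
```

Against a real checkout, calling `missing_artifacts(Path("data"))` after a full downloader run would be expected to return an empty list.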
Output Boundaries
The downloader layer produces source-side artifacts only.
- atlas_line_families.csv, atlas_itineraries.csv, and atlas_itinerary_stop_calls.csv preserve the ATLAS-side GTFS reconstruction.
- osm_route_masters.csv, osm_route_relations.csv, and their tag/member tables preserve the OSM PTv2 route entities.
- gtfs_stops_raw.csv and gtfs_stop_identity_resolution.csv preserve the canonical GTFS identity state.
The shared comparison tables (line_families, itineraries, stop_calls, line_family_matches, itinerary_matches) are built later by matching_and_import_db/database/route_loader.py during import preparation.
Detailed Documentation
- 1.1 ATLAS Stops: Filtering ATLAS traffic points into the canonical stop input file.
- 1.2 GTFS ATLAS Data: Streaming GTFS processing, canonical GTFS stop_id resolution, and ATLAS-side route artifact generation.
- 1.3 OSM Data: Overpass download, retained stop attributes, and OSM route-master / route-relation artifact generation.
- 3. Routes: How the importer turns those source artifacts into shared route families, itineraries, and route matches.