5.1 Import Process
The import process turns matching output plus processed CSV artifacts into the PostgreSQL schema that powers the web application.
The scheduled flow is split into two broad phases:
- precompute phase outside the maintenance window
- write phase inside the maintenance window
The goal is to keep the blocking phase limited to TRUNCATE plus bulk INSERT work. The same precompute/write functions are also used when matching_and_import_db/database/importer.py is run directly.
High-Level Flow
Precompute Phase
The scheduler runs these steps before entering the maintenance window.
1. Problem precompute
precompute_problem_artifacts() builds the shared ProblemContext and attaches the stop-level ProblemResult objects needed later by the importer.
2. Route precompute
precompute_route_artifacts():
- loads the processed route CSVs
- collects the ATLAS SLOIDs that will actually exist in the import
- calls
build_route_write_payload()
That route payload already contains:
- source ATLAS line-family rows (
atlas_line_families) - source route relation rows used by the web and stats (
osm_route_relations) - normalized
line_families,itineraries, andstop_calls line_family_matchesitinerary_matches
3. GTFS identity payloads
_build_gtfs_insert_payloads() prepares the raw GTFS stop rows and GTFS-to-ATLAS identity rows while build_fast_insert_payloads() is assembling the final write payload. It uses the cached import artifacts when both files are present:
data/processed/gtfs_stops_raw.csvdata/processed/gtfs_stop_identity_resolution.csv
If either cache file is missing, it rebuilds from data/raw/stops_ATLAS.csv plus data/raw/gtfs/.
The rebuild path reuses the canonical matcher from matching_and_import_db/downloader/get_atlas_gtfs.py, rewrites data/gtfs_atlas_stats.json, and refreshes the cache CSVs.
4. Build insert payloads
build_fast_insert_payloads() converts everything into plain row dictionaries.
The payload is split conceptually into:
- static rows reused by
atlas_cachedruns: ATLAS/GTFS tables - dynamic rows rewritten every run: OSM tables,
stops_matched,problems,osm_route_relations, normalized route tables, and route match tables - meta values such as
matched_routesandno_nearby_osm_sloids
Write Phase
import_to_database() performs the blocking database work.
Refresh-scope selection
Before truncation, the importer chooses the table set from get_refresh_scope_tables(run_type):
completerewrites everythingatlas_cached_bootstrapalso rewrites the full table set, but is used specifically to rebuild missing or empty ATLAS/GTFS static tables from cached preprocessing artifactsatlas_cachedrewrites only the dynamic tables and validates that the required static ATLAS/GTFS tables already exist
Bulk insert order
The write order is intentionally staged:
osm_nodesosm_stopsosm_stop_members- static stop/GTFS tables in
completeandatlas_cached_bootstrapmodes only:atlas_operatorsatlas_stopsgtfs_stops_rawgtfs_stop_identity_resolution
stops_matchedproblemsafter the newstops_matched.idvalues are known- static ATLAS route source table in
completeandatlas_cached_bootstrapmodes only:atlas_line_families
- dynamic OSM route source table:
osm_route_relations
- normalized route tables:
line_familiesitinerariesstop_calls
- route match tables:
line_family_matchesitinerary_matches
The importer commits in batches using DB_IMPORT_BATCH_SIZE.
export DB_IMPORT_BATCH_SIZE=5000
Synthetic OSM Route Nodes
Before route rows are inserted, build_fast_insert_payloads() checks whether any route stop rows reference OSM node IDs that are not already present in the stop-matching output. If they do, it creates minimal synthetic osm_nodes rows so the route tables can keep referential integrity.
Stats Finalization
After the write phase, the scheduler records the completed import timestamp and then runs stats finalization. The detailed stats sources, timestamp fields, and regeneration behavior are documented in 5.2 Stats and timestamps.
Running the Import Directly
Environment Variables
| Variable | Description | Default |
|---|---|---|
DATABASE_URI |
Import database connection string | postgresql+psycopg://stops_user:1234@localhost:5432/import_db |
DB_IMPORT_BATCH_SIZE |
Records per batch commit | 10000 |
ATLAS_STOPS_CSV |
Override used when GTFS cache files must be rebuilt | data/raw/stops_ATLAS.csv |
GTFS_FOLDER |
Extracted GTFS folder used when GTFS cache files must be rebuilt | data/raw/gtfs |
Command
export DATABASE_URI="postgresql://user:pass@localhost/stopdb"
python [matching_and_import_db/database/importer.py](https://github.com/openTdataCH/stop_sync_osm_atlas/blob/main/matching_and_import_db/database/importer.py)
Persistence Behavior
The importer does not preserve manual database edits between runs. The import database is expected to be reproducible from the current files and matching output.