7.4 Atlas-Cached Import Optimization

This page describes the two refresh modes, atlas_cached and atlas_cached_bootstrap, used when the ATLAS and GTFS preprocessing inputs are unchanged.

Purpose

When the HTTP validators for the ATLAS and GTFS sources are unchanged, the pipeline can skip the expensive ATLAS/GTFS download and preprocessing step and reuse the cached ATLAS/GTFS artifacts already written to disk.
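The validator comparison can be pictured with a minimal sketch. It assumes the validators are the standard ETag and Last-Modified response headers and that the previous values are stored per source. The function names and the dictionary layout are hypothetical and are not taken from the pipeline code.

```python
from typing import Mapping, Optional

# Hypothetical shape: the validators a server returned for one source.
Validators = Mapping[str, Optional[str]]


def validators_unchanged(cached: Validators, current: Validators) -> bool:
    """Return True only when a validator is present on both sides and matches."""
    for header in ("etag", "last-modified"):
        old, new = cached.get(header), current.get(header)
        if old is not None and new is not None:
            # Use the first validator both sides provide (ETag before Last-Modified).
            return old == new
    # No comparable validator: treat the source as changed to stay on the safe side.
    return False


def can_reuse_cache(
    cached: Mapping[str, Validators], current: Mapping[str, Validators]
) -> bool:
    """Both sources must be unchanged before the cached artifacts are reused."""
    return all(
        name in cached
        and name in current
        and validators_unchanged(cached[name], current[name])
        for name in ("atlas", "gtfs")
    )
```

Treating a missing validator as "changed" is a conservative choice made for this sketch: it costs an extra download but never reuses stale artifacts.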

If the database already contains the required static ATLAS/GTFS tables, the importer rewrites only the OSM-dependent and match-dependent tables. If those static tables are missing or empty, the importer performs a bootstrap rewrite of both the static and dynamic import tables without re-downloading ATLAS/GTFS.

This reduces database churn without changing the matching result for the current OSM snapshot.

Run Types

  • complete: full preprocessing and full database refresh.
  • atlas_cached: reuse cached ATLAS/GTFS preprocessing artifacts, keep the static raw ATLAS/GTFS tables in place, and rewrite only the OSM-dependent and match-dependent tables.
  • atlas_cached_bootstrap: reuse cached ATLAS/GTFS preprocessing artifacts, but rewrite both static and dynamic import tables because the required static tables are missing or empty.

The scheduler selects atlas_cached automatically when both source snapshots are unchanged and the required static tables are already present with data.

The scheduler selects atlas_cached_bootstrap automatically when both source snapshots are unchanged but the required static tables are missing or empty.

Set PIPELINE_FORCE_FULL_REFRESH=1 to keep the cached preprocessing outputs but still force a full database rewrite.
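The automatic selection described above can be summarized as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and it assumes that a change in either source leads to a complete run. The PIPELINE_FORCE_FULL_REFRESH override is deliberately left out, because this page does not state which run type is recorded in that case.

```python
def select_run_type(sources_unchanged: bool, static_tables_ready: bool) -> str:
    """Pick a run type from the two conditions the scheduler checks.

    sources_unchanged:   both the ATLAS and GTFS snapshots are unchanged.
    static_tables_ready: every required static table exists and contains data.
    """
    if not sources_unchanged:
        # Assumption: any source change requires full preprocessing.
        return "complete"
    if static_tables_ready:
        return "atlas_cached"
    # Cached artifacts are valid, but the static tables must be rebuilt.
    return "atlas_cached_bootstrap"
```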

Tables Reused In atlas_cached

  • atlas_operators
  • atlas_stops
  • gtfs_stops_raw
  • gtfs_stop_identity_resolution
  • atlas_line_families
  • atlas_itineraries
  • atlas_itinerary_stop_calls

These are the persisted static import tables produced from cached ATLAS/GTFS artifacts.

Tables Rewritten In atlas_cached

  • itinerary_matches
  • line_family_matches
  • stop_calls
  • itineraries
  • line_families
  • osm_route_relation_stops
  • osm_route_relation_members
  • osm_route_relation_tags
  • osm_route_relations
  • osm_route_master_members
  • osm_route_master_tags
  • osm_route_masters
  • problems
  • stops_matched
  • osm_stop_members
  • osm_nodes
  • osm_stops

The normalized comparison layer is rewritten as a whole because it combines ATLAS and OSM inputs and carries the current matching output.

Tables Rewritten In atlas_cached_bootstrap

atlas_cached_bootstrap uses the same table rewrite scope as complete for the import step. In practice that means it rewrites:

  • all atlas_cached dynamic tables listed above
  • atlas_operators
  • atlas_stops
  • gtfs_stops_raw
  • gtfs_stop_identity_resolution
  • atlas_line_families
  • atlas_itineraries
  • atlas_itinerary_stop_calls

Unlike complete, it still skips the ATLAS/GTFS download and preprocessing subprocess when cached artifacts are already valid.
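The three rewrite scopes can be expressed as a lookup over the table lists on this page. The sketch below uses hypothetical constant and function names. The tables appear in the order they are listed here, which is not necessarily the order in which the importer writes them.

```python
from typing import List, Tuple

# Static tables, reused in atlas_cached (as listed on this page).
STATIC_TABLES = (
    "atlas_operators", "atlas_stops", "gtfs_stops_raw",
    "gtfs_stop_identity_resolution", "atlas_line_families",
    "atlas_itineraries", "atlas_itinerary_stop_calls",
)

# Dynamic tables, rewritten in every run type (as listed on this page).
DYNAMIC_TABLES = (
    "itinerary_matches", "line_family_matches", "stop_calls", "itineraries",
    "line_families", "osm_route_relation_stops", "osm_route_relation_members",
    "osm_route_relation_tags", "osm_route_relations", "osm_route_master_members",
    "osm_route_master_tags", "osm_route_masters", "problems", "stops_matched",
    "osm_stop_members", "osm_nodes", "osm_stops",
)


def refresh_scope(run_type: str) -> Tuple[List[str], List[str]]:
    """Return (tables_rewritten, tables_reused) for the given run type."""
    if run_type == "atlas_cached":
        return list(DYNAMIC_TABLES), list(STATIC_TABLES)
    if run_type in ("complete", "atlas_cached_bootstrap"):
        # Both rewrite everything; they differ only in whether preprocessing runs.
        return list(DYNAMIC_TABLES) + list(STATIC_TABLES), []
    raise ValueError(f"unknown run type: {run_type!r}")
```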

Safety Checks

The importer validates that these required static tables exist and contain data before allowing an atlas_cached refresh:

  • atlas_stops
  • gtfs_stops_raw
  • gtfs_stop_identity_resolution
  • atlas_line_families
  • atlas_itineraries
  • atlas_itinerary_stop_calls

If any of them is missing or empty, the scheduler switches to atlas_cached_bootstrap instead of failing the run.
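A check of this kind might look like the following sketch. This page does not name the database engine, so the example uses Python's built-in sqlite3 module purely as a stand-in. The function name is hypothetical, and the existence query would differ on another engine.

```python
import sqlite3
from typing import List

# Required static tables, as listed on this page.
REQUIRED_STATIC_TABLES = (
    "atlas_stops", "gtfs_stops_raw", "gtfs_stop_identity_resolution",
    "atlas_line_families", "atlas_itineraries", "atlas_itinerary_stop_calls",
)


def missing_or_empty_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the required tables that do not exist or contain no rows."""
    problems = []
    for table in REQUIRED_STATIC_TABLES:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        # Table names come from the fixed tuple above, so interpolating them is safe.
        if not exists or conn.execute(
            f'SELECT 1 FROM "{table}" LIMIT 1'
        ).fetchone() is None:
            problems.append(table)
    return problems
```

An empty result would allow an atlas_cached refresh. A non-empty result corresponds to the atlas_cached_bootstrap fallback described above.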

Observability

Pipeline status and data/data_meta.json record:

  • run_type
  • refresh_scope_tables_rewritten
  • refresh_scope_tables_reused

This makes it visible whether the last successful run was a full rebuild, an atlas_cached refresh, or an atlas_cached_bootstrap refresh.
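As an illustration, a monitoring script could summarize these fields as sketched below. The three key names come from this page. The assumption that the two scope fields hold lists of table names, the function name, and the sample document are all hypothetical.

```python
import json


def describe_last_run(meta: dict) -> str:
    """Summarize the refresh scope recorded for the last successful run."""
    run_type = meta.get("run_type", "unknown")
    # Assumption: both scope fields are lists of table names (possibly absent).
    rewritten = meta.get("refresh_scope_tables_rewritten") or []
    reused = meta.get("refresh_scope_tables_reused") or []
    return f"{run_type}: {len(rewritten)} tables rewritten, {len(reused)} reused"


# Illustrative sample only, not real pipeline output.
sample = json.loads("""
{
  "run_type": "atlas_cached",
  "refresh_scope_tables_rewritten": ["osm_stops", "stops_matched"],
  "refresh_scope_tables_reused": ["atlas_stops"]
}
""")
print(describe_last_run(sample))
```

In practice the dictionary would be loaded from data/data_meta.json rather than from an inline string.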
