5.2 Stats and timestamps

The stats export path produces the JSON consumed by the analytics page, the documentation placeholders, and the PDF summary. The output is intentionally reproducible: every pipeline run rebuilds the stats from fresh pipeline artifacts and the current import database state.

Timestamps

The system records timestamps at several layers, from the upstream source files to the final stats export, so that data freshness can be traced end to end. These timestamps are stored primarily in data/data_meta.json and mirrored into data/stats.json during export:

Timestamp	Source	Meaning
`last_modified` (ATLAS/GTFS)	Server Header	When the original source data file was last updated by the provider on their servers.
`atlas_downloaded_at` / `gtfs_downloaded_at`	Local Clock	When the pipeline downloaded its local snapshot of the raw files from the source server.
`last_overpass_query_at`	Local Clock	When the pipeline successfully queried and fetched the OpenStreetMap data.
`preprocessing_completed_at`	Local Clock	When the download and initial filtering of raw files finished.
`last_pipeline_data_import_ended_at`	Local Clock	The timestamp marking the end of the entire pipeline run (after database import and processing are fully complete).
`generated_at` / `stats_computed_at`	Local Clock	When the statistics and `stats.json` file were generated for the dashboard.
`atlas_filtering.downloaded_at`	Local Clock	When the initial platform filtering (e.g., Swiss borders, type) was applied to the ATLAS raw data.

Output files

Two output files are involved:

File	Role
`data/gtfs_atlas_stats.json`	GTFS-specific sidecar generated during GTFS integration. Contains the canonical `gtfs_atlas` block used later by the final export.
`data/stats.json`	Final aggregate stats file consumed by the web app and docs. Combines pipeline metrics, GTFS sidecar stats, route stats, quality metrics, and DB-derived problem counts.

High-level flow

flowchart TD
    A[ATLAS download and filtering] --> B[GTFS integration]
    B --> C[data/gtfs_atlas_stats.json]
    A --> D[data/stats.json atlas_filtering]
    E[Matching output] --> F[Import DB refresh]
    F --> G[export_stats_after_import]
    C --> G
    D --> G
    G --> H[data/stats.json]
    H --> I[Analytics page]
    H --> J[Documentation placeholders]
    H --> K[PDF summary]

Generation stages

1. Early ATLAS filtering stats

The standalone ATLAS download step in matching_and_import_db/downloader/get_atlas_data.py records filter counts such as:

raw ATLAS rows
rows removed by country, geography, validity, and type filters
final BOARDING_PLATFORM totals

Those values are written under atlas_filtering in data/stats.json before the main import runs.
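A minimal sketch of how such per-filter counts could be recorded, assuming the raw ATLAS rows are available as a list of dicts. The column names (`country`, `valid`, `type`), the count key names, and both function names are hypothetical; the geography filter is omitted for brevity, and the real logic in get_atlas_data.py may differ. It also assumes the step updates only the `atlas_filtering` key and leaves the rest of data/stats.json alone.

```python
import json
from pathlib import Path


def filter_atlas_rows(rows):
    """Apply sequential filters and count how many rows each one removes.

    Hypothetical column names: 'country', 'valid', 'type'.
    """
    stats = {"raw_rows": len(rows)}
    filters = [
        ("removed_by_country", lambda r: r.get("country") == "CH"),
        ("removed_by_validity", lambda r: r.get("valid", False)),
        ("removed_by_type", lambda r: r.get("type") == "BOARDING_PLATFORM"),
    ]
    kept = rows
    for key, keep in filters:
        before = len(kept)
        kept = [r for r in kept if keep(r)]
        stats[key] = before - len(kept)  # rows dropped by this filter only
    stats["boarding_platform_total"] = len(kept)
    return kept, stats


def write_atlas_filtering(stats, path=Path("data/stats.json")):
    """Store the counts under 'atlas_filtering' without touching other keys."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data["atlas_filtering"] = stats
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2))
```

Because each filter runs on the output of the previous one, a row is counted only against the first filter that rejects it, so the removal counts add up to `raw_rows - boarding_platform_total`.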

2. GTFS sidecar generation

During GTFS integration, matching_and_import_db/downloader/get_atlas_gtfs.py computes GTFS-to-ATLAS mapping statistics while matching GTFS stop_id values to ATLAS sloid values.

That stage writes data/gtfs_atlas_stats.json with the canonical structure:

{
  "atlas": {
    "total": 0,
    "touched_by_gtfs_routes": 0,
    "coverage_percent": 0.0
  },
  "gtfs_stop_ids": {
    "total": 0,
    "matched_to_atlas": 0,
    "unmatched": 0,
    "coverage_percent": 0.0
  }
}

The final export embeds that object into data/stats.json under the gtfs_atlas key.

The final export also projects the scheduler's preprocessing metadata from data/data_meta.json into a source_downloads block so docs can render the latest ATLAS and GTFS download timestamps.
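As a rough sketch, the two steps above (embedding the sidecar unchanged and projecting download metadata) could look like the following. The exact shape of the `source_downloads` block is not specified here, so the projected keys below are an assumption taken from the timestamp table; the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_json(path):
    """Return parsed JSON, or an empty dict if the file does not exist."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


def build_gtfs_and_source_blocks(data_dir="data"):
    """Build the gtfs_atlas and source_downloads sections of stats.json."""
    data_dir = Path(data_dir)
    sidecar = load_json(data_dir / "gtfs_atlas_stats.json")
    meta = load_json(data_dir / "data_meta.json")
    return {
        # The sidecar object is embedded as-is under gtfs_atlas.
        "gtfs_atlas": sidecar,
        # Hypothetical projection: only download-related timestamps are copied;
        # other data_meta.json keys are ignored.
        "source_downloads": {
            "atlas_downloaded_at": meta.get("atlas_downloaded_at"),
            "gtfs_downloaded_at": meta.get("gtfs_downloaded_at"),
            "preprocessing_completed_at": meta.get("preprocessing_completed_at"),
        },
    }
```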

The GTFS sidecar covers:

ATLAS-side route coverage: how many ATLAS stops are touched by GTFS-derived route rows
GTFS-side mapping coverage: how many GTFS stop_id values map to an ATLAS sloid
assignment counts for strict and unique-number fallback matching
cardinality diagnostics (1 → 1, 1 → many, many → 1)
unmatched GTFS reason counts
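The GTFS-side coverage and cardinality figures can be illustrated with a small sketch over `(stop_id, sloid)` assignment pairs. This is one plausible definition of the cardinality buckets, not necessarily the one used in get_atlas_gtfs.py; the function names and the `1_to_1` / `1_to_many` / `many_to_1` key names are hypothetical.

```python
from collections import defaultdict


def coverage_percent(part, total):
    """Percentage rounded to two decimals; 0.0 when total is zero."""
    return round(100.0 * part / total, 2) if total else 0.0


def gtfs_mapping_stats(gtfs_stop_ids, pairs):
    """Summarise how GTFS stop_id values map to ATLAS sloid values.

    gtfs_stop_ids: every GTFS stop_id considered.
    pairs: (stop_id, sloid) assignments produced by the matcher.
    """
    sloids_by_stop = defaultdict(set)
    stops_by_sloid = defaultdict(set)
    for stop_id, sloid in pairs:
        sloids_by_stop[stop_id].add(sloid)
        stops_by_sloid[sloid].add(stop_id)

    ids = set(gtfs_stop_ids)
    matched = len(ids & set(sloids_by_stop))
    block = {
        "total": len(ids),
        "matched_to_atlas": matched,
        "unmatched": len(ids) - matched,
        "coverage_percent": coverage_percent(matched, len(ids)),
    }
    cardinality = {
        # stop_id -> exactly one sloid, and that sloid has no other stop_id
        "1_to_1": sum(
            1 for s, sl in sloids_by_stop.items()
            if len(sl) == 1 and len(stops_by_sloid[next(iter(sl))]) == 1
        ),
        # one stop_id assigned to several sloids
        "1_to_many": sum(1 for sl in sloids_by_stop.values() if len(sl) > 1),
        # one sloid receiving several stop_ids
        "many_to_1": sum(1 for st in stops_by_sloid.values() if len(st) > 1),
    }
    return block, cardinality
```

For example, with stop IDs `a` to `e` and pairs `a→X`, `b→Y`, `b→Z`, `c→W`, `d→W`, four of five stop IDs are matched (80.0 %), `a` is 1 → 1, `b` is 1 → many, and `W` is many → 1.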

3. Final aggregate export

After the import DB is refreshed, matching_and_import_db/database/importer.py calls export_stats_after_import().

That function delegates to backend/services/stats_export.py, which assembles the final data/stats.json from several sources:

Source	What it contributes
Matching output (`matched`, `unmatched_atlas`, `unmatched_osm`)	summary counts, match stage breakdowns, duplicate counts, unmatched analysis
OSM stop units and route members	OSM route coverage and many-to-one analysis inputs
`data/gtfs_atlas_stats.json`	canonical `gtfs_atlas` block
`data/data_meta.json`	`last_pipeline_data_import_ended_at` plus docs-facing `source_downloads` metadata and `last_overpass_query_at`
Import DB	problem counts, route problem counts, route-route matching counts
Existing `data/stats.json`	only explicitly independent keys such as `atlas_filtering`

The final export does not preserve arbitrary old keys. This is intentional: the file should reflect the current schema only.
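The "rebuild, then copy only named keys" rule can be sketched as follows. `INDEPENDENT_KEYS` and `rebuild_stats` are hypothetical names; only `atlas_filtering` is listed because it is the one independent key named in this section.

```python
import json
from pathlib import Path

# Keys produced by earlier, independent steps that the final export carries over.
INDEPENDENT_KEYS = ("atlas_filtering",)


def rebuild_stats(fresh, path=Path("data/stats.json")):
    """Write a new stats file from freshly computed sections.

    Unknown keys from the previous file are dropped on purpose; only keys in
    INDEPENDENT_KEYS are copied from it, and only if not freshly computed.
    """
    old = json.loads(path.read_text()) if path.exists() else {}
    result = dict(fresh)
    for key in INDEPENDENT_KEYS:
        if key in old and key not in result:
            result[key] = old[key]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    return result
```

Using an explicit allowlist means a schema change shows up as a deliberate edit to `INDEPENDENT_KEYS` rather than as stale keys silently surviving in the file.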

What `export_pipeline_stats()` computes

The main export function computes the pipeline-derived sections directly from in-memory matching output:

summary
matching_stages
unmatched_analysis
duplicates
osm_way_stops
match_type_counts
route_matching
routes
gtfs_atlas

Then the importer augments that result with:

quality_metrics
problems
route_route_matching

The problems block in data/stats.json is the DB-backed stop-problem summary. It includes top-level counts for distance, attributes, contradicts_route_matching, unmatched, and duplicates, plus aggregate fields such as total_stops, stops_with_problems, clean_entries, and the nested by_priority breakdown.
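A sketch of how such a block could be aggregated from per-stop problem rows. The row shape `(stop_id, problem_type, priority)` is a hypothetical stand-in for the import DB query result, and `clean_entries = total_stops - stops_with_problems` and the flat `by_priority` counter are assumptions about fields whose exact definitions are not given here.

```python
from collections import Counter

PROBLEM_TYPES = ("distance", "attributes", "contradicts_route_matching",
                 "unmatched", "duplicates")


def summarize_problems(total_stops, problems):
    """Aggregate per-stop problems into a stats-style block.

    problems: iterable of (stop_id, problem_type, priority) rows
    (hypothetical shape of an import DB query result).
    """
    rows = list(problems)
    by_type = Counter(ptype for _, ptype, _ in rows)
    by_priority = Counter(str(priority) for _, _, priority in rows)
    stops_with_problems = len({stop_id for stop_id, _, _ in rows})

    block = {ptype: by_type.get(ptype, 0) for ptype in PROBLEM_TYPES}
    block.update(
        total_stops=total_stops,
        stops_with_problems=stops_with_problems,
        # Assumption: a stop is "clean" when it has no problem rows at all.
        clean_entries=total_stops - stops_with_problems,
        by_priority=dict(by_priority),
    )
    return block
```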

Why the split exists

The GTFS sidecar is generated earlier than the final DB-backed export because GTFS mapping is known during GTFS integration, not during the later database import step. Keeping it as a separate intermediate artifact avoids recomputing GTFS matching just to build the final stats file.

Consumers

The main readers of data/stats.json are:

backend/app.py for the analytics page
backend/services/docs_stats.py for replacing `<span class="dynamic-stat stat-unavailable" data-stat-key="..." title="Auto-updated from pipeline stats">—</span>` placeholders in docs
backend/services/stats_export.py for PDF summary generation
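The placeholder replacement in docs_stats.py is not shown in this section. As a hypothetical sketch, assume each placeholder is a `dynamic-stat` span whose `data-stat-key` holds a dotted path into data/stats.json (for example `gtfs_atlas.atlas.total`); the function fills in the value, or keeps a dash and adds a `stat-unavailable` class when the key is missing. The regex, the dotted-path convention, and the function names are assumptions.

```python
import html
import re

# Hypothetical markup: <span class="dynamic-stat" data-stat-key="a.b.c">&mdash;</span>
PLACEHOLDER = re.compile(
    r'<span class="dynamic-stat" data-stat-key="([^"]+)">[^<]*</span>'
)
MISSING = object()


def lookup(stats, dotted_key):
    """Follow a dotted path such as 'gtfs_atlas.atlas.total' through nested dicts."""
    node = stats
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return MISSING
        node = node[part]
    return node


def fill_placeholders(text, stats):
    """Replace every placeholder span with the matching stats value."""
    def replace(match):
        key = match.group(1)
        value = lookup(stats, key)
        if value is MISSING:
            # Keep a visible dash and mark the span so it can be styled.
            return (f'<span class="dynamic-stat stat-unavailable" '
                    f'data-stat-key="{key}">&mdash;</span>')
        return (f'<span class="dynamic-stat" data-stat-key="{key}">'
                f'{html.escape(str(value))}</span>')

    return PLACEHOLDER.sub(replace, text)
```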

Regeneration paths

There are two common ways to refresh stats:

Run the full pipeline/import flow. This regenerates both data/gtfs_atlas_stats.json and data/stats.json.
Run scripts/regenerate_stats.py. This only refreshes the DB-backed sections and summary from the current import database; it does not recompute the earlier GTFS or ATLAS download stages.

Invariants

The current design assumes:

data/stats.json is disposable and can be rebuilt at any time
stats schema changes should update the exporter and consumers directly rather than adding compatibility aliases
independent pre-export stats must be copied explicitly, not by preserving unknown keys from older files
