7.1 Pipeline Tests

The matching pipeline test suite is split into focused layers so that small regressions are caught quickly without always running the full import pipeline.

What We Test

The matching test files currently cover four different scopes:

  1. tests/matching_pipeline/test_matching.py
  • unit tests for the predicate system
  • distance helper coverage for bipartite_match()
  • route ID normalization coverage
  • pipeline runner behavior
  • regression coverage for stale batched candidates
  2. tests/matching_pipeline/test_small_pipeline.py
  • end-to-end execution on tests/data/sample_atlas.csv and tests/data/sample_osm.xml
  • importer execution against in-memory SQLite
  • DB hydration checks for StopsMatched, AtlasStop, and OsmNode
  3. tests/matching_pipeline/test_data_integration.py
  • route-data integration tests for atlas_routes_gtfs.csv generation logic
  • guard test for the GTFS permalink year (ensures no stale hardcoded timetable-YYYY-gtfs2020 links remain for non-current years)
  4. tests/matching_pipeline/test_validators.py
  • common validation utilities used by both the pipeline and backend
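To illustrate the granularity of the unit-test layer, here is a minimal sketch of a route-ID normalization test. Note that `normalize_route_id` and its exact behavior are hypothetical stand-ins, not the project's actual helper:

```python
# Hypothetical stand-in: the real helper lives in the matching pipeline;
# this sketch only illustrates the style of unit test in test_matching.py.
def normalize_route_id(raw: str) -> str:
    """Trim surrounding whitespace and upper-case a raw route identifier."""
    return raw.strip().upper()

def test_route_id_normalization():
    # Equivalent spellings should collapse to one canonical ID.
    assert normalize_route_id("  b123 ") == "B123"
    assert normalize_route_id("B123") == "B123"
```

Tests of this shape run in milliseconds, which is what makes the focused layer cheap enough to run on every change.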

Why The Unit Tests Matter

The matching predicates mutate shared state while they run:

  • ctx.commit() marks ATLAS rows as matched and the consumed OSM representatives as used
  • OSM grouping hides sibling nodes behind a representative OsmEntity

That means some regressions only appear when multiple rows interact inside a single predicate loop. The focused unit tests are the cheapest place to catch those bugs.

One important example is the stale-candidate regression test: it verifies that after one ATLAS row consumes an OSM representative, a later ATLAS row in the same NearestDistancePredicate run cannot still reuse that representative from an old batch snapshot.
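The failure mode can be sketched in a few lines. Everything below is a simplified analogue, not the pipeline's real code: `used` stands in for the shared state that `ctx.commit()` mutates, and `pick_candidate` plays the role of one iteration of `NearestDistancePredicate`:

```python
# Hypothetical sketch of the stale-batch regression described above.
used = set()

def pick_candidate(candidates):
    # Correct behavior: re-check the shared 'used' set for every row,
    # instead of trusting a batch snapshot taken before earlier rows
    # committed. A stale snapshot would hand out osm_rep_1 twice.
    for cand in candidates:
        if cand not in used:
            used.add(cand)  # analogue of ctx.commit() marking the rep as used
            return cand
    return None

batch = ["osm_rep_1", "osm_rep_2"]
first = pick_candidate(batch)   # first ATLAS row consumes osm_rep_1
second = pick_candidate(batch)  # a later row must fall through to osm_rep_2
assert first == "osm_rep_1" and second == "osm_rep_2"
```

The regression test asserts the same invariant against the real predicate: once a representative is consumed, no later row in the same run can see it as available.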

Why The Small Pipeline Test Matters

The end-to-end test validates the full lifecycle:

  1. environment override to sample data
  2. in-memory DB setup
  3. run_matching()
  4. import_to_database()
  5. final DB assertions

This is the test that proves the pipeline still works as an integrated system, not just as isolated helpers.
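The five lifecycle steps can be sketched as follows. The stub bodies of `run_matching` and `import_to_database` are hypothetical placeholders for the real pipeline functions; only the shape of the test mirrors test_small_pipeline.py:

```python
# Hypothetical sketch of the end-to-end lifecycle; the real
# run_matching / import_to_database live in the pipeline package.
import os
import sqlite3

def run_matching(atlas_path, osm_path):
    # Stand-in: the real function matches ATLAS stops to OSM nodes.
    return [("atlas_1", "osm_1")]

def import_to_database(conn, matches):
    conn.execute("CREATE TABLE stops_matched (atlas_id TEXT, osm_id TEXT)")
    conn.executemany("INSERT INTO stops_matched VALUES (?, ?)", matches)

# 1. environment override to sample data
os.environ["ATLAS_CSV"] = "tests/data/sample_atlas.csv"
# 2. in-memory DB setup
conn = sqlite3.connect(":memory:")
# 3.-4. run the matching and import stages
matches = run_matching(os.environ["ATLAS_CSV"], "tests/data/sample_osm.xml")
import_to_database(conn, matches)
# 5. final DB assertions: the matched rows actually landed in the DB
assert conn.execute("SELECT COUNT(*) FROM stops_matched").fetchone()[0] == 1
```

Because the DB is in-memory SQLite, the whole lifecycle runs in one process with no external services, which keeps the end-to-end test fast enough for local iteration.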

Recommended Local Command

Run the full matching-pipeline subset with:

docker compose run --rm --no-deps --entrypoint '' \
  -e DATABASE_URI=sqlite:// \
  test python -m pytest tests/matching_pipeline -q

Run just the end-to-end sample pipeline with:

docker compose run --rm --no-deps --entrypoint '' \
  -e DATABASE_URI=sqlite:// \
  test python -m pytest tests/matching_pipeline/test_small_pipeline.py -v

Note

The test service image includes the full geospatial stack (pandas, geopandas, GDAL, etc.) required for these tests.
