7.1 Pipeline Tests

The matching pipeline test suite is split into focused layers so that small regressions are caught quickly without always running the full import pipeline.

What We Test

The matching test files currently cover four different scopes:

  1. tests/matching_pipeline/test_matching.py
  • unit tests for the predicate system
  • distance helper coverage for bipartite_match()
  • route ID normalization coverage
  • pipeline runner behavior
  • regression coverage for stale batched candidates
  2. tests/matching_pipeline/test_small_pipeline.py
  • end-to-end execution on tests/data/sample_atlas.csv and tests/data/sample_osm.xml
  • importer execution against in-memory SQLite
  • DB hydration checks for StopsMatched, AtlasStop, and OsmNode
  3. tests/matching_pipeline/test_data_integration.py
  • route-data integration tests for atlas_routes_gtfs.csv generation logic
  • guard test for the GTFS permalink year (ensures no stale hardcoded timetable-YYYY-gtfs2020 links remain for non-current years)
  4. tests/matching_pipeline/test_validators.py
  • common validation utilities used by both the pipeline and backend
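To illustrate the granularity of the unit-test layer, here is a minimal sketch of a route-ID normalization test. Note that `normalize_route_id` and its exact behavior are hypothetical stand-ins, not the project's actual helper:

```python
# Hypothetical stand-in: the real helper lives in the matching pipeline;
# this sketch only illustrates the style of unit test in test_matching.py.
def normalize_route_id(raw: str) -> str:
    """Trim surrounding whitespace and upper-case a raw route identifier."""
    return raw.strip().upper()

def test_route_id_normalization():
    # Equivalent spellings should collapse to one canonical ID.
    assert normalize_route_id("  b123 ") == "B123"
    assert normalize_route_id("B123") == "B123"
```

Tests of this shape run in milliseconds, which is what makes the focused layer cheap enough to run on every change.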

Why The Unit Tests Matter

The matching predicates mutate shared state while they run:

  • ctx.commit() marks ATLAS rows as matched and the consumed OSM representatives as used
  • OSM grouping hides sibling nodes behind a representative OsmEntity

That means some regressions only appear when multiple rows interact inside a single predicate loop. The focused unit tests are the cheapest place to catch those bugs.

One important example is the stale-candidate regression test: it verifies that after one ATLAS row consumes an OSM representative, a later ATLAS row in the same NearestDistancePredicate run cannot still reuse that representative from an old batch snapshot.
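The failure mode can be sketched in a few lines. Everything below is a simplified analogue, not the pipeline's real code: `used` stands in for the shared state that `ctx.commit()` mutates, and `pick_candidate` plays the role of one iteration of `NearestDistancePredicate`:

```python
# Hypothetical sketch of the stale-batch regression described above.
used = set()

def pick_candidate(candidates):
    # Correct behavior: re-check the shared 'used' set for every row,
    # instead of trusting a batch snapshot taken before earlier rows
    # committed. A stale snapshot would hand out osm_rep_1 twice.
    for cand in candidates:
        if cand not in used:
            used.add(cand)  # analogue of ctx.commit() marking the rep as used
            return cand
    return None

batch = ["osm_rep_1", "osm_rep_2"]
first = pick_candidate(batch)   # first ATLAS row consumes osm_rep_1
second = pick_candidate(batch)  # a later row must fall through to osm_rep_2
assert first == "osm_rep_1" and second == "osm_rep_2"
```

The regression test asserts the same invariant against the real predicate: once a representative is consumed, no later row in the same run can see it as available.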

Why The Small Pipeline Test Matters

The end-to-end test validates the full lifecycle:

  1. environment override to sample data
  2. in-memory DB setup
  3. run_matching()
  4. import_to_database()
  5. final DB assertions

This is the test that proves the pipeline still works as an integrated system, not just as isolated helpers.
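The five lifecycle steps can be sketched as follows. The stub bodies of `run_matching` and `import_to_database` are hypothetical placeholders for the real pipeline functions; only the shape of the test mirrors test_small_pipeline.py:

```python
# Hypothetical sketch of the end-to-end lifecycle; the real
# run_matching / import_to_database live in the pipeline package.
import os
import sqlite3

def run_matching(atlas_path, osm_path):
    # Stand-in: the real function matches ATLAS stops to OSM nodes.
    return [("atlas_1", "osm_1")]

def import_to_database(conn, matches):
    conn.execute("CREATE TABLE stops_matched (atlas_id TEXT, osm_id TEXT)")
    conn.executemany("INSERT INTO stops_matched VALUES (?, ?)", matches)

# 1. environment override to sample data
os.environ["ATLAS_CSV"] = "tests/data/sample_atlas.csv"
# 2. in-memory DB setup
conn = sqlite3.connect(":memory:")
# 3.-4. run the matching and import stages
matches = run_matching(os.environ["ATLAS_CSV"], "tests/data/sample_osm.xml")
import_to_database(conn, matches)
# 5. final DB assertions: the matched rows actually landed in the DB
assert conn.execute("SELECT COUNT(*) FROM stops_matched").fetchone()[0] == 1
```

Because the DB is in-memory SQLite, the whole lifecycle runs in one process with no external services, which keeps the end-to-end test fast enough for local iteration.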

Recommended Local Command

Run the full matching-pipeline subset with:

docker compose run --rm --no-deps --entrypoint '' \
  -e DATABASE_URI=sqlite:// \
  test python -m pytest tests/matching_pipeline -q

Run just the end-to-end sample pipeline with:

docker compose run --rm --no-deps --entrypoint '' \
  -e DATABASE_URI=sqlite:// \
  test python -m pytest tests/matching_pipeline/test_small_pipeline.py -v

Note

The test service image includes the full geospatial stack (pandas, geopandas, GDAL, etc.) required for these tests.
