# 7.1 Pipeline Tests
The matching pipeline test suite is split into focused layers so that small regressions are caught quickly without always running the full import pipeline.
## What We Test
The matching test files currently cover four different scopes:
`tests/matching_pipeline/test_matching.py`
- unit tests for the predicate system
- distance helper coverage for `bipartite_match()`
- route ID normalization coverage
- pipeline runner behavior
- regression coverage for stale batched candidates
`tests/matching_pipeline/test_small_pipeline.py`
- end-to-end execution on `tests/data/sample_atlas.csv` and `tests/data/sample_osm.xml`
- importer execution against in-memory SQLite
- DB hydration checks for `StopsMatched`, `AtlasStop`, and `OsmNode`
`tests/matching_pipeline/test_data_integration.py`
- route-data integration tests for `atlas_routes_gtfs.csv` generation logic
- guard test for the GTFS permalink year (ensures no stale hardcoded `timetable-YYYY-gtfs` links, such as `timetable-2020-gtfs`, remain for non-current years)
`tests/matching_pipeline/test_validators.py`
- common validation utilities used by both the pipeline and backend
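The permalink-year guard above can be sketched as a small scan-and-assert test. This is an illustrative shape only: the `stale_years` helper, the regex, and the scanned sample text are assumptions, not the project's actual implementation.

```python
# Hedged sketch of a GTFS permalink-year guard test. The helper name,
# regex, and sample input are illustrative; the real test lives in
# tests/matching_pipeline/test_data_integration.py.
import datetime
import re

# Matches hardcoded permalinks of the form timetable-YYYY-gtfs.
LINK_RE = re.compile(r"timetable-(\d{4})-gtfs")

def stale_years(text, current_year=None):
    """Return hardcoded permalink years that differ from the current year."""
    current = current_year or datetime.date.today().year
    return sorted({int(y) for y in LINK_RE.findall(text) if int(y) != current})

def test_no_stale_gtfs_permalinks():
    # In the real test, `sample` would be the contents of the checked files.
    sample = "download: .../timetable-2020-gtfs and .../timetable-2024-gtfs"
    assert stale_years(sample, current_year=2024) == [2020]
```

A guard like this fails loudly the first year a hardcoded link goes stale, instead of letting a dead download URL linger in generated data.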
## Why The Unit Tests Matter
The matching predicates mutate shared state while they run:
- `ctx.commit()` marks ATLAS rows as matched
- `ctx.commit()` marks OSM representatives as used
- OSM grouping hides sibling nodes behind a representative `OsmEntity`
That means some regressions only appear when multiple rows interact inside a single predicate loop. The focused unit tests are the cheapest place to catch those bugs.
One important example is the stale-candidate regression test: it verifies that after one ATLAS row consumes an OSM representative, a later ATLAS row in the same `NearestDistancePredicate` run cannot still reuse that representative from an old batch snapshot.
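The invariant behind that regression test can be sketched with toy stand-ins. `MatchContext` and `nearest_distance_predicate` below are illustrative models, not the project's real classes; the point is that each iteration must consult the live set of available representatives rather than a snapshot taken before the loop.

```python
# Toy model of the stale-candidate invariant; names are illustrative
# stand-ins for the pipeline's real context and NearestDistancePredicate.

class MatchContext:
    """Minimal matching context: tracks unused OSM reps and matches made."""
    def __init__(self, osm_reps):
        self.available = set(osm_reps)   # OSM representatives not yet consumed
        self.matches = {}                # atlas_id -> osm_id

    def commit(self, atlas_id, osm_id):
        # Marks the ATLAS row as matched and the OSM representative as used.
        self.available.discard(osm_id)
        self.matches[atlas_id] = osm_id

def nearest_distance_predicate(ctx, atlas_rows, candidates_for):
    """Process rows in one run; availability is checked live, per row."""
    for atlas_id in atlas_rows:
        for osm_id in candidates_for(atlas_id):
            if osm_id in ctx.available:  # live check, not a stale batch snapshot
                ctx.commit(atlas_id, osm_id)
                break

def test_consumed_representative_is_not_reused():
    ctx = MatchContext(osm_reps=["osm:1"])
    # Both ATLAS rows share the same nearest candidate.
    nearest_distance_predicate(ctx, ["atlas:a", "atlas:b"], lambda _: ["osm:1"])
    # Only the first row may claim osm:1; the second must remain unmatched.
    assert ctx.matches == {"atlas:a": "osm:1"}
```

The bug class this guards against only appears when two rows interact inside one predicate run, which is exactly why it is cheapest to catch at the unit level.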
## Why The Small Pipeline Test Matters
The end-to-end test validates the full lifecycle:
- environment override to sample data
- in-memory DB setup
- `run_matching()`
- `import_to_database()`
- final DB assertions
This is the test that proves the pipeline still works as an integrated system, not just as isolated helpers.
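The lifecycle above can be sketched as a test skeleton. The environment-variable names, the `run_matching` / `import_to_database` call sites, and the table name in the final assertion are assumptions standing in for the project's real API, so the pipeline calls are shown as comments.

```python
# Illustrative skeleton of the end-to-end test; env-var names and the
# commented pipeline calls are assumptions, not the confirmed API.
import os
import sqlite3
from contextlib import contextmanager

@contextmanager
def sample_data_env():
    """Point the pipeline at the bundled sample fixtures for one test."""
    old = dict(os.environ)
    os.environ["ATLAS_CSV"] = "tests/data/sample_atlas.csv"    # assumed name
    os.environ["OSM_XML"] = "tests/data/sample_osm.xml"        # assumed name
    os.environ["DATABASE_URI"] = "sqlite://"                   # in-memory DB
    try:
        yield
    finally:
        os.environ.clear()
        os.environ.update(old)

def test_small_pipeline():
    with sample_data_env():
        db = sqlite3.connect(":memory:")  # stands in for the real DB setup
        # run_matching()            # 1. match the sample ATLAS/OSM data
        # import_to_database(db)    # 2. hydrate StopsMatched / AtlasStop / OsmNode
        # 3. final DB assertions would query the hydrated tables, e.g.:
        # assert db.execute("SELECT COUNT(*) FROM stops_matched").fetchone()[0] > 0
        db.close()
```

Restoring the environment in `finally` keeps the override from leaking into other tests, which matters because the rest of the suite runs against the same process.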
## Recommended Local Commands
Run the full matching-pipeline subset with:
```bash
docker compose run --rm --no-deps --entrypoint '' \
  -e DATABASE_URI=sqlite:// \
  test python -m pytest tests/matching_pipeline -q
```
Run just the end-to-end sample pipeline with:
```bash
docker compose run --rm --no-deps --entrypoint '' \
  -e DATABASE_URI=sqlite:// \
  test python -m pytest tests/matching_pipeline/test_small_pipeline.py -v
```
The `test` service image includes the full geospatial stack (pandas, geopandas, GDAL, etc.) required for these tests.