4.1 Stop Problems

Stop problems are detected after the matching pipeline completes. They flag data quality issues that affect individual stops.

Detection uses a predicate pipeline mirroring the matching pipeline architecture. During database import, ProblemContext.build() precomputes shared indexes (KDTrees, UIC counts, duplicate maps) from the MatchingOutput. Each stop is then evaluated by the four problem predicates.

Code: matching_and_import_db/problem_detection/

How Detection Runs

flowchart LR
    subgraph Input["PipelineResult"]
        direction TB
        M["matched list[MatchRecord]"]
        UA["unmatched_atlas list[AtlasNode]"]
        UO["unmatched_osm list[OsmNode]"]
    end
    CTX["ProblemContext.build()"]
    Input --> CTX
    CTX --> D["distance_problem (P1-P3)"]
    CTX --> A["attributes_problem (P1-P3)"]
    CTX --> DUP["duplicates_problem (P2-P3)"]
    CTX --> ISO["unmatched_problem (P1-P3)"]
    M -->|"evaluate_problems()"| D & A & DUP
    UA & UO -->|"run_problem_pipeline()"| ISO & DUP

Invocation Paths

Matched records use MatchRecord.evaluate_problems(), which calls each predicate with the MatchRecord itself. The predicates access record.atlas_node and record.osm_node directly:

current_match.evaluate_problems(problem_ctx, STOP_PROBLEM_PIPELINE)

Unmatched records are passed as bare AtlasNode or OsmNode entities to run_problem_pipeline():

problems = run_problem_pipeline(STOP_PROBLEM_PIPELINE, problem_ctx, atlas_node)

Each predicate uses isinstance checks to decide whether it applies. For example, distance_problem returns [] for bare nodes, and unmatched_problem returns [] for a MatchRecord.
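The isinstance-based dispatch can be illustrated with a minimal sketch. The entity classes below are reduced stand-ins, the predicate signature `(ctx, entity)` and the string return values are assumptions made for illustration, and only the argument order of run_problem_pipeline() is taken from the call shown above.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class AtlasNode:  # reduced stand-in for the real entity
    sloid: str


@dataclass
class OsmNode:  # reduced stand-in for the real entity
    node_id: str


@dataclass
class MatchRecord:  # reduced stand-in for the real entity
    atlas_node: AtlasNode
    osm_node: OsmNode
    distance_m: float


Entity = Union[MatchRecord, AtlasNode, OsmNode]
Predicate = Callable[[object, Entity], list]


def distance_problem(ctx: object, entity: Entity) -> list:
    # Applies to matched pairs only; bare nodes yield no problems.
    if not isinstance(entity, MatchRecord):
        return []
    return ["distance"] if entity.distance_m > 15 else []


def unmatched_problem(ctx: object, entity: Entity) -> list:
    # Applies to bare nodes only; matched pairs yield no problems.
    if isinstance(entity, MatchRecord):
        return []
    return ["unmatched"]


def run_problem_pipeline(pipeline, ctx, entity) -> list:
    # Collect the problems reported by every predicate, in pipeline order.
    problems = []
    for predicate in pipeline:
        problems.extend(predicate(ctx, entity))
    return problems


# Hypothetical two-predicate pipeline; the real one has four predicates.
STOP_PROBLEM_PIPELINE = [distance_problem, unmatched_problem]
```

Because every predicate accepts any entity and returns an empty list when it does not apply, the same pipeline can serve both invocation paths without branching in the caller.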

`ProblemContext`

ProblemContext is built once from PipelineResult via ProblemContext.build() and provides the following precomputed indexes:

Index	Type	Purpose
`osm_kdtree`	`KDTree`	Spatial queries for ATLAS isolation detection (all OSM coords)
`atlas_kdtree`	`KDTree`	Spatial queries for OSM isolation detection (all ATLAS coords)
`atlas_count_by_uic`	`dict[str, int]`	ATLAS platform count per UIC
`osm_count_by_uic`	`dict[str, int]`	OSM node count per UIC
`osm_platform_count_by_uic`	`dict[str, int]`	OSM platform-like node count per UIC
`duplicate_sloid_map`	`dict[str, list[str]]`	ATLAS duplicate groups
`duplicate_osm_group_map`	`dict[str, list[str]]`	OSM duplicate groups by `(uic_ref, local_ref)`
`duplicate_osm_node_ids`	`set[str]`	All OSM node IDs in a duplicate group
`handled_duplicate_sloids`	`set[str]`	ATLAS duplicates already consumed by `duplicate_propagation`, so they are not re-flagged as problems
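As a rough illustration of how the per-UIC count indexes might be precomputed, the sketch below tallies OSM nodes with collections.Counter. The reduced OsmNode class, the helper name build_uic_counts, and the rule that "platform-like" means public_transport == "platform" are all assumptions; the actual logic in ProblemContext.build() may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class OsmNode:  # reduced stand-in for the real entity
    node_id: str
    uic_ref: Optional[str]
    public_transport: Optional[str]


def build_uic_counts(osm_nodes):
    """Return (osm_count_by_uic, osm_platform_count_by_uic) as plain dicts."""
    osm_count_by_uic = Counter()
    osm_platform_count_by_uic = Counter()
    for node in osm_nodes:
        if not node.uic_ref:
            continue  # nodes without a UIC reference cannot be counted per UIC
        osm_count_by_uic[node.uic_ref] += 1
        # Assumption: only public_transport == "platform" counts as platform-like.
        if node.public_transport == "platform":
            osm_platform_count_by_uic[node.uic_ref] += 1
    return dict(osm_count_by_uic), dict(osm_platform_count_by_uic)
```

Computing these maps once means each predicate call is a dictionary lookup rather than a scan over all nodes.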

Stop Problem Types

Problem Type	Description	Priorities	Applies to
Distance	Matched pairs too far apart	P1, P2, P3	`MatchRecord` only
Attributes	Inconsistent data for matched pairs	P1, P2, P3	`MatchRecord` only
Unmatched	Stops without a counterpart	P1, P2, P3	`AtlasNode` / `OsmNode` only
Duplicates	Redundant entries	P2, P3	All three types

4.1.1. Distance Problems

Flag matched pairs where physical distance exceeds tolerance. This typically indicates either a matching error or significant coordinate discrepancy between datasets.

The predicate reads record.distance_m and record.atlas_node.business_org_abbr directly from the MatchRecord.

Thresholds

DISTANCE_THRESHOLD_P1 = 80   # meters
DISTANCE_THRESHOLD_P2 = 25   # meters
DISTANCE_THRESHOLD_P3 = 15   # meters

Priority Logic

Priority	Condition	Rationale
P1	Non-SBB AND distance > 80m	Large displacement for non-railway
P2	Non-SBB AND 25m < distance <= 80m	Moderate displacement
P3	SBB AND distance > 25m	Railway tolerance (large platforms)
P3	Any operator AND 15m < distance <= 25m	Minor displacement

Note

SBB platforms can span many meters, so a higher distance tolerance applies to them. The SBB check uses AtlasNode.business_org_abbr.

Example: A non-SBB bus stop matched at a distance of 85m is flagged as P1 (critical), while an SBB train platform at the same distance is P3 (minor).
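The priority table can be expressed as a small function. This is a sketch, not the project's implementation: the function name distance_priority is hypothetical, and the SBB check is assumed to be an exact comparison against the string "SBB".

```python
from typing import Optional

DISTANCE_THRESHOLD_P1 = 80   # meters
DISTANCE_THRESHOLD_P2 = 25   # meters
DISTANCE_THRESHOLD_P3 = 15   # meters


def distance_priority(distance_m: float, business_org_abbr: str) -> Optional[int]:
    """Return 1, 2, or 3 for a flagged pair, or None when within tolerance."""
    is_sbb = business_org_abbr == "SBB"  # assumption: exact string comparison
    if distance_m > DISTANCE_THRESHOLD_P2:
        if is_sbb:
            return 3  # railway tolerance: SBB pairs never exceed P3
        return 1 if distance_m > DISTANCE_THRESHOLD_P1 else 2
    if distance_m > DISTANCE_THRESHOLD_P3:
        return 3  # minor displacement, any operator
    return None
```

With these thresholds, a pair at exactly 80m from a non-SBB operator falls into P2, and a pair at exactly 15m is not flagged at all.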

4.1.2. Unmatched Problems

Identify stops that failed to match. The predicate receives bare AtlasNode or OsmNode entities and uses ProblemContext spatial indexes to compute isolation.

ATLAS Unmatched Priority

Uses ctx.nearest_osm_distance() (KDTree query) and ctx.osm_count_by_uic:

Priority	Condition	Rationale
P1	`ctx.osm_count_by_uic` has 0 entries for this `AtlasNode.uic_ref`	Completely missing counterpart
P1	Nearest OSM node > 80m away (or none)	Completely isolated
P2	Nearest OSM node > 50m away	Partially isolated
P2	Platform count mismatch (ATLAS vs OSM for same UIC)	Data inconsistency
P3	All other unmatched	Has nearby candidates
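A minimal sketch of the ATLAS table follows. It assumes the rows are evaluated top to bottom with the first match winning, and it takes the looked-up values as plain arguments rather than reading them from ProblemContext. The function and parameter names are hypothetical.

```python
from typing import Optional


def atlas_unmatched_priority(
    osm_count_for_uic: int,
    nearest_osm_m: Optional[float],
    atlas_platforms: int,
    osm_platforms: int,
) -> int:
    """Priority for an unmatched ATLAS stop (assumed first-match-wins order)."""
    if osm_count_for_uic == 0:
        return 1  # no OSM node carries this UIC reference
    if nearest_osm_m is None or nearest_osm_m > 80:
        return 1  # completely isolated
    if nearest_osm_m > 50:
        return 2  # partially isolated
    if atlas_platforms != osm_platforms:
        return 2  # platform count mismatch for the same UIC
    return 3      # nearby candidates exist
```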

OSM Unmatched Priority

Uses ctx.nearest_atlas_distance() (KDTree query) and ctx.atlas_count_by_uic:

Priority	Condition	Rationale
P1	`ctx.atlas_count_by_uic` has 0 entries for this `OsmNode.uic_ref`	Completely missing counterpart
P2	Nearest ATLAS stop > 50m away (or none)	Spatially isolated, but ranked below the no-ATLAS-by-UIC case
P2	Platform count mismatch (ATLAS vs OSM for same UIC)	Data inconsistency
P3	All other unmatched	Has nearby candidates
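The OSM table differs from the ATLAS one in that spatial isolation alone never produces a P1. The sketch below makes that difference explicit, again assuming top-to-bottom evaluation and using hypothetical function and parameter names.

```python
from typing import Optional


def osm_unmatched_priority(
    atlas_count_for_uic: int,
    nearest_atlas_m: Optional[float],
    atlas_platforms: int,
    osm_platforms: int,
) -> int:
    """Priority for an unmatched OSM node (assumed first-match-wins order)."""
    if atlas_count_for_uic == 0:
        return 1  # no ATLAS stop carries this UIC reference
    if nearest_atlas_m is None or nearest_atlas_m > 50:
        return 2  # spatially isolated, capped at P2 on the OSM side
    if atlas_platforms != osm_platforms:
        return 2  # platform count mismatch for the same UIC
    return 3      # nearby candidates exist
```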

Isolation Detection

Isolation is computed using ProblemContext.nearest_osm_distance() / nearest_atlas_distance(), which query the precomputed KDTrees. Separately from problem detection, the importer marks unmatched ATLAS entries with no OSM node within 50m as match_type='no_nearby_counterpart' in stops_matched.
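To show what the nearest-distance query returns, the sketch below substitutes a brute-force haversine scan for the KDTree. It is a deliberately simplified stand-in using only the standard library; the helper names are hypothetical, and the real ProblemContext methods query the precomputed trees instead.

```python
import math
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(point, candidates) -> Optional[float]:
    """Distance to the closest candidate, or None when there are no candidates."""
    if not candidates:
        return None
    return min(haversine_m(*point, *c) for c in candidates)


def has_no_nearby_counterpart(point, osm_coords, radius_m: float = 50.0) -> bool:
    """True when no OSM coordinate lies within radius_m of the ATLAS point."""
    nearest = nearest_distance_m(point, osm_coords)
    return nearest is None or nearest > radius_m
```

The brute-force scan is linear in the number of candidates per query, which is why a spatial index such as a KDTree is preferable when every unmatched stop needs a lookup.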

4.1.3. Attribute Problems

Flag inconsistencies between matched pairs. The predicate reads fields directly from record.atlas_node and record.osm_node on the MatchRecord.

Priority Logic

Priority	Condition	Fields Compared
P1	Different UIC reference	`AtlasNode.uic_ref` vs `OsmNode.uic_ref`
P1	Different official name	`AtlasNode.designation_official` vs `OsmNode.uic_name`
P2	Different local reference	`AtlasNode.designation` vs `OsmNode.local_ref`
P3	Different operator	`AtlasNode.business_org_abbr` vs `OsmNode.operator`

Note: Name and local_ref comparisons are case-insensitive. UIC comparisons are exact. Each check can be individually toggled via ENABLE_*_CHECK constants in context.py.
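The comparison rules can be sketched as a table-driven check. Several details here are assumptions rather than documented behavior: nodes are represented as plain dicts, a check is skipped when either side is empty, the operator comparison is exact, and the ENABLE_*_CHECK toggles are not modeled.

```python
# (priority, label, ATLAS field, OSM field, case-insensitive?)
CHECKS = [
    (1, "uic_ref", "uic_ref", "uic_ref", False),
    (1, "official_name", "designation_official", "uic_name", True),
    (2, "local_ref", "designation", "local_ref", True),
    (3, "operator", "business_org_abbr", "operator", False),  # assumption: exact
]


def attribute_problems(atlas: dict, osm: dict) -> list:
    """Return (priority, label) tuples for every mismatching attribute."""
    problems = []
    for priority, label, atlas_field, osm_field, ignore_case in CHECKS:
        a, b = atlas.get(atlas_field), osm.get(osm_field)
        if not a or not b:
            continue  # assumption: a missing value is not treated as a mismatch
        if ignore_case:
            a, b = a.casefold(), b.casefold()
        if a != b:
            problems.append((priority, label))
    return problems
```

For instance, "Bern" and "BERN" compare equal as official names, while local references "A" and "B" produce a P2 entry.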

4.1.4. Duplicate Problems

Identify redundant entries in either dataset. This predicate is polymorphic: it handles MatchRecord, AtlasNode, and OsmNode.

Priority	Type	Condition	Detection
P3	OSM	Same `(uic_ref, local_ref)` for `public_transport in {platform, stop_position}` nodes, excluding pre-grouped OSM pairs	Pre-computed in `ProblemContext._build_osm_duplicate_map()`
P2	ATLAS	`sloid` appears in `duplicate_sloid_map` and was not already handled by `duplicate_propagation`	From matching pipeline's `AtlasState` plus `handled_duplicate_sloids`

OSM duplicates are only flagged for nodes with OsmNode.public_transport equal to platform or stop_position. When both ATLAS and OSM duplicates exist, only the OSM duplicate is flagged (the OSM side takes precedence). ATLAS duplicates that already produced a duplicate_propagation match are deliberately suppressed so that the same group is not reported twice.
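The OSM grouping rule can be sketched as follows. This is not the code of ProblemContext._build_osm_duplicate_map(): nodes are plain dicts, the tuple key stands in for whatever string key the real map uses, nodes missing either reference are skipped by assumption, and the exclusion of pre-grouped OSM pairs is not modeled.

```python
from collections import defaultdict

DUPLICATE_PT_VALUES = {"platform", "stop_position"}


def build_osm_duplicate_groups(osm_nodes):
    """Group node IDs by (uic_ref, local_ref) and keep groups with 2+ members."""
    groups = defaultdict(list)
    for node in osm_nodes:
        if node.get("public_transport") not in DUPLICATE_PT_VALUES:
            continue  # only platform / stop_position nodes are considered
        uic_ref, local_ref = node.get("uic_ref"), node.get("local_ref")
        if not uic_ref or not local_ref:
            continue  # assumption: both references are needed to form a key
        groups[(uic_ref, local_ref)].append(node["node_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def duplicate_node_ids(groups) -> set:
    """Flatten the groups into the set of all node IDs that are duplicates."""
    return {node_id for ids in groups.values() for node_id in ids}
```

Keeping both the grouped map and the flattened ID set mirrors the two indexes listed for ProblemContext: one answers "which nodes share this key", the other answers "is this node a duplicate" in constant time.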
