4.1 Stop Problems

Stop problems are detected after the matching pipeline completes and identify data quality issues regarding individual stops.

Detection uses a predicate pipeline that mirrors the matching pipeline architecture. During database import, ProblemContext.build() precomputes shared indexes (KDTrees, UIC counts, duplicate maps) from the PipelineResult. Each stop is then evaluated by the four problem predicates.

Code: matching_and_import_db/problem_detection/

How Detection Runs

flowchart LR
    subgraph Input["PipelineResult"]
        direction TB
        M["matched<br/><i>list[MatchRecord]</i>"]
        UA["unmatched_atlas<br/><i>list[AtlasNode]</i>"]
        UO["unmatched_osm<br/><i>list[OsmNode]</i>"]
    end
    CTX["ProblemContext.build()"]
    Input --> CTX
    CTX --> D["distance_problem<br/>(P1-P3)"]
    CTX --> A["attributes_problem<br/>(P1-P3)"]
    CTX --> DUP["duplicates_problem<br/>(P2-P3)"]
    CTX --> ISO["unmatched_problem<br/>(P1-P3)"]
    M -->|"evaluate_problems()"| D & A & DUP
    UA & UO -->|"run_problem_pipeline()"| ISO & DUP

Invocation Paths

Matched records use MatchRecord.evaluate_problems(), which calls each predicate with the MatchRecord itself. The predicates access record.atlas_node and record.osm_node directly:

current_match.evaluate_problems(problem_ctx, STOP_PROBLEM_PIPELINE)

Unmatched records are passed as bare AtlasNode or OsmNode entities to run_problem_pipeline():

problems = run_problem_pipeline(STOP_PROBLEM_PIPELINE, problem_ctx, atlas_node)

Each predicate uses isinstance checks to decide whether it applies — distance_problem returns [] for bare nodes, unmatched_problem returns [] for MatchRecord, etc.
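The type-based dispatch described above can be sketched as follows. This is a minimal illustration, not the project's code: the dataclasses are reduced stand-ins, the predicate signature `(ctx, subject)` and the string problem labels are assumptions, and only the argument order of `run_problem_pipeline()` is taken from the call shown earlier.

```python
from dataclasses import dataclass


# Hypothetical, reduced stand-ins for the project's entities.
@dataclass
class AtlasNode:
    sloid: str


@dataclass
class OsmNode:
    node_id: str


@dataclass
class MatchRecord:
    atlas_node: AtlasNode
    osm_node: OsmNode
    distance_m: float


def distance_problem(ctx, subject):
    # Applies to matched pairs only; bare nodes yield no problems.
    if not isinstance(subject, MatchRecord):
        return []
    return ["distance"] if subject.distance_m > 15 else []


def unmatched_problem(ctx, subject):
    # Applies to bare nodes only; matched pairs yield no problems.
    if isinstance(subject, MatchRecord):
        return []
    return ["unmatched"]


def run_problem_pipeline(pipeline, ctx, subject):
    # Call every predicate and concatenate whatever each one reports.
    problems = []
    for predicate in pipeline:
        problems.extend(predicate(ctx, subject))
    return problems


# Assumed shape: the pipeline is an ordered list of predicates.
STOP_PROBLEM_PIPELINE = [distance_problem, unmatched_problem]
```

Because every predicate returns an empty list for types it does not handle, the same pipeline can be run over matched and unmatched records without branching at the call site.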

ProblemContext

Built once from PipelineResult via ProblemContext.build(), providing precomputed indexes:

| Index | Type | Purpose |
| --- | --- | --- |
| osm_kdtree | KDTree | Spatial queries for ATLAS isolation detection (all OSM coords) |
| atlas_kdtree | KDTree | Spatial queries for OSM isolation detection (all ATLAS coords) |
| atlas_count_by_uic | dict[str, int] | ATLAS platform count per UIC |
| osm_count_by_uic | dict[str, int] | OSM node count per UIC |
| osm_platform_count_by_uic | dict[str, int] | OSM platform-like node count per UIC |
| duplicate_sloid_map | dict[str, list[str]] | ATLAS duplicate groups |
| duplicate_osm_group_map | dict[str, list[str]] | OSM duplicate groups by (uic_ref, local_ref) |
| duplicate_osm_node_ids | set[str] | All OSM node IDs in a duplicate group |
| handled_duplicate_sloids | set[str] | ATLAS duplicates already consumed by duplicate_propagation, so they are not re-flagged as problems |
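As an illustration of how the per-UIC count indexes could be precomputed in a single pass, here is a small sketch. The helper name `count_by_uic`, the dict-based node representation, and the definition of "platform-like" as `public_transport == "platform"` are assumptions made for the example, not the project's actual code.

```python
from collections import Counter


def count_by_uic(nodes, include=lambda node: True):
    """Count nodes per UIC reference, skipping nodes without a uic_ref.

    `include` is a hypothetical filter hook, used below to restrict the
    count to platform-like nodes.
    """
    return dict(
        Counter(
            node["uic_ref"]
            for node in nodes
            if node.get("uic_ref") and include(node)
        )
    )


osm_nodes = [
    {"uic_ref": "8503000", "public_transport": "platform"},
    {"uic_ref": "8503000", "public_transport": "stop_position"},
    {"uic_ref": None, "public_transport": "platform"},
]

osm_count_by_uic = count_by_uic(osm_nodes)
# Assumption: "platform-like" is approximated as public_transport == "platform".
osm_platform_count_by_uic = count_by_uic(
    osm_nodes, include=lambda n: n.get("public_transport") == "platform"
)
```

Building these dictionaries once means each predicate can answer "how many nodes share this UIC?" with a constant-time lookup instead of rescanning the dataset per stop.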

Stop Problem Types

| Problem Type | Description | Priorities | Applies to |
| --- | --- | --- | --- |
| Distance | Matched pairs too far apart | P1, P2, P3 | MatchRecord only |
| Attributes | Inconsistent data for matched pairs | P1, P2, P3 | MatchRecord only |
| Unmatched | Stops without a counterpart | P1, P2, P3 | AtlasNode / OsmNode only |
| Duplicates | Redundant entries | P2, P3 | All three types |

4.1.1. Distance Problems

Flag matched pairs where physical distance exceeds tolerance. This typically indicates either a matching error or significant coordinate discrepancy between datasets.

The predicate reads record.distance_m and record.atlas_node.business_org_abbr directly from the MatchRecord.

Thresholds

DISTANCE_THRESHOLD_P1 = 80   # meters
DISTANCE_THRESHOLD_P2 = 25   # meters
DISTANCE_THRESHOLD_P3 = 15   # meters

Priority Logic

| Priority | Condition | Rationale |
| --- | --- | --- |
| P1 | Non-SBB AND distance > 80m | Large displacement for non-railway |
| P2 | Non-SBB AND 25m < distance <= 80m | Moderate displacement |
| P3 | SBB AND distance > 25m | Railway tolerance (large platforms) |
| P3 | Any operator AND 15m < distance <= 25m | Minor displacement |

Note: SBB platforms can span many meters, so higher distance tolerance is applied. The SBB check uses AtlasNode.business_org_abbr.

Example: A bus stop matched with 85m distance would be flagged as P1 (critical), while a train platform with the same distance would be P3 (minor).
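The priority rules for distance problems can be expressed as a single function. The sketch below is an interpretation of the documented conditions, not the project's predicate: the function name `distance_priority`, the return convention (`None` for no problem), and the exact-string SBB test are assumptions.

```python
DISTANCE_THRESHOLD_P1 = 80  # meters
DISTANCE_THRESHOLD_P2 = 25  # meters
DISTANCE_THRESHOLD_P3 = 15  # meters


def distance_priority(distance_m, business_org_abbr):
    """Return "P1", "P2", "P3", or None when the pair is within tolerance."""
    # Assumption: the SBB check is an exact comparison on the abbreviation.
    is_sbb = business_org_abbr == "SBB"

    if distance_m is None or distance_m <= DISTANCE_THRESHOLD_P3:
        return None  # within tolerance, nothing to flag
    if distance_m <= DISTANCE_THRESHOLD_P2:
        return "P3"  # minor displacement, any operator
    if is_sbb:
        return "P3"  # railway tolerance for large platforms
    if distance_m > DISTANCE_THRESHOLD_P1:
        return "P1"  # large displacement, non-SBB
    return "P2"      # moderate displacement, non-SBB
```

With these rules, `distance_priority(85, "PAG")` yields "P1" while `distance_priority(85, "SBB")` yields "P3", which matches the bus stop versus train platform example above ("PAG" is used here only as an arbitrary non-SBB abbreviation).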


4.1.2. Unmatched Problems

Identify stops that failed to match. The predicate receives bare AtlasNode or OsmNode entities and uses ProblemContext spatial indexes to compute isolation.

ATLAS Unmatched Priority

Uses ctx.nearest_osm_distance() (KDTree query) and ctx.osm_count_by_uic:

| Priority | Condition | Rationale |
| --- | --- | --- |
| P1 | ctx.osm_count_by_uic has 0 entries for this AtlasNode.uic_ref | Completely missing counterpart |
| P1 | Nearest OSM node > 80m away (or none) | Completely isolated |
| P2 | Nearest OSM node > 50m away | Partially isolated |
| P2 | Platform count mismatch (ATLAS vs OSM for same UIC) | Data inconsistency |
| P3 | All other unmatched | Has nearby candidates |
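Read top to bottom, the ATLAS rules form a first-match cascade. The following sketch assumes exactly that evaluation order, and it assumes the platform count mismatch compares `atlas_count_by_uic` with `osm_count_by_uic`; the source does not state which OSM count is used, and the function name and parameters are hypothetical.

```python
def atlas_unmatched_priority(uic_ref, nearest_osm_m, atlas_count_by_uic, osm_count_by_uic):
    """Return the priority for an unmatched ATLAS stop.

    nearest_osm_m is the distance in meters to the closest OSM node,
    or None when there is no OSM node at all.
    """
    osm_count = osm_count_by_uic.get(uic_ref, 0)
    if osm_count == 0:
        return "P1"  # no OSM node carries this UIC reference
    if nearest_osm_m is None or nearest_osm_m > 80:
        return "P1"  # completely isolated
    if nearest_osm_m > 50:
        return "P2"  # partially isolated
    # Assumption: mismatch means the two per-UIC counts differ.
    if atlas_count_by_uic.get(uic_ref, 0) != osm_count:
        return "P2"  # platform count mismatch
    return "P3"      # has nearby candidates
```

For example, an ATLAS stop whose UIC has three ATLAS platforms but only two OSM nodes, with the nearest OSM node 10 m away, would fall through the distance checks and be flagged as P2 by the count comparison.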

OSM Unmatched Priority

Uses ctx.nearest_atlas_distance() (KDTree query) and ctx.atlas_count_by_uic:

| Priority | Condition | Rationale |
| --- | --- | --- |
| P1 | ctx.atlas_count_by_uic has 0 entries for this OsmNode.uic_ref | Completely missing counterpart |
| P2 | Nearest ATLAS stop > 50m away (or none) | Spatially isolated, but still lower than the no-ATLAS-by-UIC case |
| P2 | Platform count mismatch (ATLAS vs OSM for same UIC) | Data inconsistency |
| P3 | All other unmatched | Has nearby candidates |

Isolation Detection

Isolation is computed using ProblemContext.nearest_osm_distance() / nearest_atlas_distance(), which query the precomputed KDTrees. Separately from problem detection, the importer marks unmatched ATLAS entries with no OSM node within 50m as match_type='no_nearby_counterpart' in stops_matched.
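To make the isolation check concrete, the sketch below computes a nearest-neighbor distance and applies the 50 m radius. It deliberately swaps the KDTree for a brute-force haversine scan so that it runs with the standard library only; the function names, the coordinate representation as `(lat, lon)` tuples, and the inclusive `<=` comparison for "within 50m" are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(lat, lon, candidates):
    """Brute-force stand-in for a KDTree query; None when there are no candidates."""
    if not candidates:
        return None
    return min(haversine_m(lat, lon, c_lat, c_lon) for c_lat, c_lon in candidates)


def has_nearby_counterpart(lat, lon, osm_coords, radius_m=50.0):
    """True when at least one OSM coordinate lies within radius_m."""
    nearest = nearest_distance_m(lat, lon, osm_coords)
    return nearest is not None and nearest <= radius_m
```

The linear scan costs O(n) per query, which is why a precomputed spatial index is the better fit when every unmatched stop needs its own lookup; the scan is used here only to keep the example dependency-free.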


4.1.3. Attribute Problems

Flag inconsistencies between matched pairs. The predicate reads fields directly from record.atlas_node and record.osm_node on the MatchRecord.

Priority Logic

| Priority | Condition | Fields Compared |
| --- | --- | --- |
| P1 | Different UIC reference | AtlasNode.uic_ref vs OsmNode.uic_ref |
| P1 | Different official name | AtlasNode.designation_official vs OsmNode.uic_name |
| P2 | Different local reference | AtlasNode.designation vs OsmNode.local_ref |
| P3 | Different operator | AtlasNode.business_org_abbr vs OsmNode.operator |

Note: Name and local_ref comparisons are case-insensitive. UIC comparisons are exact. Each check can be individually toggled via ENABLE_*_CHECK constants in context.py.
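The comparison rules can be sketched as a table of checks driven by toggles. Everything below is illustrative: the four constant names merely follow the ENABLE_*_CHECK pattern, nodes are plain dicts, a missing value on either side is assumed not to count as a mismatch, and the operator comparison is assumed to be exact because the source does not specify its case handling.

```python
# Hypothetical toggle names following the ENABLE_*_CHECK pattern.
ENABLE_UIC_CHECK = True
ENABLE_NAME_CHECK = True
ENABLE_LOCAL_REF_CHECK = True
ENABLE_OPERATOR_CHECK = True


def _differs(a, b, case_insensitive):
    # Assumption: a missing value on either side is not treated as a mismatch.
    if not a or not b:
        return False
    if case_insensitive:
        return a.strip().casefold() != b.strip().casefold()
    return a != b


def attribute_mismatches(atlas, osm):
    """Return a list of (priority, label) tuples for every differing field."""
    checks = [
        # (enabled, priority, label, atlas value, osm value, case-insensitive)
        (ENABLE_UIC_CHECK, "P1", "uic_ref", atlas.get("uic_ref"), osm.get("uic_ref"), False),
        (ENABLE_NAME_CHECK, "P1", "name", atlas.get("designation_official"), osm.get("uic_name"), True),
        (ENABLE_LOCAL_REF_CHECK, "P2", "local_ref", atlas.get("designation"), osm.get("local_ref"), True),
        (ENABLE_OPERATOR_CHECK, "P3", "operator", atlas.get("business_org_abbr"), osm.get("operator"), False),
    ]
    return [
        (priority, label)
        for enabled, priority, label, a, b, ci in checks
        if enabled and _differs(a, b, ci)
    ]
```

Keeping the checks in one list makes the priority of each comparison visible in a single place and lets a toggle disable a check without touching the comparison logic.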


4.1.4. Duplicate Problems

Identify redundant entries in either dataset. This predicate is polymorphic — it handles MatchRecord, AtlasNode, and OsmNode.

| Priority | Type | Condition | Detection |
| --- | --- | --- | --- |
| P3 | OSM | Same (uic_ref, local_ref) for public_transport in {platform, stop_position} nodes, excluding pre-grouped OSM pairs | Pre-computed in ProblemContext._build_osm_duplicate_map() |
| P2 | ATLAS | sloid appears in duplicate_sloid_map and was not already handled by duplicate_propagation | From matching pipeline's AtlasState plus handled_duplicate_sloids |

OSM duplicates are only flagged for nodes with OsmNode.public_transport equal to platform or stop_position. When both ATLAS and OSM duplicates exist, only the OSM duplicate is flagged (OSM-side takes precedence). ATLAS duplicates that already produced a duplicate_propagation match are deliberately suppressed to avoid double-reporting the same grouping behavior.
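A rough sketch of both duplicate checks follows. It is not the body of `_build_osm_duplicate_map()`: the function names, the dict-based nodes, and the tuple-keyed result are assumptions, nodes lacking a `uic_ref` or `local_ref` are assumed to be skipped, and the exclusion of pre-grouped OSM pairs is not modeled.

```python
from collections import defaultdict

# Only these public_transport values take part in OSM duplicate detection.
DUPLICATE_OSM_TYPES = {"platform", "stop_position"}


def build_osm_duplicate_groups(osm_nodes):
    """Group OSM node IDs by (uic_ref, local_ref); keep groups of two or more."""
    groups = defaultdict(list)
    for node in osm_nodes:
        if node.get("public_transport") not in DUPLICATE_OSM_TYPES:
            continue
        # Assumption: nodes missing either key cannot form a duplicate group.
        if not node.get("uic_ref") or not node.get("local_ref"):
            continue
        groups[(node["uic_ref"], node["local_ref"])].append(node["node_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def is_atlas_duplicate_problem(sloid, duplicate_sloid_map, handled_duplicate_sloids):
    """Flag an ATLAS duplicate unless duplicate_propagation already consumed it."""
    return sloid in duplicate_sloid_map and sloid not in handled_duplicate_sloids
```

The set of all flagged OSM node IDs (the role `duplicate_osm_node_ids` plays in the context) could then be derived by flattening the group values, so that a per-node membership test stays a constant-time lookup.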
