Filter and Search Logic

This document defines the filtering and search model used on the index map page.

Core Rule

The filter system uses one boolean grammar only:

  • within a filter group, selected values combine with OR
  • across filter groups, active groups combine with AND

Formally:

FinalResult = ScopePredicate AND SearchPredicate AND AtlasPredicate AND OsmPredicate AND DuplicatePredicate

If a predicate group has no active selection, it contributes no restriction.
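
The grammar above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the backend implementation: the helper names any_of and all_of and the row fields osm_node_type and atlas_operator are assumptions made for the example.

```python
from typing import Callable, Iterable

Row = dict
Predicate = Callable[[Row], bool]


def any_of(field: str, selected: Iterable[str]) -> Predicate:
    """One filter group: selected values combine with OR.

    An empty selection means the group is inactive and restricts nothing.
    """
    values = set(selected)
    if not values:
        return lambda row: True
    return lambda row: row.get(field) in values


def all_of(*groups: Predicate) -> Predicate:
    """Across groups: every group must hold (AND)."""
    return lambda row: all(group(row) for group in groups)


# Hypothetical sample rows; field names are illustrative only.
rows = [
    {"id": 1, "stop_type": "matched", "osm_node_type": "platform"},
    {"id": 2, "stop_type": "matched", "osm_node_type": "tram_stop"},
    {"id": 3, "stop_type": "osm_unmatched", "osm_node_type": "platform"},
    {"id": 4, "stop_type": "atlas_unmatched", "osm_node_type": None},
]

final = all_of(
    any_of("stop_type", ["matched", "osm_unmatched"]),  # scope group
    any_of("osm_node_type", ["platform"]),              # transport group
    any_of("atlas_operator", []),                       # inactive group
)
kept = [row["id"] for row in rows if final(row)]
```

Here kept contains rows 1 and 3: both are in scope and have a platform on the OSM side, while the inactive operator group removes nothing.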

1. Scope Predicate

The scope predicate chooses which row categories are eligible.

ScopePredicate = MatchedBranch OR AtlasUnmatchedBranch OR OsmUnmatchedBranch

1.1 Matched Branch

Matched entries can be enabled in two ways:

  • All Matched Stops
  • one or more matched method sub-filters

Semantics:

  • All Matched Stops means rows with effective matched scope are included
  • selecting matched sub-filters means matched rows with one of those methods are included
  • if all matched method children are selected, the UI rolls up to the parent state all

Formally:

  • MatchedBranch = (stop_type = matched OR stop_type = effectively_matched) when the parent is all
  • MatchedBranch = (stop_type = matched AND match_type matches one of the selected matched methods) when the branch is in subset mode

Matched method values include:

  • exact
  • name
  • distance_matching_trio
  • distance_matching_1
  • long_distance_group_proximity
  • distance_matching_2
  • distance_matching_3a
  • distance_matching_3b
  • route_gtfs_tokens
  • route_gtfs_direction

Prefix matching is used for distance and route method families. For example, distance_matching_1 matches concrete import values such as distance_matching_1_uic_ref, and distance_matching_3a also includes distance_matching_3a_second_pass.

Special case: distance_matching_trio also includes effectively_matched trio-middle rows, because those rows have no direct match_type but are treated as matched for map and stats semantics.
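
A minimal sketch of the matched branch, including prefix matching and the trio special case, might look as follows. It rests on two assumptions that the text does not confirm: that prefix-matched families can be recognized by the name prefixes distance_matching_ and route_, and that every effectively_matched row is a trio-middle row. The function names are hypothetical.

```python
# Assumption: these name prefixes identify the families that use prefix matching.
PREFIX_FAMILIES = ("distance_matching_", "route_")


def method_matches(match_type, selected_method):
    """True if a row's concrete match_type falls under one selected method."""
    if match_type is None:
        return False
    if selected_method.startswith(PREFIX_FAMILIES):
        return match_type.startswith(selected_method)
    return match_type == selected_method


def matched_branch(row, parent_all, selected_methods):
    """Evaluate MatchedBranch for one row (a dict with stop_type and match_type)."""
    stop_type = row["stop_type"]
    if parent_all:
        return stop_type in ("matched", "effectively_matched")
    if stop_type == "matched":
        return any(method_matches(row.get("match_type"), m) for m in selected_methods)
    # Special case: trio-middle rows carry no match_type of their own.
    # Assumption: every effectively_matched row is a trio-middle row.
    return stop_type == "effectively_matched" and "distance_matching_trio" in selected_methods


row = {"stop_type": "matched", "match_type": "distance_matching_1_uic_ref"}
in_scope = matched_branch(row, parent_all=False, selected_methods=["distance_matching_1"])
```

With this sketch, selecting distance_matching_3a covers distance_matching_3a_second_pass but not distance_matching_3b, because the comparison is a string prefix test.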

1.2 ATLAS Unmatched Branch

ATLAS unmatched entries also support a parent state plus sub-filters.

Semantics:

  • ATLAS unmatched means all atlas-unmatched rows are included
  • selecting No OSM < 50m and/or OSM < 50m means only those unmatched reasons are included
  • if both unmatched reasons are selected, the UI rolls up to the parent state all

Formally:

  • AtlasUnmatchedBranch = stop_type = atlas_unmatched when the parent is all
  • AtlasUnmatchedBranch = stop_type = atlas_unmatched AND unmatched_reason IN selected reasons when the branch is in subset mode

The unmatched reason mapping is:

  • No OSM < 50m -> match_type = no_nearby_counterpart
  • OSM < 50m -> match_type != no_nearby_counterpart OR match_type IS NULL
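
The reason mapping above can be illustrated with a short sketch. The reason keys no_osm_within_50m and osm_within_50m and the function name are hypothetical labels chosen for the example; only the match_type comparison comes from the mapping itself.

```python
def atlas_unmatched_branch(row, parent_all, selected_reasons):
    """Evaluate AtlasUnmatchedBranch for one row (a dict)."""
    if row["stop_type"] != "atlas_unmatched":
        return False
    if parent_all:
        return True
    # A NULL match_type counts as "OSM < 50m", per the mapping above.
    if row.get("match_type") == "no_nearby_counterpart":
        reason = "no_osm_within_50m"
    else:
        reason = "osm_within_50m"
    return reason in selected_reasons


isolated = {"stop_type": "atlas_unmatched", "match_type": "no_nearby_counterpart"}
nearby = {"stop_type": "atlas_unmatched", "match_type": None}

isolated_kept = atlas_unmatched_branch(isolated, False, {"no_osm_within_50m"})
nearby_kept = atlas_unmatched_branch(nearby, False, {"no_osm_within_50m"})
```

In this example isolated_kept is True and nearby_kept is False, since the second row falls under the other reason.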

1.3 OSM Unmatched Branch

This branch is explicit only.

Semantics:

  • if OSM unmatched is checked, all osm_unmatched rows are included
  • if it is not checked, OSM-unmatched rows are not added implicitly by any other selection

Formally:

  • OsmUnmatchedBranch = stop_type = osm_unmatched

2. Search Predicate

Search tokens are OR-combined within the search group.

SearchPredicate = token_1 OR token_2 OR token_3 ...

Meaning:

  • if multiple search tokens are active, a row matches if it matches any of them

2.1 Accepted Formats

The search input (#smartSearchInput) accepts the following formats, parsed by parseSmartSearchInput() in filters.js:

Format                                        Example            Token kind  Backend identifier_type
--------------------------------------------  -----------------  ----------  -----------------------
UIC station code (starts with 85)             8503000            station     station
ATLAS SLOID                                   ch:1:sloid:3000:3  atlas       sloid
OSM node ID (digits only)                     123456789          osm         osm_node_id
Route ID (dash-separated or route: prefixed)  11-T-j25-1         route       route
Route + direction                             11-T-j25-1 dir:0   route       route

Unrecognized input shows the accepted formats hint tooltip (#smartSearchHint).

OSM node IDs may also be entered as node/123456789. Route input accepts dir:0 or dir:1; the token stores the direction for subsequent filtered requests, while the initial centering lookup validates the route ID itself.
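
The classification rules can be sketched in Python as below. This is not a port of parseSmartSearchInput(): the function name, the regular expressions, the order of the checks, and the returned dictionary shape are all assumptions. In particular, the sketch treats a UIC code as exactly seven digits starting with 85 (inferred from the example 8503000) so that it can be told apart from an OSM node ID.

```python
import re

UIC_RE = re.compile(r"85\d{5}")            # assumption: exactly 7 digits
SLOID_RE = re.compile(r"ch:1:sloid:\S+")   # assumption: fixed ch:1:sloid: prefix
OSM_RE = re.compile(r"(?:node/)?(\d+)")
ROUTE_RE = re.compile(r"(route:)?(\S+?)(?:\s+dir:([01]))?")


def parse_search_input(raw):
    """Classify raw input into a token dict, or return None if unrecognized."""
    value = raw.strip()
    if UIC_RE.fullmatch(value):
        return {"kind": "station", "identifier_type": "station", "identifier": value}
    if SLOID_RE.fullmatch(value):
        return {"kind": "atlas", "identifier_type": "sloid", "identifier": value}
    osm = OSM_RE.fullmatch(value)
    if osm:
        return {"kind": "osm", "identifier_type": "osm_node_id", "identifier": osm.group(1)}
    route = ROUTE_RE.fullmatch(value)
    if route:
        prefixed, route_id, direction = route.groups()
        # A route must be route:-prefixed or dash-separated.
        if prefixed or "-" in route_id:
            token = {"kind": "route", "identifier_type": "route", "identifier": route_id}
            if direction is not None:
                token["direction"] = direction  # kept for later filtered requests
            return token
    return None


token = parse_search_input("11-T-j25-1 dir:0")
```

For the input above, token has kind route, identifier 11-T-j25-1, and direction 0. Input such as hello or 11-T-j25-1 dir:2 yields None, which corresponds to the case where the hint tooltip is shown.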

2.2 Search Flow

  1. User submits input via Enter key
  2. parseSmartSearchInput() classifies the value into a token kind or returns a validation error
  3. addSearchToken() adds the token to activeFilters.station and calls fetchAndCenterSpecificStop()
  4. fetchAndCenterSpecificStop() calls /api/stop_by_id with the identifier and type
  5. On success: the map centers on the result and filters update
  6. On failure: the token is reverted and an error is shown

2.3 Not-Found Feedback

When a correctly formatted input does not match any database entry, /api/stop_by_id returns a 404. The frontend displays an error in the #smartSearchError element, styled via .smart-search-feedback in index.css.

The error message follows the pattern: No {type} found matching: {identifier}, where {type} is the human-readable token kind (e.g. "OSM node", "UIC station", "ATLAS SLOID").
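
As an illustration of the message pattern, the sketch below maps token kinds to the human-readable labels quoted above. The pairing of kinds with labels is inferred from the accepted-formats table, and the fallback to the raw kind for any other token (such as route) is an assumption, since no label is given for it here.

```python
# Labels quoted in the text; the kind-to-label pairing is an inference.
TYPE_LABELS = {
    "station": "UIC station",
    "atlas": "ATLAS SLOID",
    "osm": "OSM node",
}


def not_found_message(kind, identifier):
    """Build the not-found text for a well-formed but unknown identifier."""
    label = TYPE_LABELS.get(kind, kind)  # assumption: fall back to the raw kind
    return f"No {label} found matching: {identifier}"


message = not_found_message("osm", "123456789")
```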

3. ATLAS Predicate

ATLAS-side attributes are OR-combined within the ATLAS predicate group.

AtlasPredicate = atlas_attribute_1 OR atlas_attribute_2 OR ...

Current ATLAS attribute values:

  • ATLAS operator

Semantics:

  • ATLAS predicates are evaluated on every row that has an ATLAS side
  • matched rows can satisfy ATLAS predicates
  • ATLAS-unmatched rows can satisfy ATLAS predicates
  • OSM-unmatched rows naturally do not satisfy ATLAS predicates because they have no ATLAS side

4. OSM Predicate

The OSM predicate is composed of four subgroups that are AND-combined:

OsmPredicate = TransportPredicate AND EntityPredicate AND OsmOperatorPredicate AND OsmGroupPredicate

4.1 Transport Predicate

Transport types are OR-combined.

TransportPredicate = transport_type_1 OR transport_type_2 OR ...

Examples:

  • ferry_terminal
  • tram_stop
  • station
  • platform
  • stop_position
  • aerialway_station

4.2 Entity Predicate

OSM entity types (nodes vs ways) are OR-combined.

EntityPredicate = entity_type_1 OR entity_type_2 OR ...

Examples:

  • way (OSM entries derived from ways, identified by a way_ prefix in their ID)

4.3 OSM Operator Predicate

OSM stop operators are OR-combined within their own subgroup.

OsmOperatorPredicate = osm_operator_1 OR osm_operator_2 OR ...

The predicate is implemented through OsmNode.osm_operator and is AND-combined with the other OSM-side subgroups.

4.4 OSM Group Predicate (Pairs/Trios)

OSM group types are OR-combined. In current terminology, OSM group means OSM pair or OSM trio.

OsmGroupPredicate = group_type_1 OR group_type_2 OR ...

Examples:

  • osm_pair_uic
  • osm_pair_uic_equal_15m
  • osm_pair_name
  • osm_pair_name_equal_15m
  • osm_pair_tram
  • osm_pair_tram_equal_15m
  • osm_trio

If the OSM groups master is selected with no subtype refinement, the system treats it as:

  • OsmGroupPredicate = group_member(any type)

If only osm_trio is selected, pair rows are excluded.
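
The master and subtype behavior can be sketched as follows. The row field osm_group_type and the function name are hypothetical; the value all mirrors the osm_group_types=all request value described under Request Serialization Rules.

```python
def osm_group_predicate(row, selected_group_types):
    """Evaluate OsmGroupPredicate for one row (a dict)."""
    if not selected_group_types:
        return True  # inactive group: no restriction
    group_type = row.get("osm_group_type")  # hypothetical field; None if not in a group
    if group_type is None:
        return False
    if "all" in selected_group_types:
        return True  # master selected: any group member qualifies
    return group_type in selected_group_types


pair_row = {"osm_group_type": "osm_pair_uic"}
trio_row = {"osm_group_type": "osm_trio"}

pair_kept_by_trio_filter = osm_group_predicate(pair_row, {"osm_trio"})
trio_kept_by_trio_filter = osm_group_predicate(trio_row, {"osm_trio"})
```

With only osm_trio selected, the pair row is excluded and the trio row is kept, matching the rule stated above.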

4.5 OSM-side Semantics

OSM predicates always apply to rows that have an OSM side.

This is an intentional product rule.

Consequences:

  • matched rows can satisfy OSM predicates
  • OSM-unmatched rows can satisfy OSM predicates
  • ATLAS-unmatched rows naturally do not satisfy OSM predicates because they have no OSM side

There is no separate applicability toggle for OSM predicates.

5. Duplicate Predicate

The currently exposed duplicate control is Duplicate ATLAS.

Semantics:

  • Duplicate ATLAS is an AND-filter on ATLAS duplicate-group membership:
    • row has representative_sloid set (non-representative member), OR
    • row is a representative referenced by at least one sibling (EXISTS atlas_stops WHERE representative_sloid = this.sloid)

Formally:

  • DuplicatePredicate = atlas_duplicate_member = true
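
The membership rule can be illustrated in memory as below. The real predicate is applied server-side in SQL; this sketch only mirrors its logic over a list of dictionaries, and the function name is hypothetical.

```python
def atlas_duplicate_members(atlas_stops):
    """Return the sloids that belong to an ATLAS duplicate group."""
    # Representatives: sloids referenced by at least one sibling.
    referenced = {
        stop["representative_sloid"]
        for stop in atlas_stops
        if stop.get("representative_sloid")
    }
    return {
        stop["sloid"]
        for stop in atlas_stops
        if stop.get("representative_sloid") or stop["sloid"] in referenced
    }


# Hypothetical sample: A represents B, C stands alone.
stops = [
    {"sloid": "A", "representative_sloid": None},
    {"sloid": "B", "representative_sloid": "A"},
    {"sloid": "C", "representative_sloid": None},
]
members = atlas_duplicate_members(stops)
```

Here members contains A and B: B points at a representative, and A is referenced by a sibling. C is not part of any duplicate group.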

Implementation note:

  • duplicate filtering is a data predicate
  • this predicate is applied server-side across /api/data, /api/top_matches, /api/random_stop, and /api/global_stats
  • whether both sides of a matched row are drawn is still a rendering decision, not a predicate

6. Top N Distances

Top N is not part of the canonical predicate formula for /api/data.

It is a special matched-only mode used by:

  • /api/top_matches
  • /api/random_stop
  • /api/global_stats

The UI shows the Top N control whenever matched scope exists, meaning either:

  • All Matched Stops is checked
  • or at least one matched sub-filter is selected

If matched scope disappears, Top N is automatically disabled.

Current endpoint behavior:

  • /api/top_matches is always matched-only and sorts by descending distance_m
  • /api/random_stop and /api/global_stats apply top_n by narrowing to matched rows with distance_m
  • the main map skips normal /api/data viewport loading while Top N is active
  • the Top N overlay request is issued by loadTopNMatches() when the matched parent scope is active; method-only matched scope is still honored by /api/random_stop and /api/global_stats
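
The matched-only narrowing and ordering can be sketched as follows. This is an in-memory illustration under the assumption that "matched rows with distance_m" means rows whose distance_m is not null; the function name is hypothetical.

```python
def top_n_matches(rows, n):
    """Keep matched rows that have a distance, largest distance first, limited to n."""
    candidates = [
        row for row in rows
        if row["stop_type"] == "matched" and row.get("distance_m") is not None
    ]
    return sorted(candidates, key=lambda row: row["distance_m"], reverse=True)[:n]


rows = [
    {"id": 1, "stop_type": "matched", "distance_m": 12.0},
    {"id": 2, "stop_type": "matched", "distance_m": 48.5},
    {"id": 3, "stop_type": "atlas_unmatched", "distance_m": None},
    {"id": 4, "stop_type": "matched", "distance_m": None},
    {"id": 5, "stop_type": "matched", "distance_m": 30.0},
]
top_two = [row["id"] for row in top_n_matches(rows, 2)]
```

For this sample, top_two is [2, 5]: unmatched rows and rows without a distance never enter the ranking.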

7. Low-Zoom Overview Mode

When the map is below the marker threshold and there are no active user filters, the UI switches to an overview mode:

  • stop_filter = atlas_unmatched
  • only the ATLAS side is rendered

This is a display optimization for low zoom, not part of the canonical predicate algebra.

As soon as any user filter is active, normal predicate semantics are used again.

8. Request Serialization Rules

The frontend sends only the filters that are semantically active.

Important examples:

  • All Matched Stops checked -> send stop_filter=matched
  • Exact checked without All Matched Stops -> send match_method=exact only
  • ATLAS unmatched checked -> send stop_filter=atlas_unmatched, omit unmatched reason refinements
  • No OSM < 50m checked without the parent -> send match_method=no_nearby_counterpart
  • OSM group subtypes selected -> send osm_group_types=subtype_1,subtype_2
  • OSM stop operator selected -> send osm_operator=operator_1,operator_2
  • OSM groups master selected with no subtype -> send osm_group_types=all
  • Duplicate ATLAS checked -> send show_duplicates_only=true

This keeps requests compact. The backend treats any matched-method selection as implying matched scope, so match_method=exact works even if stop_filter=matched is omitted.
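
The serialization examples above can be sketched as a single function. The parameter names and values come from the list; the shape of the state dictionary, the function name, and the comma-joining of multiple stop_filter or match_method values are assumptions. Cases the list does not cover (for example OSM unmatched, or OSM < 50m without the parent) are deliberately left out.

```python
def serialize_filters(state):
    """Turn a (hypothetical) UI state dict into request parameters."""
    stop_filters, match_methods = [], []

    if state.get("all_matched"):
        stop_filters.append("matched")  # parent wins; method children are omitted
    else:
        match_methods.extend(state.get("matched_methods", []))

    if state.get("atlas_unmatched_all"):
        stop_filters.append("atlas_unmatched")  # reason refinements are omitted
    elif state.get("no_osm_within_50m"):
        match_methods.append("no_nearby_counterpart")

    params = {}
    if stop_filters:
        params["stop_filter"] = ",".join(stop_filters)  # assumption: comma-joined
    if match_methods:
        params["match_method"] = ",".join(match_methods)

    group_types = state.get("osm_group_types", [])
    if group_types:
        params["osm_group_types"] = ",".join(group_types)
    elif state.get("osm_groups_master"):
        params["osm_group_types"] = "all"

    operators = state.get("osm_operators", [])
    if operators:
        params["osm_operator"] = ",".join(operators)

    if state.get("duplicate_atlas"):
        params["show_duplicates_only"] = "true"
    return params


params = serialize_filters({"matched_methods": ["exact"], "duplicate_atlas": True})
```

In this example params carries match_method=exact and show_duplicates_only=true, and no stop_filter, which relies on the backend treating a matched-method selection as implying matched scope.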

9. Consistency Guarantees

The endpoints below share the same request parameter model for common filters:

  • /api/data
  • /api/global_stats
  • /api/random_stop
  • /api/top_matches

/api/data, /api/global_stats, and /api/random_stop use the shared scope helpers (resolve_stop_type_match_filters, build_stop_scope_condition) for stop-type and match-method semantics. /api/top_matches is a matched-only endpoint; it applies common attribute filters and optional matched-method filtering, then sorts by largest distance.

10. Worked Examples

Example 1:

(distance stage 1 OR atlas-unmatched OR osm-unmatched) AND operator=SBB AND duplicate_atlas

This means:

  • keep rows in any of those three scope branches
  • then require an ATLAS side with operator SBB
  • then require ATLAS duplicate-group membership

Example 2:

(matched OR osm-unmatched) AND platform

This means:

  • keep matched rows and OSM-unmatched rows in scope
  • then keep only those whose OSM side is platform

Example 3:

operator=SBB AND tram_stop

This means:

  • require an ATLAS side satisfying SBB
  • require an OSM side satisfying tram_stop
  • in practice this mostly yields matched rows because both sides must exist

11. Global Stats Endpoint Semantics

/api/global_stats delegates scoped query building and aggregation logic to backend/services/global_stats.py.

The endpoint still follows the same predicate algebra defined in this document.

11.1 Shared Scope Semantics

/api/global_stats uses the same helper path as /api/data for scope selection:

  • resolve_stop_type_match_filters()
  • build_stop_scope_condition()
  • build_trio_middle_with_matched_side_condition()

This preserves the same trio-middle effective-match behavior across map rendering and global summary stats.

11.2 Effective Matched Semantics in Stats

For global stats aggregation, an internal effective_stop_type is computed:

  • rows with stop_type = matched are treated as matched
  • rows with stop_type = effectively_matched are also treated as matched

This is identical to the semantics already used for matched-scope filtering and avoids drift between counts and map behavior.
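
A minimal sketch of this normalization, applied before counting, is shown below. The actual aggregation runs in SQL; the function name and the in-memory counting are illustrative assumptions.

```python
from collections import Counter


def effective_stop_type(stop_type):
    """Fold effectively_matched into matched; leave other types unchanged."""
    return "matched" if stop_type == "effectively_matched" else stop_type


# Hypothetical stop_type values for five rows.
stop_types = [
    "matched",
    "effectively_matched",
    "atlas_unmatched",
    "osm_unmatched",
    "matched",
]
counts = Counter(effective_stop_type(value) for value in stop_types)
```

For this sample, counts reports three matched rows, one atlas_unmatched row, and one osm_unmatched row.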

11.3 Global vs Viewport Scope

/api/global_stats is filter-scoped, not viewport-scoped.

  • it does not use min_lat, max_lat, min_lon, or max_lon
  • it summarizes the full filtered dataset
  • /api/data remains the viewport-scoped endpoint

11.4 Runtime Notes

The current implementation computes global stats directly from SQL for each request. There is no in-process LRU cache or cache-key canonicalization layer in backend/services/global_stats.py.

The frontend avoids stale UI updates by aborting an in-flight /api/global_stats request before starting the next one and by ignoring responses whose sequence number is no longer current.
