4.1 Import Process
The database import process transforms processed CSV files and match outputs into the PostgreSQL database that powers the web application. We use PostgreSQL 15+ with PostGIS 3.x for spatial queries.
High-Level Architecture
Table Behavior
| Category | Database | Tables | Behavior |
|---|---|---|---|
| Import | import_db |
stops_matched, atlas_stops, osm_nodes, osm_stops, osm_stop_members, atlas_routes, atlas_route_directions, osm_routes, osm_route_tags, route_atlas_stops, route_osm_stops, routes_matched, route_problems, problems |
Truncated and repopulated on every run |
The database is intentionally reproducible from source files. Every import run wipes current import data to ensure consistency with the latest processed pipeline output.
Import Steps
The import_to_database() function in matching_and_import_db/database/importer.py orchestrates the import.
1. Schema Update
ensure_schema_updated() # Runs Alembic migrations to HEAD
Pending Alembic migrations are applied first so the database schema matches the current ORM models.
2. Truncate Import Tables
session.execute(text("TRUNCATE TABLE atlas_stops, osm_nodes, osm_stops, osm_stop_members, route_atlas_stops, route_osm_stops CASCADE"))
session.execute(text("TRUNCATE TABLE routes_matched CASCADE"))
session.execute(text("TRUNCATE TABLE problems, stops_matched CASCADE"))
The importer clears the import tables up front to guarantee a fresh state for the new run.
3. Load Route Mappings
The route loader preloads all route CSVs in one pass:
atlas_routes.csvatlas_route_directions.csvatlas_route_stops.csvosm_routes.csvosm_route_tags.csvosm_route_members.csv
This reduces repeated file I/O and prepares entity-first payloads for route insertion.
4. Insert Entity Data
The importer writes data in this order:
- Import all OSM nodes into
osm_nodesso dependent route-stop rows can resolve their stop members. - Import OSM stop units into
osm_stops. - Import OSM stop membership into
osm_stop_members. - Import matched stops into
stops_matched, creatingatlas_stopsrows on demand and evaluating matched-stop problems. - Import unmatched ATLAS stops into
stops_matched, creatingatlas_stopsrows on demand and evaluating unmatched-stop problems. - Import unmatched OSM stops into
stops_matchedand evaluate unmatched-stop problems. - Import route entities and route membership into
atlas_routes,atlas_route_directions,osm_routes,osm_route_tags,route_atlas_stops,route_osm_stops,routes_matched, androute_problems.
Records are accumulated in arrays and committed in batches. The importer uses batched persistence plus periodic session.commit() checkpoints to keep imports fast without holding the entire run in a single transaction.
export DB_IMPORT_BATCH_SIZE=5000
Progress is logged with timing information during long imports.
5. Detect Problems
A ProblemContext is built once from the pipeline output, precomputing structures such as KDTrees, UIC counts, and duplicate maps. Then run_problem_pipeline(STOP_PROBLEM_PIPELINE, ctx, stop) is called for each stop.
The stop-level predicates currently include:
distance_problemattributes_problemduplicates_problemunmatched_problem
Each predicate yields zero or more ProblemResult objects with a priority score, which are then converted into ORM Problem rows. See 3.1 Stop Problems for the stop-problem rules.
Route-level problems are computed separately during route payload construction and persisted to route_problems.
Running the Import
Environment Variables
| Variable | Description | Default |
|---|---|---|
DATABASE_URI |
Import database connection string | postgresql+psycopg://stops_user:1234@localhost:5432/import_db |
DB_IMPORT_BATCH_SIZE |
Records per batch commit | 5000 |
Command
export DATABASE_URI="postgresql://user:pass@localhost/stopdb"
python matching_and_import_db/database/importer.py
# Or skip certain phases to focus on selected matching stages
python matching_and_import_db/database/importer.py --skip-phase1 --skip-phase2
Persistence Behavior
The current importer does not preserve user modifications between runs. Every import rebuilds the import database from source files.
Related Documentation
- 4. Database — Database architecture, schema, and indexing