6.3 Background Scheduler

The Background Scheduler is a dedicated service responsible for orchestrating the end-to-end data pipeline. It ensures that the ATLAS and OSM datasets are periodically synchronized, matched, and imported into the database without manual intervention.

Core Role

The scheduler automates the transition through the four main phases of the project:

1. Download and Process Data: Fetching official ATLAS exports and OSM Overpass data.
2. Matching Process: Running the multi-stage geospatial association logic.
3. Problem Detection: Identifying data quality issues.
4. Import Process: Rebuilding the import_db with fresh results.

Implementation Details

Service Architecture

The scheduler is implemented as an APScheduler (BlockingScheduler) instance running within a dedicated Docker container (scheduler).

Entrypoint: matching_and_import_db/scheduler/service.py
Logic Runner: matching_and_import_db/scheduler/job_runner.py
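
As a rough illustration of the architecture described above, the following sketch shows how a BlockingScheduler could be wired to an interval job. It is not the contents of service.py: the job id, the `max_instances` and `coalesce` settings, and the function names `interval_job_kwargs`, `run_pipeline_job`, and `build_scheduler` are assumptions made for this example. Only the two environment variable names come from the Configuration section.

```python
import os


def interval_job_kwargs(interval_hours: int) -> dict:
    """Keyword arguments for an APScheduler interval job (assumed settings)."""
    return {
        "trigger": "interval",
        "hours": interval_hours,
        "id": "pipeline_run",   # hypothetical job id
        "max_instances": 1,     # never overlap two runs inside this process
        "coalesce": True,       # collapse several missed runs into one
    }


def run_pipeline_job() -> None:
    """Placeholder for the real job runner entry point."""
    print("pipeline run triggered by schedule")


def build_scheduler():
    """Create a configured (not yet started) BlockingScheduler."""
    # Imported lazily so the helpers above stay importable without APScheduler.
    from apscheduler.schedulers.blocking import BlockingScheduler

    hours = int(os.environ.get("PIPELINE_SCHEDULE_INTERVAL_HOURS", "24"))
    timezone = os.environ.get("PIPELINE_TIMEZONE", "Europe/Zurich")
    scheduler = BlockingScheduler(timezone=timezone)
    scheduler.add_job(run_pipeline_job, **interval_job_kwargs(hours))
    return scheduler
```

Calling `build_scheduler().start()` would block the container's main process until it is stopped, which is why a BlockingScheduler suits a container whose only task is scheduling.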

Redis Integration & Locking

To ensure system stability, the scheduler interacts with Redis for two critical functions:

Distributed Lock: Before starting a run, the scheduler attempts to acquire a pipeline_lock in Redis. This prevents multiple triggers (e.g., a scheduled task and a manual docker exec) from running simultaneously and corrupting the data.
Status Reporting: The scheduler publishes its current state (e.g., downloading, matching, importing) to Redis. The Flask web application consumes this data to display a real-time progress bar and status message to users.
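
The two Redis functions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the status key name, the lock expiry, and all function names are hypothetical, and `InMemoryRedis` is a stand-in so the example runs without a Redis server. A real redis-py client exposes the same `set(name, value, nx=..., ex=...)`, `get`, and `delete` calls used here.

```python
import json
import uuid

LOCK_KEY = "pipeline_lock"       # lock name taken from the text above
STATUS_KEY = "pipeline_status"   # hypothetical key name
LOCK_TTL_SECONDS = 6 * 60 * 60   # hypothetical safety expiry


class InMemoryRedis:
    """Tiny stand-in for a Redis client, for demonstration only."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        # nx=True means "only set if the key does not exist yet".
        # The expiry (ex) is accepted but ignored by this stand-in.
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True

    def get(self, name):
        return self._data.get(name)

    def delete(self, name):
        return 1 if self._data.pop(name, None) is not None else 0


def acquire_lock(client):
    """Return a unique token if the lock was acquired, else None."""
    token = uuid.uuid4().hex
    if client.set(LOCK_KEY, token, nx=True, ex=LOCK_TTL_SECONDS):
        return token
    return None


def release_lock(client, token):
    """Delete the lock only if this run still owns it."""
    current = client.get(LOCK_KEY)
    if isinstance(current, bytes):  # real clients may return bytes
        current = current.decode()
    if current == token:
        client.delete(LOCK_KEY)
        return True
    return False


def publish_status(client, phase, message=""):
    """Store the current phase as JSON for the web application to read."""
    client.set(STATUS_KEY, json.dumps({"phase": phase, "message": message}))
```

The get-then-delete in `release_lock` is not atomic. Against a real Redis server, a short Lua script or the lock helper shipped with the client library would close that gap.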

Configuration

The scheduler's behavior is controlled via environment variables in the scheduler service:

Variable	Description	Default
`PIPELINE_SCHEDULE_INTERVAL_HOURS`	Interval between automatic runs, in hours	`24`
`PIPELINE_TIMEZONE`	Timezone used when computing the next run timestamp	`Europe/Zurich`
`PIPELINE_LOG_LEVEL`	Verbosity of the pipeline logs	`INFO`
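
One way to read these variables with the defaults from the table is sketched below. The `SchedulerConfig` class, the `load_config` function, and the check that the interval is positive are assumptions for illustration. The source does not say how the service parses or validates its settings.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerConfig:
    interval_hours: int
    timezone: str
    log_level: str


def load_config(env=None) -> SchedulerConfig:
    """Build the configuration from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    hours = int(env.get("PIPELINE_SCHEDULE_INTERVAL_HOURS", "24"))
    if hours <= 0:
        # Assumed validation: a non-positive interval cannot be scheduled.
        raise ValueError("PIPELINE_SCHEDULE_INTERVAL_HOURS must be positive")
    return SchedulerConfig(
        interval_hours=hours,
        timezone=env.get("PIPELINE_TIMEZONE", "Europe/Zurich"),
        log_level=env.get("PIPELINE_LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dictionary instead of the real environment, for example `load_config({"PIPELINE_SCHEDULE_INTERVAL_HOURS": "6"})`, makes the defaults easy to check in isolation.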

Operational Commands

Manual Trigger

You can force a pipeline run immediately by executing the job runner inside the running scheduler container:

docker compose exec scheduler python -m matching_and_import_db.scheduler.job_runner --mode full --trigger manual

Checking Status

The status can be checked via the API endpoint:
GET /api/system/pipeline_status
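
A small client for this endpoint might look like the sketch below. It assumes the endpoint returns a JSON object. The response fields are not documented here, so the function returns the parsed body unchanged. The function names and any base URL passed in are hypothetical.

```python
import json
import urllib.request

STATUS_PATH = "/api/system/pipeline_status"  # endpoint from the text above


def status_url(base_url: str) -> str:
    """Join a base URL and the status path without doubling the slash."""
    return base_url.rstrip("/") + STATUS_PATH


def fetch_pipeline_status(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the status document (assumed to be JSON)."""
    with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```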

Error Handling

If a phase fails (e.g., a network timeout during OSM download), the scheduler:

Logs the traceback to stdout.
Updates the Redis status to failure with the error message.
Releases the distributed lock so subsequent runs can still execute.
Retains the old database state (since the import phase is only reached after successful matching).
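
The failure path listed above can be sketched with a try/except/finally block. This is an illustration under assumptions, not the code in job_runner.py: `run_phases` and its callable parameters are hypothetical, and the `success` status name is invented for the example. Only the `failure` status comes from the text.

```python
import sys
import traceback
from typing import Callable, Iterable, Tuple

Phase = Tuple[str, Callable[[], None]]


def run_phases(
    phases: Iterable[Phase],
    publish_status: Callable[[str, str], None],
    release_lock: Callable[[], None],
) -> bool:
    """Run phases in order; stop at the first failure and always unlock."""
    try:
        for name, step in phases:
            publish_status(name, "")
            step()
        publish_status("success", "")  # hypothetical status name
        return True
    except Exception as exc:
        traceback.print_exc(file=sys.stdout)  # log the traceback to stdout
        publish_status("failure", str(exc))
        return False
    finally:
        # Runs on success and on failure, so later runs are never blocked.
        release_lock()
```

Because the loop stops at the first exception, a failure in an early phase means the import step is never called, which is how the old database state survives.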
