🧠 Core Concepts

This section introduces the core ideas behind ADL (Automated Data Loader) to help you quickly understand how data moves through the system and where plugins, models, and admin screens fit in.


1) The Big Picture

ADL is a Django-based platform that ingests observations from many upstream sources, normalizes them into a common schema, and makes them available for analysis, visualization, and downstream dispatch (e.g., WIS2).

At a high level:

  1. Plugins know how to talk to external systems (APIs, files, brokers) and produce time-stamped records.

  2. Core models (Station, Parameter, Unit, ObservationRecord, etc.) provide a canonical structure in the database.

  3. Schedulers (Celery Beat/Worker) invoke plugins on intervals, handle batches, and support backfills.

  4. Dispatch channels can publish/forward curated data to external targets (e.g., WIS2Box).

        flowchart LR
  subgraph Upstream
    A[Source API/Feed]
  end

  subgraph ADL Core
    P[Plugin] --> N[Normalization Unit & Mapping] --> S[Observation Record TimescaleDB]
    S --> A1[Aggregations]
    S --> Q[Query Admin UI]
  end

  subgraph Downstream
    D[Dispatch Channel e.g. WIS2Box]
  end

  A --> P
  A1 --> D
    

2) Core Data Model

Understanding a few key models clarifies how ADL thinks about networks and measurements.

Network

Represents a family of stations (e.g., “ADCON Automatic Weather Stations”). Purely organizational.

Station

A physical observing point (with location, identifiers such as WIGOS, heights, metadata). Stations belong to a Network.

NetworkConnection

Configuration for a specific upstream integration (credentials, cadence, timezone defaults, batch size, daily vs hourly data, etc.). A NetworkConnection points to a Plugin type that will perform the actual collection.

Unit & DataParameter

  • Unit defines canonical unit symbols (backed by a unit registry).

  • DataParameter names a measurable variable (e.g., air_temperature) and which Unit ADL uses as canonical. It also supports optional conversion contexts for tricky cases (e.g., precipitation mass/area equivalence).

ObservationRecord

The atomic measurement in ADL, keyed by (time, station, connection, parameter) with a value. Stored in TimescaleDB for efficient time-series operations. Flags whether a record is daily (is_daily).

Aggregations (HourlyObsAgg)

A view for hourly summaries (min/max/avg/sum, counts). Used by analytics and dispatch.


3) Plugin Architecture (Extension Point)

Plugins are Django apps that extend the Plugin base class and register themselves in the plugin registry. The base class provides:

  • Date-window helpers: Picks [start_date, end_date) for each station (resuming from last saved data if present).

  • Normalization & saving: Converts source values to canonical units and upserts ObservationRecords.

  • Orchestration: Iterates station links, handles disabled stations, and returns per-station counts.

Variable Mapping

Each StationLink provides a mapping from source field name & unitADL DataParameter. Example: temp_K ( Kelvin) → air_temperature (Celsius). This drives conversion and saving.


4) Time, Timezones & Windows

Golden rules:

  • Plugins work in the station’s timezone; the database stores UTC.

  • Default window is the previous hour up to the top of the next hour (closed-open [start, end)).

  • If past data exists in ADL ObservationRecords table, ADL resumes from the latest saved observation time.

  • If no database records, ADL may use a station-defined first collection date or the default previous-hour window.

  • For daily feeds, the NetworkConnection.is_daily_data flag marks saved rows accordingly.

Naive vs Aware datetimes

  • If a plugin returns naive observation_time, ADL interprets it as station-local and makes it aware.

  • If the plugin returns aware datetimes (UTC or otherwise), ADL converts them to station-local without shifting the instant.


5) Scheduling & Execution

  • Celery Beat triggers periodic runs (e.g., every 15 minutes) according to plugin_processing_interval on the NetworkConnection.

  • Celery Workers execute the fetch/save jobs and can scale horizontally.

  • Manual runs can be initiated via NetworkConnection.collect_data() (useful for backfills or debugging).

  • Plugins should respect upstream rate limits and add retry/backoff for transient failures.


6) Dispatch Channels (Downstream)

After ingestion, ADL can publish data to external systems via Dispatch Channels. A channel selects parameters, optionally uses aggregations (e.g., hourly), and pushes records onward. Example: Wis2BoxUpload connects to a WIS2 storage endpoint and uploads observations.

Dispatch entities:

  • DispatchChannel: base configuration (enabled, interval, optional start date, aggregated vs raw).

  • DispatchChannelParameterMapping: maps ADL parameters into the channel’s expected field names/units and selects the aggregation measure (avg/sum/min/max).

  • Station selection: choose which stations are allowed for the channel.


7) Observability & Operations

  • Logging: Plugins and core components log key events (window bounds, station ids, counts, warnings on validation).

  • Idempotency: Upserts based on the unique key avoid duplicates; late data updates the same unique row.

  • Troubleshooting: Check plugin logs for window computations, mapping issues, and unit conversion warnings.

Operational tips:

  • Use conservative timeouts and a requests retry adapter in API clients.

  • For performance: paginate upstream requests, stream/process in chunks, rely on TimescaleDB indexes for readbacks.


8) Security & Configuration

  • Access control is managed by Django/Wagtail admin permissions; admin-only URLs for plugin helpers (widgets, metadata views) should remain authenticated.

  • Licensing & data policy: Respect source terms; document attribution requirements in plugin README.


9) Developer Workflow

  1. Create a plugin from the boilerplate (cookiecutter) or copy the sample.

  2. Implement get_station_data and register the plugin in AppConfig.ready().

  3. Add models for NetworkConnection and StationLink as needed; create migrations.

  4. Add admin widgets/views only if they improve operator UX.

  5. Run locally with Docker Compose (ADL core + your plugin) and iterate.


10) Glossary

  • Plugin: A Django app implementing get_station_data to fetch and normalize upstream data.

  • Network: A grouping of stations.

  • Station: An observing site (location/metadata).

  • NetworkConnection: Configuration for a specific upstream integration; references a Plugin type.

  • StationLink: Binds a Station to a NetworkConnection and adds per-station connection details.

  • DataParameter: Canonical variable name with a canonical Unit.

  • Unit: Measurement unit used by ADL; convertible via the unit registry.

  • ObservationRecord: The core time-series row (time, station, connection, parameter, value).

  • Dispatch Channel: A mechanism to publish records to external systems.


11) Mental Model (TL;DR)

  • A Connection + StationLinks define what to pull and from where.

  • A Plugin defines how to pull and how to map the fields.

  • ADL chooses a time window, the plugin returns rows, and ADL upserts them.

  • Optional Dispatch moves curated data out to other systems.

If you understand the Connection → StationLink → Plugin → ObservationRecord chain, you understand ADL.