Implementing a State-Aware ELT Pipeline: From MySQL Silos to dbt Marts

TokorapiUnified Data Strategy
Metabase dashboard for e-commerce Website

Data Latency

Reduced from 24+ hours to nightly 2 AM refresh

Engineering Overhead

90% reduction in manual SQL maintenance

Data Reliability

100% schema validation via dbt tests

The Challenge

The client relied on a single, heavily burdened MySQL production instance. Direct analytical queries were causing performance degradation. There was no staging environment, meaning raw data was being transformed in-memory, leading to high failure rates and zero auditability.

Technology Stack

AirbyteGoogle BigQuerydbtDagsterMetabaseMySQL

Methodologies

ELTStar Schema ModelingData OrchestrationAutomated Testing

The Outcome

Ingestion: Configured Airbyte to sync from the MySQL source to a BigQuery raw schema, isolating analytical workloads from production.

Transformation: Developed a three-tier dbt architecture (raw → staging → marts). Implemented fct_orders and dim_customer tables to optimize query performance in Metabase.

Orchestration: Deployed Dagster to manage the end-to-end lineage. The pipeline is scheduled for 2 AM daily, ensuring fresh data is available for business hours while minimizing compute costs.

Intelligence: Metabase dashboards now query optimized mart tables, resulting in sub-second load times for complex regional revenue views.