Implementing a State-Aware ELT Pipeline: From MySQL Silos to dbt Marts

Data Latency
Reduced from 24+ hours to nightly 2 AM refresh
Engineering Overhead
90% reduction in manual SQL maintenance
Data Reliability
100% schema validation via dbt tests
The Challenge
The client relied on a single, heavily burdened MySQL production instance. Direct analytical queries were causing performance degradation. There was no staging environment, meaning raw data was being transformed in-memory, leading to high failure rates and zero auditability.
Technology Stack
Methodologies
The Outcome
Ingestion: Configured Airbyte to sync from the MySQL source to a BigQuery raw schema, isolating analytical workloads from production.
Transformation: Developed a three-tier dbt architecture (raw → staging → marts). Implemented fct_orders and dim_customer tables to optimize query performance in Metabase.
Orchestration: Deployed Dagster to manage the end-to-end lineage. The pipeline is scheduled for 2 AM daily, ensuring fresh data is available for business hours while minimizing compute costs.
Intelligence: Metabase dashboards now query optimized mart tables, resulting in sub-second load times for complex regional revenue views.