The availability of new tools in the modern data stack is changing the way data teams operate. Specifically, the modern data stack supports an “ELT” approach for managing data, rather than the traditional “ETL” approach. In an ELT approach, data sources are automatically loaded in a normalized state into Delta Lake, and opinionated transformations happen in the data destination using dbt. This workflow allows data analysts to move more quickly from raw data to insight, while creating repeatable data pipelines that are robust to changes in the source datasets. In this presentation, we’ll illustrate how easy it is for even a data analytics team of one to develop an end-to-end data pipeline. We’ll load data from GitHub into Delta Lake, then use pre-built dbt models to feed a daily Redash dashboard on sales performance by manager, and use the same transformed models to power the data science team’s predictions of future sales by segment.
3. In 2010, the focus of many analytics teams was on
infrastructure, compute and storage for large data sets
● “How do we build / scale our ETL infrastructure?”
● “How will we control storage costs?”
● “How will we design a data warehouse that is performant?”
“Big Data”
4. The landscape has shifted
Single Node Databases
● Constraining compute (on-premises)
● Perpetual licensing
● ETL outside the database
Scalable Databases
● Separation of storage + compute (on-premises)
● Subscription licensing
● Complex pipelines required for large datasets
Modern Cloud Destinations
● Auto-scaling cloud DBMS
● Usage-based licensing
● Separation of storage and compute allows for agile data pipelines
Pipeline architecture is adapting to the modern cloud technology.
5. ETL ELT
Sources → Extraction Process → Raw Data → Transformation (Modeling) Process → Clean, Usable Datasets
Moving Data Into the Warehouse (EL) is a
highly automatable process.
Data Transformation is different for
every company - it cannot be fully
automated.
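The split between an automatable EL step and a company-specific T step can be sketched in a few lines. This is a minimal illustration only: it uses an in-memory SQLite database as a stand-in for the destination, and the table names (`raw_orders`, `sales_by_manager`) and sample rows are hypothetical, not taken from the presentation.

```python
import sqlite3

# Hypothetical raw records, standing in for what an automated connector would land.
RAW_ORDERS = [
    (1, "alice", "2021-05-01", 120.0),
    (2, "bob", "2021-05-01", 80.0),
    (3, "alice", "2021-05-02", 50.0),
]


def extract_and_load(conn, rows):
    """EL step: copy source rows into the destination unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id INTEGER PRIMARY KEY, manager TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """T step: business logic expressed as SQL, run inside the destination."""
    conn.execute("DROP TABLE IF EXISTS sales_by_manager")
    conn.execute(
        "CREATE TABLE sales_by_manager AS "
        "SELECT manager, SUM(amount) AS total_sales "
        "FROM raw_orders GROUP BY manager"
    )


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, RAW_ORDERS)  # generic, the same for every company
    transform(conn)                     # specific to this company's questions
    return conn.execute(
        "SELECT manager, total_sales FROM sales_by_manager ORDER BY manager"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Only `transform` encodes a business decision; `extract_and_load` knows nothing about how the data will be used, which is why that half of the pipeline lends itself to automation.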
7. Moving the data transformation step to the
warehouse has another benefit - democratizing
access to data.
The data transformation step is now more accessible to more
members of the data team - from BI developers to data
scientists.
No more waiting for an ETL process that happens out of sight.
8. About Fivetran
➔ Automatic Data Updates (DML)
➔ Automatic Schema Migrations (DDL)
➔ Automated Recovery from Failure (Idempotent)
➔ Micro-batched architecture
➔ Extensible Cloud Functions
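To make the "idempotent" and "micro-batched" points concrete, here is a small sketch of a loader that can replay a batch after a failure without changing the result. It is not Fivetran's implementation: it assumes an in-memory SQLite destination, a hypothetical `accounts` table, and made-up rows, and it relies on SQLite's `INSERT OR REPLACE` keyed on the primary key.

```python
import sqlite3


def apply_batch(conn, batch):
    """Upsert one micro-batch keyed on id; replaying the same batch is a no-op."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts "
        "(id INTEGER PRIMARY KEY, name TEXT, plan TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO accounts (id, name, plan) VALUES (?, ?, ?)",
        batch,
    )
    conn.commit()


def snapshot(conn):
    return conn.execute(
        "SELECT id, name, plan FROM accounts ORDER BY id"
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    batch_1 = [(1, "Acme", "free"), (2, "Globex", "pro")]
    batch_2 = [(2, "Globex", "enterprise"), (3, "Initech", "free")]

    apply_batch(conn, batch_1)
    apply_batch(conn, batch_2)
    before_replay = snapshot(conn)

    # Simulate recovery from a failure by replaying the last batch.
    apply_batch(conn, batch_2)
    after_replay = snapshot(conn)
    return before_replay, after_replay


if __name__ == "__main__":
    before, after = demo()
    print(before == after)
```

Because each row is written by key rather than appended, a retried batch overwrites rows with identical values instead of duplicating them.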
9. About dbt
dbt is an open-source transformation tool that allows anyone comfortable with SQL to author their own data pipelines.
10. dbt allows data engineers/analysts to work like software developers
● Users write SQL with “super powers” from Jinja, a Python templating language (e.g., loops, macros, local variables).
● Wraps the right DDL and DML around your SQL to materialize data models in any warehouse.
● Infers the data lineage graph (DAG) as you code.
● Supports multiple environments and git-based version control.
● Integrates testing into your pipeline.
● Automates documentation.
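Two of the dbt points, inferring the lineage graph and wrapping DDL around a plain `SELECT`, can be illustrated with a toy sketch. This is not how dbt is implemented: the regular expression, the `MODELS` dictionary, and the table names are hypothetical, SQLite stands in for the warehouse, and `graphlib` requires Python 3.9 or later.

```python
import re
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical models: name -> SELECT statement with a dbt-style ref() placeholder.
MODELS = {
    "sales_by_manager": (
        "SELECT manager, SUM(amount) AS total "
        "FROM {{ ref('stg_orders') }} GROUP BY manager"
    ),
    "stg_orders": (
        "SELECT order_id, manager, amount FROM raw_orders WHERE amount > 0"
    ),
}

REF_PATTERN = re.compile(r"\{\{\s*ref\('(\w+)'\)\s*\}\}")


def build_order(models):
    """Infer dependencies from ref() calls and return a valid build order."""
    deps = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(deps).static_order())


def materialize(conn, models):
    """Wrap each SELECT in DDL and build the models upstream-first."""
    for name in build_order(models):
        # Replace each ref('x') placeholder with the table name x.
        select_sql = REF_PATTERN.sub(lambda m: m.group(1), models[name])
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, manager TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "alice", 100.0), (2, "bob", 40.0), (3, "alice", -5.0)],
    )
    materialize(conn, MODELS)
    return conn.execute(
        "SELECT manager, total FROM sales_by_manager ORDER BY manager"
    ).fetchall()


if __name__ == "__main__":
    print(build_order(MODELS))
    print(demo())
```

The author only writes `SELECT` statements; the build order and the `CREATE TABLE ... AS` wrapper are derived from them, which is the division of labor the slide describes.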
11. The modern data stack - with Fivetran, Delta Lake,
Databricks SQL Service and dbt
● Fivetran automates integration with operational systems, with zero configuration required.
● dbt provides a flexible data modeling environment with best practices from DevOps.
● Connecting directly to the new Databricks SQL service makes building your pipeline fast and easy.
12. Speeding time to insight with the modern stack
1. Ingest data via a Fivetran automatic connector.
2. Connect dbt to start transforming your data in-warehouse.
3. Use a Fivetran dbt package to jump-start your modeling process.