SlideShare a Scribd company logo
1 of 116
Download to read offline
ETL and Event Sourcing
Integration Architecture: Best Practice and Case Study
Marc Siegel - Panorama Education - Wed Feb 6 2019
ETL pipelines from external systems
ETL and Event Sourcing
Prerequisite knowledge
Familiarity with traditional ETL architectures:
Software systems that Extract data from external systems,
Transform them, and Load the resulting data sets into internal
systems, most often relational databases
Dissatisfaction with traditional ETL architectures / curiosity to
learn about and consider an alternative architecture
ETL and Event Sourcing
What you’ll learn
How Event Sourcing can be applied to ETL
How Determinism can be a property of a system
Value of treating the Past as First Class
What is ETL?
ETL
In a nutshell
ETL
In a nutshell
External
System
ETL
Traditional ETL Process
Extract
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform Load
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
Q: What is the System of Record?
What is the Source of Truth?
ETL
In a nutshell
External
System
System of Record
The authoritative data source for a given
data element or piece of information (1)
ETL
Internal
Database
In a nutshell
Source of Truth
A trusted data source that gives a complete
picture of the data object as a whole (2)
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL Challenges
Operational
Domain Modelling
Selective Attention
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must rerun long ETL job to test edge case
Missing Interests:
● Decoupling
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must rerun long ETL job to test edge case
Running ETL job can overwrite history
Missing Interests:
● Decoupling
● Determinism
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must create one true schema to load into
Missing Interests:
● Decoupling (of each interpretation)
ETL Challenges
Operational
Domain Modelling
Selective Attention
Must create one true schema to load into
Tend toward lowest common denominator
OR superset of all external model features
Missing Interests:
● Decoupling (of each interpretation)
● Modeling State Explicitly
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
ETL Challenges
Operational
Domain Modelling
Selective Attention
From Psychology: the act of focusing on a
particular object while ignoring irrelevant
information
→ Can’t re-interpret past extracts
Missing Interests:
● Past as First Class
ETL Problems
Awareness Tests
YouTube:
● Basketball
● Monkey
Business
How many passes did the team in white make?
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
ETL Advantage
Not just problems. Positive trade-offs of ETL?
● Low Costs: Training, framing, explaining
○ Training: Low cost to train new engineers in ETL concepts
○ Framing: No requirement for explicit domain modeling
○ Explaining: Intuitive to explain to non-engineers
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL Challenges
What is ELT?
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL and ELT
Traditional ETL Process
Extract Transform Load
Internal
Database
External
System
ETL and ELT
EL Process
Extract
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process(es)
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process(es)
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL and ELT
What is Event Sourcing?
ETL
Traditional ETL Process
Extract Transform Load
Internal
Database
In a nutshell
External
System
ETL and ELT
EL Process
Extract
Data Lake
or Blob or
File Store
T Process(es)
Do anything here! Many vendors
offering various solutions.
Traditional ETL Process
Extract Transform Load
Internal
Database
Load
External
System
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
TeTL Process
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
TeTL Process
Domain
Events
Tr
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
TeTL Process
Domain
Events
Tr Tr Lo
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model
TeTL Process
Domain
Events
Tr Tr Lo
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model
regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load
Infinitely
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
ETL and Event Sourcing
EL Process
Ex
Traditional ETL Process
Extract Transform Load
Internal
Database
Lo
External
System
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model
regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load
Infinitely
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
Event Sourcing Challenge
Not just advantages. Negative trade-offs of ES?
● High Costs: Training, framing, explaining
○ Training: Higher cost to train new engineers in ES concepts
○ Framing: Requirement for (lots of) explicit domain modeling
○ Explaining: Not necessarily intuitive to explain to non-engineers
Interests and Positions
ETL ELT Event Sourcing
Decoupling
Determinism
Modeling State Explicitly
Past as First Class
Low Cost
ETL, ELT, and Event Sourcing
How does Event Sourcing work?
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Event Sourcing Basics
Events
State transitions are an important part of our problem space and
should be modeled within our domain.
Event Sourcing Basics
Events
State transitions are an important part of our problem space and
should be modeled within our domain.
Event Sourcing says all state is transient and you only store facts.
Event Sourcing Basics
Events
State transitions are an important part of our problem space and
should be modeled within our domain.
Event Sourcing says all state is transient and you only store facts.
Event: something that happened in the past; a fact; a state
transition.
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Read Models
student_id course_id grade
123 abc B+
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Read Models
student_id course_id grade
123 abc C
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Read Models
student_id course_id grade
123 abc A-
Event Sourcing Basics
Read Models
Event Sourcing takes the term Read Model from CQRS.
Event Sourcing Basics
Read Models
Event Sourcing takes the term Read Model from CQRS.
A Read Model is an interpretation of a sequence of events, that is
optimized for answering a given set of queries (reads).
Event Sourcing Basics
Read Models
Event Sourcing takes the term Read Model from CQRS.
A Read Model is an interpretation of a sequence of events, that is
optimized for answering a given set of queries (reads).
Read Models: are independent representations of state that we
deterministically regenerate from events using projections.
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Projections
def f(state, event)
state.where(
student_id: event.student_id,
course_id: event.course_id
).update(grade: event.grade)
end
student_id course_id grade
123 abc A-
Event Sourcing Basics
Projections
When we talk about Event Sourcing, current state is a left-fold of
previous behaviors.
Event Sourcing Basics
Projections
When we talk about Event Sourcing, current state is a left-fold of
previous behaviors.
We play back a stream of events, applying a function
f ( staten
, eventn
) -> staten+1
Event Sourcing Basics
Projections
When we talk about Event Sourcing, current state is a left-fold of
previous behaviors.
We play back a stream of events, applying a function
f ( staten
, eventn
) -> staten+1
Projection: a function through which we apply events in sequence
to deterministically derive the state of our application
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Projections
def f(state, event)
state.where(
student_id: event.student_id,
course_id: event.course_id
).update(grade: event.grade)
end
student_id course_id grade
123 abc A-
Read Models
Event Sourcing Basics
Review
Event: something that happened in the past; a fact; a state
transition.
Projection: a function through which we apply events in sequence
to deterministically derive the state of our application
Read Models: are independent representations of state that we
deterministically regenerate from events using projections.
Event Sourcing Basics
GradeCreated
student_id: 123
course_id: abc
grade: B+
GradeUpdated
student_id: 123
course_id: abc
grade: C
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Events
Projections
def f(state, event)
state.where(
student_id: event.student_id,
course_id: event.course_id
).update(grade: event.grade)
end
student_id course_id grade
123 abc A-
Read Models
Applying Event Sourcing to ETL
Applying Event Sourcing to ETL
Q: How to we get from ETL to explicitly modeled Domain Events?
Applying Event Sourcing to ETL
Q: How to we get from ETL to explicitly modeled Domain Events?
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
Applying Event Sourcing to ETL
Q: How to we get from ETL to explicitly modeled Domain Events?
A: Build an Observational Event Sourced system
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Applying Event Sourcing to ETL
Observational
When capturing observations of external systems using Event
Sourcing, the events in our domain are the observations we capture.
Applying Event Sourcing to ETL
Observational
When capturing observations of external systems using Event
Sourcing, the events in our domain are the observations we capture.
Transforming a sequence of observations into explicitly modeled
domain events is the first projection.
Applying Event Sourcing to ETL
Observational
When capturing observations of external systems using Event
Sourcing, the events in our domain are the observations we capture.
Transforming a sequence of observations into explicitly modeled
domain events is the first projection.
Observational: an Event Sourced system where the event history is
of captured observations, and all state is derived from them.
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Immutable &
Sequential
Store
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Immutable &
Sequential
Store
TeTL Process(es)
Domain
Events
Tr
Observations
student_id course_id grade
123 abc A-
Applying Event Sourcing to ETL
Domain Events
GradeUpdated
student_id: 123
course_id: abc
grade: A-
Read Models
Immutable &
Sequential
Store
Read
Model(s)
TeTL Process(es)
Domain
Events
Tr Tr Lo
Case study: Event Sourcing ETL
Case study: Event Sourcing ETL
GradeUpdated
student_id: 1
date: Oct 11
course: Biology
grade: B-
GradeUpdated
student_id: 1
date: Oct 12
course: Biology
grade: B+
projection
observation events domain events
Case study: Event Sourcing ETL
GradeUpdated
student_id: 1
date: Oct 11
course: Biology
grade: B-
GradeUpdated
student_id: 1
date: Oct 12
course: Biology
grade: B+
projection
InProgressGrades
domain events
read models
Case study: Event Sourcing ETL
queried
InProgressGrades
read models
Case study: Event Sourcing ETL
Past as First Class
First
Later interpretation
Case study: Event Sourcing ETL
Past as First Class
First
Later interpretation
Case study: Event Sourcing ETL
Past as First Class
First
Later interpretation
Case study: Event Sourcing ETL
Determinism
Case study: Event Sourcing ETL
Determinism
● Read Models regenerated nightly from source of truth
○ Given the same history, we regenerate the same Read Models
Case study: Event Sourcing ETL
Determinism
● Read Models regenerated nightly from source of truth
○ Given the same history, we regenerate the same Read Models
● On-demand Read Model Comparison tool
○ Ensure no Read Model changes across larger code refactors
Case study: Event Sourcing ETL
Determinism
Read Model Comparison - Before and After Regeneration
Read Model DB Same DB, but later.Regenerations Run
Clone Read Model Clone Read Model Again
batch_BEFORE batch_AFTER
Case study: Event Sourcing ETL
Determinism
Read Model Comparison - Before and After Regeneration
Read Model DB Same DB, but later.Regenerations Run
Case study: Event Sourcing ETL
Determinism
Read Model Comparison - Before and After Regeneration
Read Model DB Same DB, but later.Regenerations Run
Case study: Event Sourcing ETL
Trade-off: Investment in Training
Case study: Event Sourcing ETL
Trade-off: Investment in Training
● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
Case study: Event Sourcing ETL
Trade-off: Investment in Training
● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
● Gentle ramp up w/ pairing and joint designs (weeks)
Case study: Event Sourcing ETL
Trade-off: Investment in Training
● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
● Gentle ramp up w/ pairing and joint designs (weeks)
● Set expectation that architecture will feel different
Lessons Learned
At the two year mark
● Lessons learned: Thinnest extractions possible
● Lessons learned: Extracted files as Source of Truth
● Lessons learned: Many iterations on transformations
● Lessons learned: Why TL must be fast and run often
Lessons Learned
At the two year mark
Lessons learned: Thinnest extractions possible
My first version of converting [one type of] XML to CSV was
silently dropping rows, and would have lost all that data if not
for the ability to replace from original extract.
Lessons Learned
At the two year mark
Lessons learned: Extracted files as Source of Truth
Real world example of changing incorrect foreign key reference
(which had been nearly all overlapping previously).
Lessons Learned
At the two year mark
Lessons learned: Many iterations on interpretations
Very natural to handle the changes, big and small, that appear in
the format and content of the data we have extracted. Also, new
features sometimes mean new or changed interpretations.
Lessons Learned
At the two year mark
Lessons learned: Why TL must be fast and run often
Consider the “nightly restores from backups” to prove that you
can actually restore from backups. This practice exists in our
application rather than our tools. If regeneration ever gets too
slow to complete overnight, we could lose this.
Summary and Review
What we covered
How Event Sourcing can be applied to ETL
How Determinism can be a property of a system
Value of treating the Past as First Class
Learn More
Resources
● DDD, CQRS, and Event Sourcing videos by Greg Young
● CQRS documentation site by Edument AB
● Domain Driven Design book by Eric Evans
Keep in touch!
● twitter: @ms_ati
● email: msiegel@panoramaed.com

More Related Content

What's hot

Datadog: From a single product to a growing platform by Alexis Lê-Quôc, CTO
Datadog: From a single product to a growing platform by Alexis Lê-Quôc, CTODatadog: From a single product to a growing platform by Alexis Lê-Quôc, CTO
Datadog: From a single product to a growing platform by Alexis Lê-Quôc, CTOTheFamily
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksDatabricks
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Big data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle DatabaseBig data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle DatabaseMartin Toshev
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...Willy Lulciuc
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드confluent
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
Introduction to hazelcast
Introduction to hazelcastIntroduction to hazelcast
Introduction to hazelcastEmin Demirci
 
Introduction to Azure Data Factory
Introduction to Azure Data FactoryIntroduction to Azure Data Factory
Introduction to Azure Data FactorySlava Kokaev
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to KibanaVineet .
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
What is Informatica Powercenter
What is Informatica PowercenterWhat is Informatica Powercenter
What is Informatica PowercenterBigClasses Com
 

What's hot (20)

Datadog: From a single product to a growing platform by Alexis Lê-Quôc, CTO
Datadog: From a single product to a growing platform by Alexis Lê-Quôc, CTODatadog: From a single product to a growing platform by Alexis Lê-Quôc, CTO
Datadog: From a single product to a growing platform by Alexis Lê-Quôc, CTO
 
Scaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with DatabricksScaling and Modernizing Data Platform with Databricks
Scaling and Modernizing Data Platform with Databricks
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Big data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle DatabaseBig data processing with Apache Spark and Oracle Database
Big data processing with Apache Spark and Oracle Database
 
ETL Technologies.pptx
ETL Technologies.pptxETL Technologies.pptx
ETL Technologies.pptx
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-bas...
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
 
What is ETL?
What is ETL?What is ETL?
What is ETL?
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Introduction to hazelcast
Introduction to hazelcastIntroduction to hazelcast
Introduction to hazelcast
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Introduction to Azure Data Factory
Introduction to Azure Data FactoryIntroduction to Azure Data Factory
Introduction to Azure Data Factory
 
Introduction to Kibana
Introduction to KibanaIntroduction to Kibana
Introduction to Kibana
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
What is Informatica Powercenter
What is Informatica PowercenterWhat is Informatica Powercenter
What is Informatica Powercenter
 

Similar to ETL and Event Sourcing

Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL DevelopersProven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL DevelopersInterview Mocha
 
Etl testing contents
Etl testing contentsEtl testing contents
Etl testing contentsManoj Jagtap
 
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)Johannes Hoppe
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docxWhat is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docxtodd471
 
NEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator PresentationNEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator Presentationaskankit
 
Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongMassimo Cenci
 
etl testing training in hyderabad.......
etl testing training in hyderabad.......etl testing training in hyderabad.......
etl testing training in hyderabad.......sowmyavibhin
 
Etl testing training institute in hyderabad
Etl testing training institute  in hyderabadEtl testing training institute  in hyderabad
Etl testing training institute in hyderabadswathi3zen
 
“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration process“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration processRashidRiaz18
 
Cs511 data-extraction
Cs511 data-extractionCs511 data-extraction
Cs511 data-extractionBorseshweta
 
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdfabhaybansal43
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021ssuser8ccb5a
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxcamyla81
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLInside Analysis
 
ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform WSO2
 

Similar to ETL and Event Sourcing (20)

Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL DevelopersProven ETL Developer Interview Questions to Assess and Hire ETL Developers
Proven ETL Developer Interview Questions to Assess and Hire ETL Developers
 
Etl testing contents
Etl testing contentsEtl testing contents
Etl testing contents
 
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
DMDW 7. Student Presentation - Pentaho Data Integration (Kettle)
 
Introduction to ETL and Data Integration
Introduction to ETL and Data IntegrationIntroduction to ETL and Data Integration
Introduction to ETL and Data Integration
 
Entity Framework 4
Entity Framework 4Entity Framework 4
Entity Framework 4
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docxWhat is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
What is an ETL plan that Ralph Kimball identifies from the 34 Subsyste.docx
 
Etl techniques
Etl techniquesEtl techniques
Etl techniques
 
NEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator PresentationNEOOUG 2010 Oracle Data Integrator Presentation
NEOOUG 2010 Oracle Data Integrator Presentation
 
Data Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrongData Warehouse - What you know about etl process is wrong
Data Warehouse - What you know about etl process is wrong
 
etl testing training in hyderabad.......
etl testing training in hyderabad.......etl testing training in hyderabad.......
etl testing training in hyderabad.......
 
Etl testing training institute in hyderabad
Etl testing training institute  in hyderabadEtl testing training institute  in hyderabad
Etl testing training institute in hyderabad
 
“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration process“Extract, Load, Transform,” is another type of data integration process
“Extract, Load, Transform,” is another type of data integration process
 
LPR - Week 1
LPR - Week 1LPR - Week 1
LPR - Week 1
 
Cs511 data-extraction
Cs511 data-extractionCs511 data-extraction
Cs511 data-extraction
 
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
4_etl_testing_tutorial_till_chapter3-merged-compressed.pdf
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
 
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETLGoodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
Goodbye, Bottlenecks: How Scale-Out and In-Memory Solve ETL
 
ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform ETL with WSO2 Enterprise Middleware Platform
ETL with WSO2 Enterprise Middleware Platform
 

Recently uploaded

Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxVelmuruganTECE
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxNiranjanYadav41
 
Crushers to screens in aggregate production
Crushers to screens in aggregate productionCrushers to screens in aggregate production
Crushers to screens in aggregate productionChinnuNinan
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptbibisarnayak0
 

Recently uploaded (20)

Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
BSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptxBSNL Internship Training presentation.pptx
BSNL Internship Training presentation.pptx
 
Crushers to screens in aggregate production
Crushers to screens in aggregate productionCrushers to screens in aggregate production
Crushers to screens in aggregate production
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Autonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.pptAutonomous emergency braking system (aeb) ppt.ppt
Autonomous emergency braking system (aeb) ppt.ppt
 

ETL and Event Sourcing

  • 1. ETL and Event Sourcing Integration Architecture: Best Practice and Case Study Marc Siegel - Panorama Education - Wed Feb 6 2019
  • 2. ETL pipelines from external systems
  • 3. ETL and Event Sourcing Prerequisite knowledge Familiarity with traditional ETL architectures: Software systems that Extract data from external systems, Transform them, and Load the resulting data sets into internal systems, most often relational databases Dissatisfaction with traditional ETL architectures / curiosity to learn about and consider an alternative architecture
  • 4. ETL and Event Sourcing What you’ll learn How Event Sourcing can be applied to ETL How Determinism can be a property of a system Value of treating the Past as First Class
  • 8. ETL Traditional ETL Process Extract In a nutshell External System
  • 9. ETL Traditional ETL Process Extract Transform In a nutshell External System
  • 10. ETL Traditional ETL Process Extract Transform Load In a nutshell External System
  • 11. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 12. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System Q: What is the System of Record? What is the Source of Truth?
  • 13. ETL In a nutshell External System System of Record The authoritative data source for a given data element or piece of information (1)
  • 14. ETL Internal Database In a nutshell Source of Truth A trusted data source that gives a complete picture of the data object as a whole (2)
  • 15. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 16.
  • 18. ETL Challenges Operational Domain Modelling Selective Attention Must rerun long ETL job to test edge case Missing Interests: ● Decoupling
  • 19. ETL Challenges Operational Domain Modelling Selective Attention Must rerun long ETL job to test edge case Running ETL job can overwrite history Missing Interests: ● Decoupling ● Determinism
  • 20. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 21. ETL Challenges Operational Domain Modelling Selective Attention Must create one true schema to load into Missing Interests: ● Decoupling (of each interpretation)
  • 22. ETL Challenges Operational Domain Modelling Selective Attention Must create one true schema to load into Tend toward lowest common denominator OR superset of all external model features Missing Interests: ● Decoupling (of each interpretation) ● Modeling State Explicitly
  • 23. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 24. ETL Challenges Operational Domain Modelling Selective Attention From Psychology: the act of focusing on a particular object while ignoring irrelevant information → Can’t re-interpret past extracts Missing Interests: ● Past as First Class
  • 25. ETL Problems Awareness Tests YouTube: ● Basketball ● Monkey Business How many passes did the team in white make?
  • 26. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 27. ETL Advantage Not just problems. Positive trade-offs of ETL? ● Low Costs: Training, framing, explaining ○ Training: Low cost to train new engineers in ETL concepts ○ Framing: No requirement for explicit domain modeling ○ Explaining: Intuitive to explain to non-engineers
  • 28. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL Challenges
  • 29.
  • 31. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 32. ETL and ELT Traditional ETL Process Extract Transform Load Internal Database External System
  • 33. ETL and ELT EL Process Extract Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 34. ETL and ELT EL Process Extract Data Lake or Blob or File Store Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 35. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 36. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process(es) Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 37. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 38. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 39. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process(es) Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 40. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 41. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 42. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL and ELT
  • 43. What is Event Sourcing?
  • 44. ETL Traditional ETL Process Extract Transform Load Internal Database In a nutshell External System
  • 45. ETL and ELT EL Process Extract Data Lake or Blob or File Store T Process(es) Do anything here! Many vendors offering various solutions. Traditional ETL Process Extract Transform Load Internal Database Load External System
  • 46. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System
  • 47. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store
  • 48. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store TeTL Process
  • 49. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store TeTL Process Domain Events Tr
  • 50. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store TeTL Process Domain Events Tr Tr Lo
  • 51. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model TeTL Process Domain Events Tr Tr Lo
  • 52. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 53. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo 1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load Infinitely
  • 54. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 55. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 56. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 57. ETL and Event Sourcing EL Process Ex Traditional ETL Process Extract Transform Load Internal Database Lo External System Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo 1) Decouple extractions 2) Source of Truth: the extracts 3) Deterministic transform: to events + to model regular expression mnemonic: from /(ETL)/ to /E{1}T*L*/ ← Extract once, Transform & Load Infinitely
  • 58. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 59. Event Sourcing Challenge Not just advantages. Negative trade-offs of ES? ● High Costs: Training, framing, explaining ○ Training: Higher cost to train new engineers in ES concepts ○ Framing: Requirement for (lots of) explicit domain modeling ○ Explaining: Not necessarily intuitive to explain to non-engineers
  • 60. Interests and Positions ETL ELT Event Sourcing Decoupling Determinism Modeling State Explicitly Past as First Class Low Cost ETL, ELT, and Event Sourcing
  • 61.
  • 62. How does Event Sourcing work?
  • 63. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events
  • 64. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain.
  • 65. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain. Event Sourcing says all state is transient and you only store facts.
  • 66. Event Sourcing Basics Events State transitions are an important part of our problem space and should be modeled within our domain. Event Sourcing says all state is transient and you only store facts. Event: something that happened in the past; a fact; a state transition.
  • 67. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events
  • 68. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Read Models student_id course_id grade 123 abc B+
  • 69. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Read Models student_id course_id grade 123 abc C
  • 70. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Read Models student_id course_id grade 123 abc A-
  • 71. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS.
  • 72. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS. A Read Model is an interpretation of a sequence of events, that is optimized for answering a given set of queries (reads).
  • 73. Event Sourcing Basics Read Models Event Sourcing takes the term Read Model from CQRS. A Read Model is an interpretation of a sequence of events, that is optimized for answering a given set of queries (reads). Read Models: are independent representations of state that we deterministically regenerate from events using projections.
  • 74. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Projections def f(state, event) state.where( student_id: event.student_id, course_id: event.course_id ).update(grade: event.grade) end student_id course_id grade 123 abc A-
  • 75. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors.
  • 76. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors. We play back a stream of events, applying a function f ( staten , eventn ) -> staten+1
  • 77. Event Sourcing Basics Projections When we talk about Event Sourcing, current state is a left-fold of previous behaviors. We play back a stream of events, applying a function f ( staten , eventn ) -> staten+1 Projection: a function through which we apply events in sequence to deterministically derive the state of our application
  • 78. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Projections def f(state, event) state.where( student_id: event.student_id, course_id: event.course_id ).update(grade: event.grade) end student_id course_id grade 123 abc A- Read Models
  • 79. Event Sourcing Basics Review Event: something that happened in the past; a fact; a state transition. Projection: a function through which we apply events in sequence to deterministically derive the state of our application Read Models: are independent representations of state that we deterministically regenerate from events using projections.
  • 80. Event Sourcing Basics GradeCreated student_id: 123 course_id: abc grade: B+ GradeUpdated student_id: 123 course_id: abc grade: C GradeUpdated student_id: 123 course_id: abc grade: A- Events Projections def f(state, event) state.where( student_id: event.student_id, course_id: event.course_id ).update(grade: event.grade) end student_id course_id grade 123 abc A- Read Models
  • 82. Applying Event Sourcing to ETL Q: How to we get from ETL to explicitly modeled Domain Events?
  • 83. Applying Event Sourcing to ETL Q: How to we get from ETL to explicitly modeled Domain Events? Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 84. Applying Event Sourcing to ETL Q: How to we get from ETL to explicitly modeled Domain Events? A: Build an Observational Event Sourced system Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 85. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models
  • 86. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture.
  • 87. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture. Transforming a sequence of observations into explicitly modeled domain events is the first projection.
  • 88. Applying Event Sourcing to ETL Observational When capturing observations of external systems using Event Sourcing, the events in our domain are the observations we capture. Transforming a sequence of observations into explicitly modeled domain events is the first projection. Observational: an Event Sourced system where the event history is of captured observations, and all state is derived from them.
  • 89. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models
  • 90. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models Immutable & Sequential Store
  • 91. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models Immutable & Sequential Store TeTL Process(es) Domain Events Tr
  • 92. Observations student_id course_id grade 123 abc A- Applying Event Sourcing to ETL Domain Events GradeUpdated student_id: 123 course_id: abc grade: A- Read Models Immutable & Sequential Store Read Model(s) TeTL Process(es) Domain Events Tr Tr Lo
  • 93. Case study: Event Sourcing ETL
  • 94. Case study: Event Sourcing ETL GradeUpdated student_id: 1 date: Oct 11 course: Biology grade: B- GradeUpdated student_id: 1 date: Oct 12 course: Biology grade: B+ projection observation events domain events
  • 95. Case study: Event Sourcing ETL GradeUpdated student_id: 1 date: Oct 11 course: Biology grade: B- GradeUpdated student_id: 1 date: Oct 12 course: Biology grade: B+ projection InProgressGrades domain events read models
  • 96. Case study: Event Sourcing ETL queried InProgressGrades read models
  • 97. Case study: Event Sourcing ETL Past as First Class First Later interpretation
  • 98. Case study: Event Sourcing ETL Past as First Class First Later interpretation
  • 99. Case study: Event Sourcing ETL Past as First Class First Later interpretation
  • 100. Case study: Event Sourcing ETL Determinism
  • 101. Case study: Event Sourcing ETL Determinism ● Read Models regenerated nightly from source of truth ○ Given the same history, we regenerate the same Read Models
  • 102. Case study: Event Sourcing ETL Determinism ● Read Models regenerated nightly from source of truth ○ Given the same history, we regenerate the same Read Models ● On-demand Read Model Comparison tool ○ Ensure no Read Model changes across larger code refactors
  • 103. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration Read Model DB Same DB, but later.Regenerations Run Clone Read Model Clone Read Model Again batch_BEFORE batch_AFTER
  • 104. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration Read Model DB Same DB, but later.Regenerations Run
  • 105. Case study: Event Sourcing ETL Determinism Read Model Comparison - Before and After Regeneration Read Model DB Same DB, but later.Regenerations Run
  • 106. Case study: Event Sourcing ETL Trade-off: Investment in Training
  • 107. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x 1 hr training videos + 1 hr discussions = 10 hrs
  • 108. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x 1 hr training videos + 1 hr discussions = 10 hrs ● Gentle ramp up w/ pairing and joint designs (weeks)
  • 109. Case study: Event Sourcing ETL Trade-off: Investment in Training ● 5 x 1 hr training videos + 1 hr discussions = 10 hrs ● Gentle ramp up w/ pairing and joint designs (weeks) ● Set expectation that architecture will feel different
  • 110. Lessons Learned At the two year mark ● Lessons learned: Thinnest extractions possible ● Lessons learned: Extracted files as Source of Truth ● Lessons learned: Many iterations on transformations ● Lessons learned: Why TL must be fast and run often
  • 111. Lessons Learned At the two year mark Lessons learned: Thinnest extractions possible My first version of converting [one type of] XML to CSV was silently dropping rows, and would have lost all that data if not for the ability to replace from original extract.
  • 112. Lessons Learned At the two year mark Lessons learned: Extracted files as Source of Truth Real world example of changing incorrect foreign key reference (which had been nearly all overlapping previously).
  • 113. Lessons Learned At the two year mark Lessons learned: Many iterations on interpretations Very natural to handle the changes, big and small, that appear in the format and content of the data we have extracted. Also, new features sometimes mean new or changed interpretations.
  • 114. Lessons Learned At the two year mark Lessons learned: Why TL must be fast and run often Consider the “nightly restores from backups” to prove that you can actually restore from backups. This practice exists in our application rather than our tools. If regeneration ever gets too slow to complete overnight, we could lose this.
  • 115. Summary and Review What we covered How Event Sourcing can be applied to ETL How Determinism can be a property of a system Value of treating the Past as First Class
  • 116. Learn More Resources ● DDD, CQRS, and Event Sourcing videos by Greg Young ● CQRS documentation site by Edument AB ● Domain Driven Design book by Eric Evans Keep in touch! ● twitter: @ms_ati ● email: msiegel@panoramaed.com