Data Platform
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers
DataEngConf NYC ‘18
Hey!
I’m Willy Lulciuc
Data Engineer
Marquez Team, Data Platform
@wslulciuc
01 Space
02 Community
03 Services
268,000 members globally
287 physical locations
72 cities
23 countries
AGENDA
01 Why metadata?
02 Room bookings pipeline (naïve)
03 Intro to Marquez
04 Room bookings pipeline (take 2)
05 Future work
01 Why metadata?
Why manage and utilize metadata?
Data lineage
● Add context to data
Democratize
● Self-service data culture
Data quality
● Build trust in data
… creating a healthy data ecosystem
A healthy data ecosystem
Freedom
● Experiment
● Flexible
● Self-sufficient
Accountability
● Cost
● Trust
Self-service
● Discover
● Explore
● Global context
Let’s get booking!
01 Location + floor
02 Open time slot
03 Duration
04 Confirm
Which location has the most bookings?
Set[RoomBooking] → LocationID
02 Room bookings pipeline (naïve)
Example: Room bookings pipeline (naïve)
Requirements
● Read room bookings
● Sum room bookings by location
● Write top location
● Run once an hour
Start → Read → Sum → Write
S3 (.csv) → Postgres
Input (S3, .csv):
LOCATION,TS,ROOM
b648485,1541501885,9
b940314,1541624285,2
b648485,1541710685,4
Output (Postgres):
ID LOCATION TS BOOKINGS
1 b648485 1541721600 2
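The naïve job's core logic (read the bookings, sum them by location, keep the top location) fits in a few lines. A minimal Python sketch, assuming the input arrives as CSV text in the LOCATION,TS,ROOM layout of the sample rows, and leaving out the S3 read, the Postgres write, and the hourly scheduling; `top_location` is a hypothetical helper, not part of any library:

```python
import csv
import io
from collections import Counter


def top_location(csv_text: str) -> tuple[str, int]:
    """Sum bookings per location and return (location, bookings) for the busiest one."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each row is one room booking, so a location's total is its row count.
        counts[row["LOCATION"]] += 1
    if not counts:
        raise ValueError("no bookings in input")
    # most_common(1) yields [(location, count)] for the highest count.
    location, bookings = counts.most_common(1)[0]
    return location, bookings


sample = """LOCATION,TS,ROOM
b648485,1541501885,9
b940314,1541624285,2
b648485,1541710685,4
"""
print(top_location(sample))  # ('b648485', 2)
```

For the sample rows this yields b648485 with 2 bookings, matching the output row in Postgres.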
[Diagram: Scheduler → Room Bookings workflow (job); upstream: S3 (Archival), downstream: Postgres (Top Locations)]
We’re live!
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
Curses, our job’s failing …
Oh, might be our input data!
S3 (.csv)
LOCATION,TS,ROOM
b648485,1541501885,9A
b940314,1541624285,2G
b648485,1541710685,4F
The ROOM field is now of type string (previously int).
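One way to catch this kind of drift before it breaks the job is to validate each row against the expected column types. A minimal sketch, assuming a hypothetical `EXPECTED_SCHEMA` mapping that still declares ROOM as an integer (the deck does not show how the schema is declared):

```python
import csv
import io

# Hypothetical expected schema: column name -> parser that raises on bad input.
EXPECTED_SCHEMA = {"LOCATION": str, "TS": int, "ROOM": int}


def find_schema_violations(csv_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, column, value) for every value that fails its declared type."""
    violations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column, parse in EXPECTED_SCHEMA.items():
            try:
                parse(row[column])
            except (ValueError, TypeError):  # TypeError covers missing fields (None)
                violations.append((line_no, column, row[column]))
    return violations


drifted = """LOCATION,TS,ROOM
b648485,1541501885,9A
b940314,1541624285,2G
b648485,1541710685,4F
"""
for violation in find_schema_violations(drifted):
    print(violation)
```

Running a check like this at read time turns a silent type change into an explicit error that points at the offending column.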
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
● Coordinate changes
Ugh, gaps in output data
[Figure: hourly time partitions (00h to 09h) up to the latest partition]
Backfills!
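Deciding what to backfill starts with comparing the hourly partitions that exist against those that should. A minimal sketch, assuming each output partition is keyed by its hour-aligned Unix timestamp (a hypothetical convention; the slide only shows hours 00h to 09h):

```python
HOUR = 3600  # seconds per hourly partition


def missing_partitions(present: set[int], start: int, latest: int) -> list[int]:
    """Return hour-aligned timestamps in [start, latest] that have no output partition."""
    if start % HOUR or latest % HOUR:
        raise ValueError("start and latest must be hour-aligned")
    return [ts for ts in range(start, latest + HOUR, HOUR) if ts not in present]


# Ten hourly partitions (00h..09h) from an hour-aligned start, with 03h and 04h missing.
day = 1541721600
present = {day + h * HOUR for h in range(10) if h not in (3, 4)}
gaps = missing_partitions(present, day, day + 9 * HOUR)
print([(ts - day) // HOUR for ts in gaps])  # [3, 4]
```

The returned list is exactly the set of job runs a backfill would need to schedule.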
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
● Coordinate changes
● Figure out backfills
What we have so far …
[Diagram: Scheduler → Room Bookings workflow (job): S3 → Postgres]
… writing a job shouldn’t be this hard!
03 Intro to Marquez
Metadata Service
● Centralized metadata management
○ Jobs
○ Datasets
● Modular
○ Data discovery
○ Data health
○ Data triggers
Marquez: Design
[Diagram: Clients (JVM) and Clients (Python) → REST API → Marquez (modules: Search, Health, Triggers)]
Marquez: Data discovery
Module: Search
● Unified search
● Documentation
○ Owner
○ Schema
○ Datasource
[Mock-up: searching "room bo" (tabs: All, Datasets, Tags; datasource icon: S3) returns:
● Room Bookings (SF): All San Francisco room bookings (created: Jul. 8, 2018)
● Room Booking Metrics (GLBL): Global room booking metrics (created: Feb. 15, 2010)]
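The search module's behavior can be approximated by matching a query against each dataset's documentation. A minimal sketch, assuming a plain case-insensitive substring match over name, description, and tags; Marquez's actual search implementation is not described here, and the `Top Locations` entry and its tag are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: list[str] = field(default_factory=list)


def search(catalog: list[CatalogEntry], query: str) -> list[str]:
    """Case-insensitive substring match over each entry's name, description, and tags."""
    needle = query.strip().lower()
    return [
        entry.name
        for entry in catalog
        if needle in " ".join([entry.name, entry.description, *entry.tags]).lower()
    ]


catalog = [
    CatalogEntry("Room Bookings (SF)", "All San Francisco room bookings"),
    CatalogEntry("Room Booking Metrics (GLBL)", "Global room booking metrics"),
    CatalogEntry("Top Locations", "Hourly top location by bookings", tags=["postgres"]),
]
print(search(catalog, "room bo"))  # ['Room Bookings (SF)', 'Room Booking Metrics (GLBL)']
```

A production search would add ranking and tokenization, but the point is the same: one index over owner, schema, datasource, and description metadata.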
Marquez: Data health
Module: Health
● Owner
○ Team / project
● Schema
● Location
● Description
● Size
○ Growth over time
○ Number of records
● Lineage
[Figure: data graph of dataset and job nodes connected by lineage edges]
Lineage queries!
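Lineage queries over the data graph reduce to graph traversal. A minimal sketch, assuming datasets and jobs share one namespace of unique string names and that edges run dataset → job (the job reads it) and job → dataset (the job writes it); the `weekly_report` job and its output table are hypothetical:

```python
from collections import defaultdict, deque


class LineageGraph:
    """Datasets and jobs as nodes; dataset -> job means 'read by', job -> dataset means 'writes'."""

    def __init__(self) -> None:
        self.down: defaultdict = defaultdict(set)
        self.up: defaultdict = defaultdict(set)

    def add_job(self, job: str, inputs: list[str], outputs: list[str]) -> None:
        for dataset in inputs:
            self._link(dataset, job)
        for dataset in outputs:
            self._link(job, dataset)

    def _link(self, src: str, dst: str) -> None:
        self.down[src].add(dst)
        self.up[dst].add(src)

    @staticmethod
    def _reachable(start: str, edges: dict) -> set[str]:
        """Breadth-first search over `edges`; returns every node reachable from `start`."""
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen

    def upstream(self, node: str) -> set[str]:
        """Everything `node` was derived from (its lineage)."""
        return self._reachable(node, self.up)

    def downstream(self, node: str) -> set[str]:
        """Everything affected if `node` fails or changes."""
        return self._reachable(node, self.down)


graph = LineageGraph()
graph.add_job("room_bookings_workflow", inputs=["room_bookings"], outputs=["top_locations"])
graph.add_job("weekly_report", inputs=["top_locations"], outputs=["weekly_report_table"])
print(sorted(graph.upstream("top_locations")))  # ['room_bookings', 'room_bookings_workflow']
```

`upstream` answers "where did this come from?", while `downstream` gives the affected paths when a node fails.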
Marquez: Data triggers
Module: Triggers
● Timely processing of data
○ No polling!
● Reduce manual handling of backfills
● Reduce production of bad data
○ Incomplete data
○ Low-quality data
[Figure: a job failure within a graph of datasets and jobs]
Upstream failure detection!
Affected paths!
Cascading triggers!
Core concepts
Marquez: Core concepts
Job + Datasets
[Diagram: Input Dataset → Job → Output Dataset]
Dataset versions!
A dataset version contains a complete snapshot of the data as of some point in time.
[Diagram: input and output datasets, each with numbered versions (v1, v2, v3), connected by a Job]
Deltas (“diffs”)!
[Diagram: the same dataset versions, with the change between two versions expressed as a delta]
Δv2→v3:
INSERT INTO room_bookings (location, bookings)
VALUES ('b648485', 2);
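Because every version is a complete snapshot, the delta between two versions can be derived by comparing them. A minimal sketch, assuming snapshots are sets of hypothetical (location, bookings) rows; the contents of v2 below are made up for illustration:

```python
Row = tuple[str, int]  # (location, bookings)


def delta(old: set[Row], new: set[Row]) -> dict[str, set[Row]]:
    """Rows inserted and deleted when moving from snapshot `old` to snapshot `new`."""
    return {"inserted": new - old, "deleted": old - new}


v2 = {("b940314", 1)}
v3 = {("b940314", 1), ("b648485", 2)}
print(delta(v2, v3))  # {'inserted': {('b648485', 2)}, 'deleted': set()}
```

The single inserted row corresponds to the `INSERT INTO room_bookings` statement for Δv2→v3. With set semantics, an updated row shows up as one deletion plus one insertion.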
Job versions!
A job version is created when business logic has changed.
[Diagram: the job evolves from Job v1 to Job v2 while reading and writing versioned datasets]
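The deck does not say how a change in business logic is detected. One common approach, assumed here, is to hash the job's source together with its inputs and outputs, and to record a new version only when that hash changes. `job_version` and `JobVersions` are hypothetical names, and the SQL strings are illustrative:

```python
import hashlib
import json


def job_version(source: str, inputs: list[str], outputs: list[str]) -> str:
    """Deterministic version ID: changes only when the logic or the I/O contract changes."""
    payload = json.dumps(
        {"source": source, "inputs": sorted(inputs), "outputs": sorted(outputs)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class JobVersions:
    """Keeps each job's version history and appends only when the version ID changes."""

    def __init__(self) -> None:
        self.history: dict[str, list[str]] = {}

    def record(self, job: str, version: str) -> bool:
        versions = self.history.setdefault(job, [])
        if versions and versions[-1] == version:
            return False  # same business logic: keep the current version
        versions.append(version)
        return True


sql_v1 = "SELECT location, COUNT(*) FROM room_bookings GROUP BY location"
sql_v2 = sql_v1 + " ORDER BY 2 DESC LIMIT 1"
io_args = (["room_bookings"], ["top_locations"])

registry = JobVersions()
first = registry.record("room_bookings_workflow", job_version(sql_v1, *io_args))
again = registry.record("room_bookings_workflow", job_version(sql_v1, *io_args))
changed = registry.record("room_bookings_workflow", job_version(sql_v2, *io_args))
print(first, again, changed)  # True False True
```

Re-deploying unchanged code reuses the existing version, so job runs can be grouped by the logic that actually produced them.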
Job runs!
[Diagram: a New Run of the job reads the input dataset; on Finish it updates the output dataset with a new version, v4]
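The run lifecycle can be modeled as a small state machine in which only a successful finish updates the output dataset. A minimal sketch: the STARTED and COMPLETED states come from the metadata-collection workflow, while NEW, FAILED, and the integer version list are assumptions made for illustration:

```python
from dataclasses import dataclass, field
from enum import Enum


class RunState(Enum):
    NEW = "NEW"
    STARTED = "STARTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Legal transitions; COMPLETED and FAILED are terminal.
TRANSITIONS = {
    RunState.NEW: {RunState.STARTED},
    RunState.STARTED: {RunState.COMPLETED, RunState.FAILED},
    RunState.COMPLETED: set(),
    RunState.FAILED: set(),
}


@dataclass
class OutputDataset:
    name: str
    versions: list[int] = field(default_factory=lambda: [1])


@dataclass
class JobRun:
    job_version: str
    output: OutputDataset
    state: RunState = RunState.NEW

    def transition(self, new_state: RunState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        if new_state is RunState.COMPLETED:
            # Finish -> Update: only a successful run publishes a new output version.
            self.output.versions.append(self.output.versions[-1] + 1)


top_locations = OutputDataset("top_locations", versions=[1, 2, 3])
run = JobRun(job_version="v2", output=top_locations)
run.transition(RunState.STARTED)
run.transition(RunState.COMPLETED)
print(top_locations.versions)  # [1, 2, 3, 4]
```

A failed run leaves the dataset at v3, which is what makes failures and delays detectable downstream.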
Data triggers!
[Diagram: once the run finishes and the dataset is updated to v4, a Trigger starts the downstream jobs (Job v7, Job v10)]
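Event-based triggers replace polling. A new dataset version pushes work to every job that reads it, and those jobs' outputs cascade further. A minimal sketch, assuming every triggered run succeeds and that a job runs at most once per upstream event; all job names are hypothetical:

```python
from collections import defaultdict, deque


class TriggerService:
    """Push-based triggers: a new dataset version schedules every job that reads it."""

    def __init__(self) -> None:
        self.consumers: defaultdict = defaultdict(list)  # dataset -> jobs that read it
        self.outputs: dict[str, list[str]] = {}  # job -> datasets it writes

    def register_job(self, job: str, inputs: list[str], outputs: list[str]) -> None:
        for dataset in inputs:
            self.consumers[dataset].append(job)
        self.outputs[job] = list(outputs)

    def on_dataset_updated(self, dataset: str) -> list[str]:
        """Return the jobs triggered, in order, including cascades through their outputs."""
        triggered: list[str] = []
        queue = deque([dataset])
        while queue:
            for job in self.consumers.get(queue.popleft(), []):
                if job in triggered:
                    continue  # run each job at most once per upstream event
                triggered.append(job)  # stand-in for submitting the job to a scheduler
                # Assume the run succeeds: its outputs get new versions and cascade.
                queue.extend(self.outputs[job])
        return triggered


triggers = TriggerService()
triggers.register_job("room_bookings_workflow", ["room_bookings"], ["top_locations"])
triggers.register_job("occupancy_model", ["room_bookings"], ["occupancy_forecast"])
triggers.register_job("weekly_report", ["top_locations"], ["weekly_report_table"])
print(triggers.on_dataset_updated("room_bookings"))
```

A real trigger service would also wait until all of a job's inputs have new versions and would stop a cascade when a run fails. This sketch skips both.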
Job failures!
[Diagram: a New Run of the job ends in Failure before dataset version v4 is produced]
Delayed datasets!
[Diagram: the job Failure causes a Delay in the expected dataset version v4]
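A delayed dataset can be flagged by comparing the age of its newest version with its expected update interval, such as the "Updated: Hourly" field shown on the dataset page. A minimal sketch, assuming a hypothetical 15-minute grace period and made-up timestamps:

```python
from datetime import datetime, timedelta, timezone


def delayed_datasets(
    last_updated: dict[str, datetime],
    expected_every: dict[str, timedelta],
    now: datetime,
    grace: timedelta = timedelta(minutes=15),
) -> list[str]:
    """Names of datasets whose newest version is older than their expected interval plus grace."""
    return sorted(
        name
        for name, updated in last_updated.items()
        if now - updated > expected_every[name] + grace
    )


now = datetime(2018, 11, 9, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "room_bookings": now - timedelta(minutes=50),  # within the hourly window
    "top_locations": now - timedelta(hours=3),  # no new version for three hours
}
hourly = {"room_bookings": timedelta(hours=1), "top_locations": timedelta(hours=1)}
print(delayed_datasets(last_updated, hourly, now))  # ['top_locations']
```

Combined with lineage, a delay flagged here can be traced back to the failed upstream run instead of being discovered by a downstream consumer.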
Design benefits
● Early upstream failure detection
● Debugging
○ What job version(s) produced / consumed dataset version X?
● Recoverability
○ Full / incremental processing
● Coordination
Data model
Marquez: Data model
[Diagram: Job, JobVersion, JobRun, Dataset, DatasetVersion, and Datasource, linked by one-to-many (1 to *) relationships; types: DbTable, Filesystem, Stream]
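The data model can be sketched as plain Python dataclasses. This also makes the "which job version(s) produced / consumed dataset version X?" debugging question concrete. The sketch assumes that Job has many JobVersions, JobVersion has many JobRuns, Dataset has many DatasetVersions, and each JobRun reads and writes DatasetVersions. It also assumes that DbTable, Filesystem, and Stream are datasource types. These are readings of the diagram, not Marquez's actual schema:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class DatasourceType(Enum):
    DB_TABLE = "DbTable"
    FILESYSTEM = "Filesystem"
    STREAM = "Stream"


# eq=False: entities compare by identity; repr=False on back-references avoids cycles.
@dataclass(eq=False)
class Dataset:
    name: str
    source_type: DatasourceType
    versions: list[DatasetVersion] = field(default_factory=list)

    def new_version(self) -> DatasetVersion:
        version = DatasetVersion(self, len(self.versions) + 1)
        self.versions.append(version)
        return version


@dataclass(eq=False)
class DatasetVersion:
    dataset: Dataset = field(repr=False)
    number: int = 1


@dataclass(eq=False)
class Job:
    name: str
    versions: list[JobVersion] = field(default_factory=list)

    def new_version(self) -> JobVersion:
        version = JobVersion(self, len(self.versions) + 1)
        self.versions.append(version)
        return version


@dataclass(eq=False)
class JobVersion:
    job: Job = field(repr=False)
    number: int = 1
    runs: list[JobRun] = field(default_factory=list)

    def new_run(self, inputs: list[DatasetVersion], outputs: list[DatasetVersion]) -> JobRun:
        run = JobRun(self, inputs, outputs)
        self.runs.append(run)
        return run


@dataclass(eq=False)
class JobRun:
    job_version: JobVersion = field(repr=False)
    inputs: list[DatasetVersion] = field(default_factory=list)
    outputs: list[DatasetVersion] = field(default_factory=list)


def producers_of(target: DatasetVersion, jobs: list[Job]) -> list[JobVersion]:
    """Job versions with at least one run that wrote `target`."""
    return [v for job in jobs for v in job.versions if any(target in r.outputs for r in v.runs)]


def consumers_of(target: DatasetVersion, jobs: list[Job]) -> list[JobVersion]:
    """Job versions with at least one run that read `target`."""
    return [v for job in jobs for v in job.versions if any(target in r.inputs for r in v.runs)]


room_bookings = Dataset("room_bookings", DatasourceType.FILESYSTEM)
top_locations = Dataset("top_locations", DatasourceType.DB_TABLE)
job = Job("room_bookings_workflow")
v1, v2 = job.new_version(), job.new_version()
rb = room_bookings.new_version()
v1.new_run([rb], [top_locations.new_version()])
tl_2 = top_locations.new_version()
v2.new_run([rb], [tl_2])
print([v.number for v in producers_of(tl_2, [job])])  # [2]
```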
Metadata collection
Marquez: Metadata collection
How is metadata collected?
● Marquez API
● Language-specific SDKs
○ Java
○ Python
[Diagram: a Job records metadata in Marquez]
Workflow: Register Job → Register Job Run → Start → Complete → Register Job Run Outputs
● Register Job
○ Job version
○ Inputs / outputs (logical names)
○ Owner
○ Description
● Register Job Run
● Start
○ Update job run state to STARTED
● Complete
○ Update job run state to COMPLETED
● Register Job Run Outputs
○ Outputs (physical locations)
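The collection workflow amounts to a fixed sequence of calls wrapped around the job itself. A minimal sketch against a hypothetical in-memory store. This is not the Marquez client API. The method names, the FAILED state, the owner, and the Postgres location are all assumptions:

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobSpec:
    name: str
    version: str
    inputs: list[str]  # logical dataset names
    outputs: list[str]  # logical dataset names
    owner: str
    description: str


@dataclass
class InMemoryMetadataStore:
    """Hypothetical stand-in for a metadata service; it only records what a client would send."""
    jobs: dict[str, JobSpec] = field(default_factory=dict)
    runs: dict[str, dict] = field(default_factory=dict)

    def register_job(self, spec: JobSpec) -> None:
        self.jobs[spec.name] = spec

    def register_run(self, job_name: str) -> str:
        run_id = str(uuid.uuid4())
        self.runs[run_id] = {"job": job_name, "state": "NEW", "outputs": []}
        return run_id

    def update_run_state(self, run_id: str, state: str) -> None:
        self.runs[run_id]["state"] = state

    def register_run_outputs(self, run_id: str, locations: list[str]) -> None:
        self.runs[run_id]["outputs"] = list(locations)


def run_with_metadata(
    store: InMemoryMetadataStore, spec: JobSpec, job: Callable[[], list[str]]
) -> str:
    """Register job -> register run -> STARTED -> COMPLETED -> register outputs."""
    store.register_job(spec)
    run_id = store.register_run(spec.name)
    store.update_run_state(run_id, "STARTED")
    try:
        locations = job()  # the job returns the physical locations it wrote
    except Exception:
        store.update_run_state(run_id, "FAILED")
        raise
    store.update_run_state(run_id, "COMPLETED")
    store.register_run_outputs(run_id, locations)
    return run_id


store = InMemoryMetadataStore()
spec = JobSpec("room_bookings_workflow", "v2", ["room_bookings"], ["top_locations"],
               owner="Data Platform", description="Hourly top location by room bookings")
run_id = run_with_metadata(store, spec, lambda: ["postgres://db.example.com/analytics/top_locations"])
print(store.runs[run_id]["state"])  # COMPLETED
```

Logical names are registered once with the job, while physical locations are attached per run. This split lets the same job write to different places without changing its identity.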
04 Room bookings pipeline (take 2)
Example: Room bookings pipeline (take 2)
Recall, we are tasked with analyzing room booking trends …
[Diagram: Scheduler → Room Bookings workflow (job): S3 → Postgres (Top Locations)]
Problems
● What’s our job’s input dataset?
● Does the dataset have an owner?
● How often is the dataset updated?
● Coordinate changes
● Figure out backfills
Enter Marquez
[Mock-up: searching "room bo" returns:
● Room Bookings (ALL): All room bookings since beginning of time (created: Feb. 15, 2010)
● Room Bookings (SF): All San Francisco room bookings (created: Jul. 8, 2018)]
Well, that was easy!
Room Bookings (ALL), created: Feb. 15, 2010
Info
Owner: Data Engineering
Location: s3://room_bookings/raw/
Schema: https://registry.wework.com/schemas/ids/1
Updated: Hourly
Description: All room bookings since beginning of time
Bonus!
We also had to coordinate changes to our input data
Our view: [Diagram: the Room bookings workflow and its datasets, with a job failure]
Global view!: [Diagram: the same job failure in the wider graph of datasets and jobs, including the downstream Top locations dataset]
Room Bookings (ALL)
Schema: https://registry.wework.com/schemas/ids/2
Oh, version bumped!
Patch, deploy, trigger!
[Diagram: a Trigger connects the Room bookings workflow to the Top locations dataset]
RECAP
● Make it trivial to discover datasets
● Global context when debugging
● Easily handle backfills
○ Datasets as dependencies
05 Future work
Marquez: Future work
WeWork + Marquez
● Data platform built around Marquez
● Internal integrations
○ Scheduling
○ Batching
○ Streaming
Roadmap
● Short-term
○ Release Marquez 0.1.0
○ Docs
● Long-term
○ Marquez UI
github.com/MarquezProject
@MarquezProject
Thanks!
We’re hiring!
contact: willy.lulciuc@wework.com
Questions?

More Related Content

What's hot

From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataDatabricks
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Yaroslav Tkachenko
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineeringNovita Sari
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino ProjectMartin Traverso
 
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Databricks
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...Databricks
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionDatabricks
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Igor De Souza
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 

What's hot (20)

From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
 
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 

Similar to Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers

ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!Timo Walther
 
MongoDB.local Austin 2018: Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...
MongoDB.local Austin 2018:  Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...MongoDB.local Austin 2018:  Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...
MongoDB.local Austin 2018: Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...MongoDB
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Fabian Hueske
 
Keepin’ It Real(-Time) With Nadine Farah | Current 2022
Keepin’ It Real(-Time) With Nadine Farah | Current 2022Keepin’ It Real(-Time) With Nadine Farah | Current 2022
Keepin’ It Real(-Time) With Nadine Farah | Current 2022HostedbyConfluent
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessSATOSHI TAGOMORI
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesC4Media
 
Data Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixData Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixStefan Krawczyk
 
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...Flink Forward
 
Workflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. TokyoWorkflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. TokyoTaro L. Saito
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022HostedbyConfluent
 
Architecting a next generation data platform
Architecting a next generation data platformArchitecting a next generation data platform
Architecting a next generation data platformhadooparchbook
 
Uncovering SQL Server query problems with execution plans - Tony Davis
Uncovering SQL Server query problems with execution plans - Tony DavisUncovering SQL Server query problems with execution plans - Tony Davis
Uncovering SQL Server query problems with execution plans - Tony DavisRed Gate Software
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseVictoriaMetrics
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingKent Graziano
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapItai Yaffe
 

Similar to Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers (20)

ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!ApacheCon 2020 - Flink SQL in 2020: Time to show off!
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
 
MongoDB.local Austin 2018: Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...
MongoDB.local Austin 2018:  Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...MongoDB.local Austin 2018:  Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...
MongoDB.local Austin 2018: Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch A...
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...
 
Keepin’ It Real(-Time) With Nadine Farah | Current 2022
Keepin’ It Real(-Time) With Nadine Farah | Current 2022Keepin’ It Real(-Time) With Nadine Farah | Current 2022
Keepin’ It Real(-Time) With Nadine Farah | Current 2022
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
 
Data Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch FixData Day Texas 2017: Scaling Data Science at Stitch Fix
Data Day Texas 2017: Scaling Data Science at Stitch Fix
 
The Big Bad Data
The Big Bad DataThe Big Bad Data
The Big Bad Data
 
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers